Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

KU Parking Assistant: A Simple Tool-Using Agent

University of Kansas School of Business

Every agent notebook you have seen so far in this course relies on the LLM to do all the heavy lifting -- reasoning, planning, and generating answers from its own knowledge. But there are tasks where pure LLM reasoning is the wrong tool for the job. Calculating the distance between two GPS coordinates, for example, requires precise trigonometry. An LLM might approximate the answer, but it will sometimes get it wrong in ways that matter (ranking a lot as “closest” when it is actually third-closest). A Python function using the Haversine formula will get it right every time, in microseconds, with no hallucination risk.

This notebook builds a small agent that combines the best of both worlds. The LLM handles what it is good at -- understanding natural language (“Where can I park near the business school?”), fuzzy matching building names, and formatting a helpful response. Python functions handle what they are good at -- looking up coordinates in a dictionary, computing great-circle distances, sorting results, and constructing Google Maps URLs. The LLM decides when to call which tool, but it never does the math itself.

What makes this different from the other agent notebooks:

  • We use real deterministic tools (distance math, data lookups) instead of a pure text-in/text-out workflow

  • The LLM decides when to call which tool, but it does not do the math itself -- it delegates to Python

  • This is a pure workflow, not a chatbot: one input produces one structured output

Teaching goal: Show the difference between fuzzy LLM reasoning (good for language understanding: “business school” maps to “Capitol Federal Hall”) and precise tool execution (good for deterministic tasks: distance math, sorting, data lookup). A well-designed agent lets each side do what it is best at.

Note on data: The real KU parking map is at https://parking.ku.edu/sites/parking/files/documents/parkingmap.pdf. For this demo we use a small illustrative dataset with approximate coordinates so the notebook runs standalone.

This notebook picks up where the agents in Chapters 16 and 17 left off — the single-agent, multi-agent, and autonomous notebooks all reasoned over text alone, while here the LLM delegates the precise work to real Python tools it chooses to call.

!pip install -q git+https://github.com/KarAnalytics/llm_cascade.git

Imports and LLM Setup

We use llm_cascade for automatic LLM provider fallback, and Python’s built-in math module for the distance calculations. No external APIs needed.

import math
import json as _json
import re as _re

from llm_cascade import get_cascade

llm = get_cascade()

Data: KU Buildings and Parking Lots

We hardcode two small datasets that represent the physical world the agent needs to reason about. In a real deployment, these would come from a database, a GIS system, or a parsed PDF. For this classroom demo, approximate coordinates are sufficient to demonstrate the architecture.

KU Buildings are the destinations students might ask about. The dictionary maps human-readable names (with common aliases in parentheses) to latitude/longitude tuples. This naming convention is deliberate: when a student asks about “the business school,” our fuzzy matcher will find “Capitol Federal Hall (Business School)” because the alias appears in the dictionary key.

KU Parking Lots include a color code indicating who can park there. Understanding these colors is essential context for the agent’s final answer -- it is not helpful to recommend a Yellow (faculty-only) lot to a student with a Blue permit. The colors are:

  • Blue -- student commuter permit

  • Yellow -- faculty / staff permit

  • Gold -- premium permit (close to central campus)

  • Red -- restricted / reserved

  • Visitor -- open to visitors, usually metered or short-term

  • Park & Ride -- free remote lot with shuttle service

# --- KU Buildings (approximate coordinates for illustration) ---
KU_BUILDINGS = {
    'Capitol Federal Hall (Business School)':        (38.953505, -95.249740),
    'Kansas Union':                                  (38.959200, -95.243500),
    'Allen Fieldhouse':                              (38.954306, -95.252394),
    'Strong Hall':                                   (38.958542, -95.247614),
    'Watson Library':                                (38.956621, -95.244787),
    'Anschutz Library':                              (38.957323, -95.249742),
    'Dyche Hall':                                    (38.958610, -95.243890),
    'Fraser Hall':                                   (38.957160, -95.243590),
    'Lied Center':                                   (38.954940, -95.262890),
    'Memorial Stadium':                              (38.963330, -95.246390),
    'Wescoe Hall':                                   (38.957430, -95.247830),
    'Spencer Museum of Art':                         (38.959634, -95.244569),
    'Ambler Student Recreation Fitness Center':      (38.952512, -95.247929),
}

# --- Informal names, nicknames, and school/department associations people
# actually type. Any of these should resolve to the canonical building name. ---
BUILDING_ALIASES = {
    'Capitol Federal Hall (Business School)': [
        'business school', 'b-school', 'bschool', 'biz school',
        'school of business', 'ku business school', 'ku business',
        'capitol federal', 'capitol federal hall', 'capfed', 'cap fed',
    ],
    'Kansas Union': [
        'union', 'student union', 'ku union', 'the union', 'ku student union',
        'memorial union', 'bookstore',
    ],
    'Allen Fieldhouse': [
        'allen', 'fieldhouse', 'the phog', 'phog', 'phog allen',
        'basketball stadium', 'phog stadium', 'basketball arena',
        'ku basketball', 'basketball', 'jayhawk basketball',
    ],
    'Strong Hall': [
        'strong', 'admin building', 'administration', 'admin',
        'chancellor', "chancellor's office",
    ],
    'Watson Library': [
        'watson', 'watson lib', 'main library', 'research library',
        'ku library',
    ],
    'Anschutz Library': [
        'anschutz', 'anschutz lib', 'science library', 'stem library',
    ],
    'Dyche Hall': [
        'dyche', 'natural history museum', 'nhm',
        'biodiversity institute', 'museum of natural history',
    ],
    'Fraser Hall': [
        'fraser', 'psychology building', 'psych building',
    ],
    'Lied Center': [
        'lied', 'performing arts', 'performing arts center',
        'concert hall', 'theater',
    ],
    'Memorial Stadium': [
        'stadium', 'football stadium', 'memorial', 'kivisto field',
        'ku football', 'football arena', 'jayhawk football',
    ],
    'Wescoe Hall': [
        'wescoe', 'wescoe beach', 'liberal arts building',
        'college of liberal arts', 'humanities building',
    ],
    'Spencer Museum of Art': [
        'spencer', 'spencer museum', 'art museum', 'museum of art',
    ],
    'Ambler Student Recreation Fitness Center': [
        'ambler', 'ambler gym', 'ambler fitness', 'ambler fitness center',
        'ambler rec', 'ambler rec center', 'ambler recreation',
        'ambler srfc', 'srfc',
        'student gym', 'student rec', 'student rec center',
        'recreation center', 'fitness center', 'rec center',
        'campus gym', 'campus rec', 'ku gym', 'the rec', 'gym',
    ],
}

# --- KU Parking Lots (approximate coordinates for illustration) ---
KU_PARKING_LOTS = [
    {'lot': 'Lot 3'                  , 'color': 'Blue'    , 'lat': 38.958942, 'lng': -95.247614},
    {'lot': 'Lot 5'                  , 'color': 'Yellow'  , 'lat': 38.95861, 'lng': -95.24441},
    {'lot': 'Lot 10'                 , 'color': 'Blue'    , 'lat': 38.956221, 'lng': -95.244787},
    {'lot': 'Lot 14'                 , 'color': 'Yellow'  , 'lat': 38.95716, 'lng': -95.24307},
    {'lot': 'Lot 16'                 , 'color': 'Visitor' , 'lat': 38.9592, 'lng': -95.24298},
    {'lot': 'Lot 18'                 , 'color': 'Yellow'  , 'lat': 38.95703, 'lng': -95.24783},
    {'lot': 'Lot 60'                 , 'color': 'Blue'    , 'lat': 38.96333, 'lng': -95.24691},
    {'lot': 'Lot 70'                 , 'color': 'Blue'    , 'lat': 38.953906, 'lng': -95.252394},
    {'lot': 'Lot 71'                 , 'color': 'Yellow'  , 'lat': 38.953666, 'lng': -95.252394},
    {'lot': 'Lot 90'                 , 'color': 'Gold'    , 'lat': 38.952512, 'lng': -95.247929},
    {'lot': 'Lot 91'                 , 'color': 'Gold'    , 'lat': 38.959634, 'lng': -95.244569},
    {'lot': 'Lot 118'                , 'color': 'Gold'    , 'lat': 38.953505, 'lng': -95.24974},
    {'lot': 'Lot 300A-D'             , 'color': 'Gold'    , 'lat': 38.95494, 'lng': -95.26237},
    {'lot': 'Lot 300E-G'             , 'color': 'Gold'    , 'lat': 38.95494, 'lng': -95.26289},
    {'lot': 'Allen Fieldhouse Garage', 'color': 'Gold'    , 'lat': 38.954306, 'lng': -95.252394},
]

print(f'Loaded {len(KU_BUILDINGS)} buildings, '
      f'{sum(len(v) for v in BUILDING_ALIASES.values())} aliases, '
      f'and {len(KU_PARKING_LOTS)} parking lots.')

The Tools (plain Python functions)

Three simple tools the agent can call. Notice there is no LLM inside these functions -- they are pure deterministic Python. The LLM will decide when to call them based on the user’s question, but every computation inside them produces an exact, repeatable result.

ToolWhat it doesWhy it exists
list_ku_buildings()Returns the list of available buildingsSo the agent can show options when the user’s query is ambiguous
find_parking_near_building(name, top_k)Finds the top-K nearest parking lots within a 2-mile radiusThe core function -- distance math, sorting, and URL generation
get_parking_colors_legend()Explains what each parking color meansSo the agent can annotate its answer with permit-type context

The distance calculation uses the Haversine formula, which computes the great-circle distance between two latitude/longitude points on a sphere. For the short distances on a university campus (typically under 1 mile), the Haversine result is accurate to within a few feet -- far better than asking an LLM to eyeball coordinate differences. The 2-mile radius filter is a practical constraint that keeps the results useful: a parking lot 5 miles away is technically “near” the campus, but no student would walk there.

Approximate building matching

Real users rarely type the full canonical building name. They write “Ambler gym”, “student gym”, “the phog”, “basketball stadium”, or “school of business”. Our _match_building helper handles this by combining three strategies:

  1. Alias table. Each canonical building name has a list of common nicknames and school/department associations (e.g., Allen Fieldhouse gets phog, phog stadium, basketball arena, ku basketball; Ambler SRFC gets ambler gym, student gym, fitness center, gym).

  2. Substring matching in both directions, against the canonical name and every alias.

  3. Token-overlap scoring (F1 over tokens, with a small stopword list) for phrases that don’t substring-match but share enough words with the building or an alias.

The matcher tries exact → substring → token-overlap in that order and requires a token overlap of at least 0.5 before it commits to an approximate match. That threshold prevents random words from latching onto a building. The alias table lives right next to the coordinates, so adding a new nickname is a one-line edit. The Google Maps URLs are constructed by embedding the latitude and longitude into a standard maps?q=LAT,LNG template, which produces a clickable pin link that works on any device.

def _haversine_miles(lat1, lng1, lat2, lng2):
    '''Great-circle distance between two lat/lng points, in miles.'''
    R = 3958.8  # Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))


def _google_maps_pin(lat, lng):
    '''Return a clickable Google Maps pin URL for a coordinate.'''
    return f'https://www.google.com/maps?q={lat},{lng}'


# ---- Approximate building matcher ----
# Small stopword list -- filler words that should not drive a token match.
_STOPWORDS = {
    'the', 'a', 'an', 'of', 'at', 'on', 'in', 'and', 'or', 'to',
    'ku', 'kansas', 'university', 'hall', 'building', 'bldg',
}


def _tokenize(s):
    return [t for t in _re.split(r'[^a-z0-9]+', s.lower()) if t and t not in _STOPWORDS]


def _token_score(query, candidate):
    '''F1-style token overlap score between two strings (0 to 1).'''
    q_tokens = set(_tokenize(query))
    c_tokens = set(_tokenize(candidate))
    if not q_tokens or not c_tokens:
        return 0.0
    overlap = len(q_tokens & c_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(q_tokens)
    recall = overlap / len(c_tokens)
    return (2 * precision * recall) / (precision + recall)


def _match_building(query):
    '''Approximate building match. Accepts informal names like "Ambler gym",
    "student gym", "the phog", "business school" and resolves them to the
    canonical building name. Order of preference:
      1. Exact match on canonical name or alias
      2. Substring match (query in name, or query in alias, or alias in query)
      3. Token-overlap scoring across canonical names and aliases (>= 0.5)
    '''
    q = query.lower().strip()
    if not q:
        return None

    # 1a. Exact canonical name
    for name in KU_BUILDINGS:
        if name.lower() == q:
            return name
    # 1b. Exact alias
    for name, aliases in BUILDING_ALIASES.items():
        for alias in aliases:
            if alias.lower() == q:
                return name

    # 2a. Substring against canonical name (either direction)
    for name in KU_BUILDINGS:
        nlow = name.lower()
        if q in nlow or nlow in q:
            return name
    # 2b. Reverse substring on the non-parenthetical short form
    for name in KU_BUILDINGS:
        short_key = name.split('(')[0].strip().lower()
        if short_key and (short_key in q or q in short_key):
            return name
    # 2c. Substring either direction against any alias
    for name, aliases in BUILDING_ALIASES.items():
        for alias in aliases:
            al = alias.lower()
            if al in q or q in al:
                return name

    # 3. Token-overlap scoring across canonical name + all aliases
    best_name, best_score = None, 0.0
    for name in KU_BUILDINGS:
        for cand in [name] + BUILDING_ALIASES.get(name, []):
            score = _token_score(q, cand)
            if score > best_score:
                best_score = score
                best_name = name
    # Require meaningful overlap so random words do not latch onto a building.
    return best_name if best_score >= 0.5 else None


# ---- Tools the agent can call ----

def list_ku_buildings(_=''):
    '''Return the list of KU buildings available for parking lookup.'''
    return chr(10).join(f'- {name}' for name in KU_BUILDINGS.keys())


def find_parking_near_building(building_name, top_k=5, max_distance_miles=2.0):
    '''Find nearest parking lots within max_distance_miles of the building, top-K results.'''
    matched = _match_building(building_name)
    if matched is None:
        return (
            f'No KU building matches "{building_name}". '
            f'Use list_ku_buildings to see available options.'
        )

    target_lat, target_lng = KU_BUILDINGS[matched]

    # Compute distance to every lot and sort
    ranked = []
    for lot in KU_PARKING_LOTS:
        dist = _haversine_miles(target_lat, target_lng, lot['lat'], lot['lng'])
        ranked.append({
            'lot': lot['lot'],
            'color': lot['color'],
            'distance_miles': round(dist, 3),
            'walk_minutes': round(dist * 20),  # ~3 mph walking speed
            'google_maps': _google_maps_pin(lot['lat'], lot['lng']),
        })

    ranked.sort(key=lambda r: r['distance_miles'])

    # Filter to lots within max_distance_miles of the destination
    within_range = [r for r in ranked if r['distance_miles'] <= max_distance_miles]

    if not within_range:
        return (
            f'No parking lots within {max_distance_miles} miles of {matched}. '
            f'Try a different building or increase the radius.'
        )

    header = (
        f'Parking lots within {max_distance_miles} miles of {matched} '
        f'(top {top_k}, ranked by distance):' + chr(10)
    )
    lines = [header]
    for i, r in enumerate(within_range[:top_k], start=1):
        lines.append(
            f"  {i}. {r['lot']} ({r['color']}) - "
            f"{r['distance_miles']} mi ({r['walk_minutes']} min walk) "
            f"-> {r['google_maps']}"
        )
    return chr(10).join(lines)


def get_parking_colors_legend(_=''):
    '''Explain what each parking color means at KU.'''
    return (
        'KU parking color legend:' + chr(10)
        + '- Blue: student commuter permit' + chr(10)
        + '- Yellow: faculty/staff permit' + chr(10)
        + '- Gold: premium permit (close to central campus)' + chr(10)
        + '- Red: restricted/reserved' + chr(10)
        + '- Visitor: open to visitors (often metered or short-term)' + chr(10)
        + '- Park & Ride: free remote lot with shuttle service'
    )


# Register tools for the agent
TOOLS = {
    'list_ku_buildings':            list_ku_buildings,
    'find_parking_near_building':   find_parking_near_building,
    'get_parking_colors_legend':    get_parking_colors_legend,
}

# Quick sanity check that the matcher handles informal names:
for _q in ['ambler gym', 'student gym', 'the phog', 'basketball stadium',
          'school of business', 'the union', 'fitness center']:
    print(f'  {_q!r:35s} -> {_match_building(_q)}')

print()
print('Tools registered:', list(TOOLS.keys()))

Test the Tools (no LLM yet)

Before wiring up the agent, let’s call the tools directly to make sure they work. This is a critical debugging practice: if the raw tool output is wrong, no amount of prompt engineering on the agent side will fix it. By testing the tools in isolation, we can verify that the fuzzy matcher resolves “business school” correctly, that the distance calculations are sensible, and that the Google Maps URLs point to the right locations. Only after we trust the tools should we hand them to an LLM.

# Call one tool directly
print(find_parking_near_building('business school', top_k=5))
print()
print(find_parking_near_building('Kansas Union', top_k=3))

The Agent (LLM that calls the tools)

We use a prompt-based ReAct loop -- no bind_tools, no special API features, works with any LLM. The system prompt describes the three available tools and tells the LLM to output a JSON action like {"tool": "find_parking_near_building", "input": "business school"} when it wants to call one. Our loop parses the JSON, executes the tool, feeds the result back into the conversation, and repeats until the LLM produces a natural-language final answer instead of a tool call.

This is a deliberately simple architecture. Framework-based agents (LangChain, CrewAI) add convenience methods and error handling, but the core pattern is exactly what you see here: prompt the LLM, check if it wants to call a tool, execute the tool if so, feed the result back, repeat. Understanding this loop demystifies what frameworks do under the hood and gives you the confidence to debug agent behavior when something goes wrong.

AGENT_SYSTEM = (
    'You are a helpful KU parking assistant. You have these tools:' + chr(10)
    + '- list_ku_buildings(): returns the list of KU buildings available.' + chr(10)
    + '- find_parking_near_building(name): returns the top 5 nearest parking lots to a building, '
    + 'with color, distance, walk time, and Google Maps pin.' + chr(10)
    + '- get_parking_colors_legend(): explains the KU parking color codes.' + chr(10) + chr(10)
    + 'To use a tool, reply with ONLY this JSON (nothing else):' + chr(10)
    + '{"tool": "tool_name", "input": "the input string"}' + chr(10) + chr(10)
    + 'After receiving a tool result, write a helpful final answer for the user that includes '
    + 'the ranked parking list, color meanings, and clickable Google Maps links.' + chr(10) + chr(10)
    + 'IMPORTANT -- find_parking_near_building does APPROXIMATE matching on the building name, '
    + 'so pass the user\'s phrase through as-is instead of trying to guess the canonical name. '
    + 'Examples that all resolve correctly:' + chr(10)
    + '  - "business school" or "school of business" -> Capitol Federal Hall (Business School)' + chr(10)
    + '  - "ambler", "ambler fitness center", "ambler gym", "student gym", "fitness center", '
    + '"the rec", "ku gym" -> Ambler Student Recreation Fitness Center' + chr(10)
    + '  - "the union", "student union", "bookstore" -> Kansas Union' + chr(10)
    + '  - "the phog", "basketball stadium", "phog stadium", "ku basketball" -> Allen Fieldhouse' + chr(10)
    + '  - "football stadium", "ku football" -> Memorial Stadium' + chr(10)
    + '  - "natural history museum" -> Dyche Hall' + chr(10)
    + '  - "art museum" -> Spencer Museum of Art' + chr(10)
    + 'Only call list_ku_buildings if the tool explicitly reports that no building matches.'
)


def run_parking_agent(question, verbose=True):
    '''Simple ReAct-style loop: LLM decides which tool to call, we execute it, feed result back.'''
    messages = [
        {'role': 'system', 'content': AGENT_SYSTEM},
        {'role': 'user', 'content': question},
    ]

    # Flatten into a single prompt with conversation history
    def flatten(msgs):
        parts = []
        for m in msgs:
            parts.append(f"{m['role'].upper()}: {m['content']}")
        return chr(10).join(parts)

    for step in range(5):  # max 5 tool rounds
        prompt = flatten(messages[1:])  # send everything except system (passed separately)
        response = llm.generate(prompt, system_prompt=AGENT_SYSTEM)
        reply = response.text.strip()

        # Try to parse a tool call
        tool_call = None
        m = _re.search(r'\{[^{}]+\}', reply)
        if m:
            try:
                parsed = _json.loads(m.group())
                if 'tool' in parsed and 'input' in parsed:
                    tool_call = parsed
            except _json.JSONDecodeError:
                pass

        if tool_call is None:
            # No tool call -- this is the final answer
            if verbose:
                print(f'  [Step {step+1}] Final answer' + chr(10))
            return reply

        # Execute the requested tool
        tool_name = tool_call['tool']
        tool_input = tool_call['input']
        tool_fn = TOOLS.get(tool_name)
        if tool_fn:
            if verbose:
                print(f'  [Step {step+1}] Calling: {tool_name}({tool_input})')
            result = tool_fn(tool_input)
            if verbose:
                preview = str(result).replace(chr(10), ' | ')[:150]
                print(f'              Result: {preview}...')
        else:
            result = f'Unknown tool: {tool_name}'

        messages.append({'role': 'assistant', 'content': reply})
        messages.append({'role': 'user', 'content': f'Tool result: {result}'})

    return 'Agent exceeded maximum steps.'


print('Parking agent ready.')

Ask the Agent Some Questions

Now we let the agent run end-to-end. For each question, the agent should: (1) figure out which tool to call based on the user’s natural language, (2) call find_parking_near_building with the user’s phrase (the fuzzy matcher handles informal names), and (3) format a helpful response with the ranked list, permit colors, and clickable Google Maps links. Try uncommenting the other questions to see how the agent handles visitor-specific queries, requests for the color legend, and different building names.

from IPython.display import display, Markdown

questions = [
    'Where can I park near the business school?',
    # Approximate/informal names all resolve to the right canonical building:
    # 'Where can I park near the Ambler fitness center?',
    # 'I want to go to the student gym, where should I park?',
    # 'parking near Ambler gym',
    # 'I need parking near the student union. What are my options?',
    # 'parking close to the phog',
    # 'basketball stadium parking',
    # 'football stadium parking',
    # 'What colors of parking are available at KU and what do they mean?',
    # 'I am visiting KU and want to park near Allen Fieldhouse. Where should I go?',
]

for q in questions:
    display(Markdown('---'))
    display(Markdown(f'**Q:** {q}'))
    answer = run_parking_agent(q)
    # Render as Markdown so Google Maps URLs become clickable links.
    # Colab auto-linkifies any URL inside a Markdown-rendered cell.
    display(Markdown(answer))

How to Build the Same Thing in Dify (multi-node workflow with real tools)

A proper Dify workflow uses alternating tool and LLM nodes, matching the architecture you see in typical agentic workflow diagrams. Each tool node fetches data, each LLM node reasons about what to do next with that data.

The workflow architecture

Start (user types a KU building name)
  |
  v
[TOOL #1] Knowledge Retrieval - KU_Buildings KB
  | (acts as the "access Google Maps" step: returns the building's lat/lng)
  v
[LLM #1] Parse the building coordinates
  | (extracts a clean {lat, lng} from the retrieved text)
  v
[TOOL #2] Knowledge Retrieval - KU_Parking_Lots KB
  | (acts as the "access KU map" step: returns all parking lot data)
  v
[LLM #2] Compute distances, filter to 2 miles, rank top 5, format output
  | (does the math approximately using the coordinates)
  v
End (returns formatted parking list with Google Maps pin URLs)

Five processing nodes: 2 Knowledge Retrieval (tool) nodes + 2 LLM nodes + Start/End. No Code nodes, no custom APIs, no Python.

Why Knowledge Retrieval instead of HTTP Request?

In Dify’s workflow builder, a “tool” is any node that fetches external data. Knowledge Retrieval nodes are the easiest kind of tool because:

  • You upload the data once (buildings + parking lots as markdown files)

  • Dify chunks, embeds, and indexes it automatically

  • Retrieval is just a vector search — fast and free

  • No API keys, no rate limits, no billing

An alternative would be an HTTP Request node calling Nominatim (OpenStreetMap’s free geocoding API), but Nominatim is rate-limited and doesn’t know anything about specific KU building nicknames like “business school” or “the phog”. Our pre-uploaded Knowledge Base is more reliable for a classroom demo.

Approximate building matching in Dify

The Colab agent above uses an explicit alias table plus token-overlap scoring. Dify’s Knowledge Retrieval node does not have that — instead, approximate matching comes from vector embeddings: the user’s phrase is embedded and compared against embedded chunks from ku_buildings.md. For informal queries like “Ambler gym” or “the phog” to surface the correct chunk, the aliases must appear in the markdown itself.

That is why Agentic/dify_data/ku_buildings.md embeds each building’s common aliases directly in its Description: line (nicknames, school/department affiliations, department functions). For example, the Allen Fieldhouse entry mentions “basketball stadium”, “the Phog”, and “Phog Allen” explicitly so a vector search against “phog stadium” still retrieves that chunk. If you add a new nickname that users type, edit the markdown, re-upload (or sync) the Knowledge Base, and re-run the workflow.

If you ever find vector retrieval missing a phrase that the Colab agent handles correctly, either:

  • Add that phrase to the building’s Description line in ku_buildings.md, or

  • Switch the Dify workflow to call the Supabase MCP endpoint from the sibling ku_parking_mcp.ipynb notebook — that endpoint uses the same explicit alias table as the Colab code cell above.

Step-by-step setup

Part A: upload the two data files as Knowledge Bases

The notebook directory includes two markdown files at Agentic/dify_data/:

  • ku_buildings.md - one section per building with coordinates and alias-rich descriptions

  • ku_parking_lots.md - one section per parking lot with color and coordinates

In Dify:

  1. Go to cloud.dify.ai -> Knowledge -> Create Knowledge

  2. Upload ku_buildings.md. Name the KB KU_Buildings.

  3. On the chunking screen, pick Custom chunking:

    • Chunk delimiter: \n# (hash + space — this splits on every heading)

    • Max chunk length: 500 tokens

    • This ensures each building becomes its own retrievable chunk, and the alias-rich description stays within that chunk.

  4. Indexing: High Quality (vector embeddings).

  5. Click Save & Process. Wait ~30 seconds.

  6. Repeat: create a second Knowledge Base called KU_Parking_Lots and upload ku_parking_lots.md with the same chunking settings.

Part B: build the workflow

  1. Studio -> Create App -> Workflow (not Chatflow).

  2. Name it KU Parking Assistant.

  3. On the canvas, click the Start node:

    • Add input variable: name = destination, type = Text, label = “Which KU building are you visiting? (informal names like ‘the phog’, ‘Ambler gym’, ‘business school’ all work)”

  4. Add TOOL #1: Knowledge Retrieval (Buildings)

    • Click + after Start -> Knowledge Retrieval

    • Query variable: Start.destination

    • Knowledge: select KU_Buildings

    • Rename this node to Retrieve Building

  5. Add LLM #1: Parse Building Coordinates

    • Click + after Retrieve Building -> LLM

    • Model: your preferred (gpt-4o-mini, gemini-2.5-flash, etc.)

    • System prompt:

      You are a data extractor. You will receive:
      1. A user query that may use a nickname or informal name
         (e.g., "the phog", "Ambler gym", "student union", "business school").
      2. One or more retrieved KU building chunks, each containing a canonical
         building name, its coordinates, and a description that lists common
         aliases and school/department associations.
      
      Pick the single chunk whose canonical name or description best matches
      the user's query, and extract its canonical name and coordinates. If the
      query uses a nickname, use the alias list in the description to choose
      the correct building (e.g., "the phog" -> Allen Fieldhouse;
      "Ambler gym" or "student gym" -> Ambler Student Recreation Fitness Center;
      "school of business" -> Capitol Federal Hall).
      
      Reply with ONLY this JSON format (nothing else):
      {"building": "full canonical name", "lat": 38.9573, "lng": -95.2519}
    • User prompt:

      Retrieved building information:
      {{#Retrieve_Building.result#}}
      
      User query: {{#Start.destination#}}
    • Rename this node to Parse Coords

  6. Add TOOL #2: Knowledge Retrieval (Parking Lots)

    • Click + after Parse Coords -> Knowledge Retrieval

    • Query variable: Start.destination (or a fixed query like “parking lots”)

    • Knowledge: select KU_Parking_Lots

    • In the node settings, set Top K to 15 (to retrieve all lots, since our KB is small)

    • Rename this node to Retrieve Parking

  7. Add LLM #2: Distance + Filter + Format

    • Click + after Retrieve Parking -> LLM

    • System prompt:

      You are a parking assistant. You will receive:
      1. A destination building with its latitude and longitude (as JSON).
      2. A list of KU parking lots with their coordinates and colors.
      
      Your job:
      - Compute the approximate great-circle distance in miles from the destination
        to each parking lot. For Lawrence KS, 0.001 degrees of latitude is about
        0.069 miles and 0.001 of longitude is about 0.054 miles.
      - Filter to lots within 2 miles of the destination.
      - Rank the remaining lots from nearest to farthest.
      - Return the top 5 as a numbered list with these fields for each:
        * Rank, lot name, permit color
        * Distance in miles (2 decimals) and approximate walk time (3 mph walking speed)
        * Google Maps pin URL in this exact format: https://www.google.com/maps?q=LAT,LNG
      - End with a short note explaining what the permit color of the closest lot means.
      - If no lots are within 2 miles, say so and suggest the Park & Ride shuttle.
    • User prompt:

      Destination: {{#Parse_Coords.text#}}
      
      Available parking lots:
      {{#Retrieve_Parking.result#}}
      
      Give me the top 5 parking lots within 2 miles.
    • Rename this node to Rank and Format

  8. Configure the End node

    • Output variable: Rank_and_Format.text

  9. Publish and click Run to test the approximate matcher end-to-end:

    • business school / school of business -> Capitol Federal Hall

    • the phog / basketball stadium / phog stadium -> Allen Fieldhouse

    • Ambler gym / student gym / fitness center -> Ambler SRFC

    • the union / student union -> Kansas Union

    • football stadium / ku football -> Memorial Stadium

What each node does and why

NodeTypeJobDeterministic or LLM?
StartInputCollect user destination (any informal phrase)-
Retrieve BuildingTool (Knowledge Retrieval)Vector search against buildings KB — finds the right chunk from an alias matchDeterministic (vector search)
Parse CoordsLLMPicks the right canonical building from the retrieved chunks and extracts {lat, lng}LLM (disambiguation + extraction)
Retrieve ParkingTool (Knowledge Retrieval)Look up parking lots from the Parking KBDeterministic (vector search)
Rank and FormatLLMCompute distances, filter 2 miles, rank, formatLLM (approximate math + formatting)
EndOutputReturn the formatted result-

Notice the alternation: Tool -> LLM -> Tool -> LLM. Each tool fetches fresh data, each LLM reasons about what to do with it. This is the standard pattern in Dify workflows and maps directly to the architecture diagram you described.

Colab (real tools) vs. Dify (workflow tools) - the teaching contrast

Colab notebookDify workflow
How are tools defined?Python functionsDify nodes (Knowledge Retrieval, LLM, HTTP Request)
Who decides which tool to call?The LLM, at each step of the agent loopFixed by the workflow diagram (same tools in the same order)
Approximate building matchingExplicit alias table + token-overlap scoringVector embeddings over alias-rich descriptions + LLM disambiguation
Distance mathExact (Haversine formula)Approximate (LLM reasoning over coordinates)
Data sourcePython dictsDify Knowledge Bases with vector retrieval
FlexibilityHigh — agent can call tools in any orderLow — flow is fixed, but predictable
Best forComplex questions with branching reasoningSimple, predictable pipelines with the same steps every time

When to pick each approach

  • Colab agent (this notebook): when the agent needs to decide dynamically whether to call list_buildings, find_parking_near_building, or get_parking_colors_legend based on the question. Good for conversational or exploratory use cases.

  • Dify workflow (above): when every query follows the same steps (lookup destination -> lookup parking -> rank). Good for a production app where you want consistent, debuggable behavior without per-query reasoning overhead.

Limitations of the Dify version

  • The LLM’s distance math is approximate. For 15 lots in a small area it usually gets the top 3 right, but lots at very similar distances may be mis-ranked by 1-2 positions.

  • Approximate building matching depends on the aliases being present in the markdown. A nickname that isn’t written into ku_buildings.md will only match if it’s semantically close enough for the embedding model to score highly — and that is less reliable than the explicit alias table in the Colab agent.

  • If your Knowledge Base grows beyond a few hundred entries, the vector retrieval step may miss relevant items. At that scale you’d switch to HTTP Request calls to a real database, or to the MCP endpoint in ku_parking_mcp.ipynb.

  • The 2-mile filter is enforced in the LLM prompt, not in code. A rogue LLM response could include a lot outside 2 miles. In production you’d add a Template Transform or Parameter Extractor node to enforce the filter programmatically.

Exercises

  • Add a new building (e.g., 'Budig Hall': (lat, lng)) and a new parking lot, then rerun — notice you don’t need to change any tool code.

  • Add a fourth tool: find_parking_by_color(color, near_building) that filters to only lots of a specific permit color.

  • Replace the hardcoded data with a real parse of the KU parking map PDF — this is where real data engineering begins.

  • In Dify, try removing the coordinates from the system prompt and just listing lot names. How does answer quality change? This shows why giving the LLM structured numeric data matters.

  • Compare the Colab and Dify outputs for the same query. How often do they agree on the top-3 ranking?

Key takeaways

  • Tool-using agents combine LLM strengths (fuzzy language understanding) with Python strengths (deterministic math and lookups), letting each side do what it does best.

  • The Haversine formula gives exact great-circle distances in microseconds, which is far more reliable than asking the LLM to eyeball coordinate differences.

  • Alias tables plus token-overlap scoring resolve informal phrases like “the phog” or “student gym” to canonical building names before any tool runs.

  • A prompt-based ReAct loop -- emit JSON action, run the tool, feed the result back -- is all you need to build a working agent without a framework.

  • Dify workflows are the fixed-flow counterpart: alternating Knowledge Retrieval and LLM nodes give predictable behavior, while the Python agent offers dynamic tool selection.


Run the code

To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)

Estimated run time: ~2 minutes (1-3 LLM calls per question)

https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/KU_Parking_Assistant.ipynb