Every agent notebook you have seen so far in this course relies on the LLM to do all the heavy lifting -- reasoning, planning, and generating answers from its own knowledge. But there are tasks where pure LLM reasoning is the wrong tool for the job. Calculating the distance between two GPS coordinates, for example, requires precise trigonometry. An LLM might approximate the answer, but it will sometimes get it wrong in ways that matter (ranking a lot as “closest” when it is actually third-closest). A Python function using the Haversine formula will get it right every time, in microseconds, with no hallucination risk.
This notebook builds a small agent that combines the best of both worlds. The LLM handles what it is good at -- understanding natural language (“Where can I park near the business school?”), fuzzy matching building names, and formatting a helpful response. Python functions handle what they are good at -- looking up coordinates in a dictionary, computing great-circle distances, sorting results, and constructing Google Maps URLs. The LLM decides when to call which tool, but it never does the math itself.
What makes this different from the other agent notebooks:
We use real deterministic tools (distance math, data lookups) instead of a pure text-in/text-out workflow
The LLM decides when to call which tool, but it does not do the math itself -- it delegates to Python
This is a pure workflow, not a chatbot: one input produces one structured output
Teaching goal: Show the difference between fuzzy LLM reasoning (good for language understanding: “business school” maps to “Capitol Federal Hall”) and precise tool execution (good for deterministic tasks: distance math, sorting, data lookup). A well-designed agent lets each side do what it is best at.
Note on data: The real KU parking map is at https://
This notebook picks up where the agents in Chapters 16 and 17 left off — the single-agent, multi-agent, and autonomous notebooks all reasoned over text alone, while here the LLM delegates the precise work to real Python tools it chooses to call.
!pip install -q git+https://github.com/KarAnalytics/llm_cascade.git
Imports and LLM Setup¶
We use llm_cascade for automatic LLM provider fallback, and Python’s built-in math module for the distance calculations. Apart from the LLM call itself, no external APIs are needed.
import math
import json as _json
import re as _re
from llm_cascade import get_cascade
llm = get_cascade()
Data: KU Buildings and Parking Lots¶
We hardcode two small datasets that represent the physical world the agent needs to reason about. In a real deployment, these would come from a database, a GIS system, or a parsed PDF. For this classroom demo, approximate coordinates are sufficient to demonstrate the architecture.
KU Buildings are the destinations students might ask about. The dictionary maps human-readable names (with common aliases in parentheses) to latitude/longitude tuples. This naming convention is deliberate: when a student asks about “the business school,” our fuzzy matcher will find “Capitol Federal Hall (Business School)” because the alias appears in the dictionary key.
KU Parking Lots include a color code indicating who can park there. Understanding these colors is essential context for the agent’s final answer -- it is not helpful to recommend a Yellow (faculty/staff) lot to a student with a Blue permit. The colors are:
Blue -- student commuter permit
Yellow -- faculty / staff permit
Gold -- premium permit (close to central campus)
Red -- restricted / reserved
Visitor -- open to visitors, usually metered or short-term
Park & Ride -- free remote lot with shuttle service
# --- KU Buildings (approximate coordinates for illustration) ---
KU_BUILDINGS = {
'Capitol Federal Hall (Business School)': (38.953505, -95.249740),
'Kansas Union': (38.959200, -95.243500),
'Allen Fieldhouse': (38.954306, -95.252394),
'Strong Hall': (38.958542, -95.247614),
'Watson Library': (38.956621, -95.244787),
'Anschutz Library': (38.957323, -95.249742),
'Dyche Hall': (38.958610, -95.243890),
'Fraser Hall': (38.957160, -95.243590),
'Lied Center': (38.954940, -95.262890),
'Memorial Stadium': (38.963330, -95.246390),
'Wescoe Hall': (38.957430, -95.247830),
'Spencer Museum of Art': (38.959634, -95.244569),
'Ambler Student Recreation Fitness Center': (38.952512, -95.247929),
}
# --- Informal names, nicknames, and school/department associations people
# actually type. Any of these should resolve to the canonical building name. ---
BUILDING_ALIASES = {
'Capitol Federal Hall (Business School)': [
'business school', 'b-school', 'bschool', 'biz school',
'school of business', 'ku business school', 'ku business',
'capitol federal', 'capitol federal hall', 'capfed', 'cap fed',
],
'Kansas Union': [
'union', 'student union', 'ku union', 'the union', 'ku student union',
'memorial union', 'bookstore',
],
'Allen Fieldhouse': [
'allen', 'fieldhouse', 'the phog', 'phog', 'phog allen',
'basketball stadium', 'phog stadium', 'basketball arena',
'ku basketball', 'basketball', 'jayhawk basketball',
],
'Strong Hall': [
'strong', 'admin building', 'administration', 'admin',
'chancellor', "chancellor's office",
],
'Watson Library': [
'watson', 'watson lib', 'main library', 'research library',
'ku library',
],
'Anschutz Library': [
'anschutz', 'anschutz lib', 'science library', 'stem library',
],
'Dyche Hall': [
'dyche', 'natural history museum', 'nhm',
'biodiversity institute', 'museum of natural history',
],
'Fraser Hall': [
'fraser', 'psychology building', 'psych building',
],
'Lied Center': [
'lied', 'performing arts', 'performing arts center',
'concert hall', 'theater',
],
'Memorial Stadium': [
'stadium', 'football stadium', 'memorial', 'kivisto field',
'ku football', 'football arena', 'jayhawk football',
],
'Wescoe Hall': [
'wescoe', 'wescoe beach', 'liberal arts building',
'college of liberal arts', 'humanities building',
],
'Spencer Museum of Art': [
'spencer', 'spencer museum', 'art museum', 'museum of art',
],
'Ambler Student Recreation Fitness Center': [
'ambler', 'ambler gym', 'ambler fitness', 'ambler fitness center',
'ambler rec', 'ambler rec center', 'ambler recreation',
'ambler srfc', 'srfc',
'student gym', 'student rec', 'student rec center',
'recreation center', 'fitness center', 'rec center',
'campus gym', 'campus rec', 'ku gym', 'the rec', 'gym',
],
}
# --- KU Parking Lots (approximate coordinates for illustration) ---
KU_PARKING_LOTS = [
{'lot': 'Lot 3' , 'color': 'Blue' , 'lat': 38.958942, 'lng': -95.247614},
{'lot': 'Lot 5' , 'color': 'Yellow' , 'lat': 38.95861, 'lng': -95.24441},
{'lot': 'Lot 10' , 'color': 'Blue' , 'lat': 38.956221, 'lng': -95.244787},
{'lot': 'Lot 14' , 'color': 'Yellow' , 'lat': 38.95716, 'lng': -95.24307},
{'lot': 'Lot 16' , 'color': 'Visitor' , 'lat': 38.9592, 'lng': -95.24298},
{'lot': 'Lot 18' , 'color': 'Yellow' , 'lat': 38.95703, 'lng': -95.24783},
{'lot': 'Lot 60' , 'color': 'Blue' , 'lat': 38.96333, 'lng': -95.24691},
{'lot': 'Lot 70' , 'color': 'Blue' , 'lat': 38.953906, 'lng': -95.252394},
{'lot': 'Lot 71' , 'color': 'Yellow' , 'lat': 38.953666, 'lng': -95.252394},
{'lot': 'Lot 90' , 'color': 'Gold' , 'lat': 38.952512, 'lng': -95.247929},
{'lot': 'Lot 91' , 'color': 'Gold' , 'lat': 38.959634, 'lng': -95.244569},
{'lot': 'Lot 118' , 'color': 'Gold' , 'lat': 38.953505, 'lng': -95.24974},
{'lot': 'Lot 300A-D' , 'color': 'Gold' , 'lat': 38.95494, 'lng': -95.26237},
{'lot': 'Lot 300E-G' , 'color': 'Gold' , 'lat': 38.95494, 'lng': -95.26289},
{'lot': 'Allen Fieldhouse Garage', 'color': 'Gold' , 'lat': 38.954306, 'lng': -95.252394},
]
print(f'Loaded {len(KU_BUILDINGS)} buildings, '
      f'{sum(len(v) for v in BUILDING_ALIASES.values())} aliases, '
      f'and {len(KU_PARKING_LOTS)} parking lots.')
The Tools (plain Python functions)¶
Three simple tools the agent can call. Notice there is no LLM inside these functions -- they are pure deterministic Python. The LLM will decide when to call them based on the user’s question, but every computation inside them produces an exact, repeatable result.
| Tool | What it does | Why it exists |
|---|---|---|
| list_ku_buildings() | Returns the list of available buildings | So the agent can show options when the user’s query is ambiguous |
| find_parking_near_building(name, top_k) | Finds the top-K nearest parking lots within a 2-mile radius | The core function -- distance math, sorting, and URL generation |
| get_parking_colors_legend() | Explains what each parking color means | So the agent can annotate its answer with permit-type context |
The distance calculation uses the Haversine formula, which computes the great-circle distance between two latitude/longitude points on a sphere. For the short distances on a university campus (typically under 1 mile), the spherical-Earth assumption is off by at most roughly 0.5%, which is a few tens of feet per mile at worst -- far better than asking an LLM to eyeball coordinate differences. The 2-mile radius filter is a practical constraint that keeps the results useful: a parking lot 5 miles away is technically “near” the campus, but no student would walk there.
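As a worked example, the sketch below applies the Haversine formula to two coordinate pairs copied from the data cell above (Capitol Federal Hall and Lot 70) and compares the result with a simpler flat-map shortcut. The names haversine_miles and flat_map_miles are illustrative and separate from the notebook's own helper; the 20-minutes-per-mile conversion assumes a 3 mph walk.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # same mean radius the notebook uses


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flat_map_miles(lat1, lng1, lat2, lng2):
    """Illustrative shortcut: treat the campus as a flat plane (equirectangular)."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = math.radians(lng2 - lng1) * math.cos(mean_lat)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_MILES * math.hypot(x, y)


building = (38.953505, -95.249740)  # Capitol Federal Hall (from KU_BUILDINGS)
lot = (38.953906, -95.252394)       # Lot 70 (from KU_PARKING_LOTS)

exact = haversine_miles(*building, *lot)
approx = flat_map_miles(*building, *lot)
walk_minutes = round(exact * 20)    # 3 mph -> 20 minutes per mile

print(f"Haversine: {exact:.4f} mi, flat-map: {approx:.4f} mi, walk: ~{walk_minutes} min")
```

At campus scale the two methods agree very closely; Haversine is used anyway because it stays correct as distances grow and costs nothing extra.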
Approximate building matching¶
Real users rarely type the full canonical building name. They write “Ambler gym”, “student gym”, “the phog”, “basketball stadium”, or “school of business”. Our _match_building helper handles this by combining three strategies:
Alias table. Each canonical building name has a list of common nicknames and school/department associations (e.g., Allen Fieldhouse gets phog, phog stadium, basketball arena, ku basketball; Ambler SRFC gets ambler gym, student gym, fitness center, gym).
Substring matching in both directions, against the canonical name and every alias.
Token-overlap scoring (F1 over tokens, with a small stopword list) for phrases that don’t substring-match but share enough words with the building or an alias.
The matcher tries exact → substring → token-overlap in that order and requires a token overlap of at least 0.5 before it commits to an approximate match. That threshold prevents random words from latching onto a building. The alias table lives right next to the coordinates, so adding a new nickname is a one-line edit. The Google Maps URLs are constructed by embedding the latitude and longitude into a standard maps?q=LAT,LNG template, which produces a clickable pin link that works on any device.
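To make the 0.5 threshold concrete, here is a minimal stand-alone sketch of F1-style token overlap. The names tokens and f1_overlap, the shortened stopword list, and the second candidate phrase ("science library annex wing") are all illustrative assumptions, not part of the notebook's data.

```python
import re

# Illustrative stopword list; the notebook's own list is slightly longer.
STOPWORDS = {"the", "a", "an", "of", "at", "ku", "hall", "building"}


def tokens(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in STOPWORDS}


def f1_overlap(query, candidate):
    """F1 of token precision (vs. query) and recall (vs. candidate)."""
    q, c = tokens(query), tokens(candidate)
    shared = len(q & c)
    if shared == 0:
        return 0.0
    precision = shared / len(q)
    recall = shared / len(c)
    return 2 * precision * recall / (precision + recall)


# Both query tokens appear among the three candidate tokens:
# precision 1.0, recall 2/3, so F1 is 0.8 and the match is accepted.
strong = f1_overlap("ambler fitness", "ambler fitness center")

# Only one of four tokens is shared on each side:
# precision 0.25, recall 0.25, so F1 is 0.25 and the match is rejected.
weak = f1_overlap("parking near science labs", "science library annex wing")

print(strong, weak)
```

Because F1 penalizes both unmatched query words and unmatched candidate words, a single shared word inside a long phrase is not enough to clear the cutoff.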
def _haversine_miles(lat1, lng1, lat2, lng2):
    '''Great-circle distance between two lat/lng points, in miles.'''
    R = 3958.8  # Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def _google_maps_pin(lat, lng):
    '''Return a clickable Google Maps pin URL for a coordinate.'''
    return f'https://www.google.com/maps?q={lat},{lng}'
# ---- Approximate building matcher ----
# Small stopword list -- filler words that should not drive a token match.
_STOPWORDS = {
'the', 'a', 'an', 'of', 'at', 'on', 'in', 'and', 'or', 'to',
'ku', 'kansas', 'university', 'hall', 'building', 'bldg',
}
def _tokenize(s):
    return [t for t in _re.split(r'[^a-z0-9]+', s.lower()) if t and t not in _STOPWORDS]

def _token_score(query, candidate):
    '''F1-style token overlap score between two strings (0 to 1).'''
    q_tokens = set(_tokenize(query))
    c_tokens = set(_tokenize(candidate))
    if not q_tokens or not c_tokens:
        return 0.0
    overlap = len(q_tokens & c_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(q_tokens)
    recall = overlap / len(c_tokens)
    return (2 * precision * recall) / (precision + recall)

def _match_building(query):
    '''Approximate building match. Accepts informal names like "Ambler gym",
    "student gym", "the phog", "business school" and resolves them to the
    canonical building name. Order of preference:
      1. Exact match on canonical name or alias
      2. Substring match (query in name, or query in alias, or alias in query)
      3. Token-overlap scoring across canonical names and aliases (>= 0.5)
    '''
    q = query.lower().strip()
    if not q:
        return None
    # 1a. Exact canonical name
    for name in KU_BUILDINGS:
        if name.lower() == q:
            return name
    # 1b. Exact alias
    for name, aliases in BUILDING_ALIASES.items():
        for alias in aliases:
            if alias.lower() == q:
                return name
    # 2a. Substring against canonical name (either direction)
    for name in KU_BUILDINGS:
        nlow = name.lower()
        if q in nlow or nlow in q:
            return name
    # 2b. Reverse substring on the non-parenthetical short form
    for name in KU_BUILDINGS:
        short_key = name.split('(')[0].strip().lower()
        if short_key and (short_key in q or q in short_key):
            return name
    # 2c. Substring either direction against any alias
    for name, aliases in BUILDING_ALIASES.items():
        for alias in aliases:
            al = alias.lower()
            if al in q or q in al:
                return name
    # 3. Token-overlap scoring across canonical name + all aliases
    best_name, best_score = None, 0.0
    for name in KU_BUILDINGS:
        for cand in [name] + BUILDING_ALIASES.get(name, []):
            score = _token_score(q, cand)
            if score > best_score:
                best_score = score
                best_name = name
    # Require meaningful overlap so random words do not latch onto a building.
    return best_name if best_score >= 0.5 else None
# ---- Tools the agent can call ----
def list_ku_buildings(_=''):
    '''Return the list of KU buildings available for parking lookup.'''
    return chr(10).join(f'- {name}' for name in KU_BUILDINGS.keys())

def find_parking_near_building(building_name, top_k=5, max_distance_miles=2.0):
    '''Find nearest parking lots within max_distance_miles of the building, top-K results.'''
    matched = _match_building(building_name)
    if matched is None:
        return (
            f'No KU building matches "{building_name}". '
            f'Use list_ku_buildings to see available options.'
        )
    target_lat, target_lng = KU_BUILDINGS[matched]
    # Compute distance to every lot and sort
    ranked = []
    for lot in KU_PARKING_LOTS:
        dist = _haversine_miles(target_lat, target_lng, lot['lat'], lot['lng'])
        ranked.append({
            'lot': lot['lot'],
            'color': lot['color'],
            'distance_miles': round(dist, 3),
            'walk_minutes': round(dist * 20),  # ~3 mph walking speed
            'google_maps': _google_maps_pin(lot['lat'], lot['lng']),
        })
    ranked.sort(key=lambda r: r['distance_miles'])
    # Filter to lots within max_distance_miles of the destination
    within_range = [r for r in ranked if r['distance_miles'] <= max_distance_miles]
    if not within_range:
        return (
            f'No parking lots within {max_distance_miles} miles of {matched}. '
            f'Try a different building or increase the radius.'
        )
    header = (
        f'Parking lots within {max_distance_miles} miles of {matched} '
        f'(top {top_k}, ranked by distance):' + chr(10)
    )
    lines = [header]
    for i, r in enumerate(within_range[:top_k], start=1):
        lines.append(
            f" {i}. {r['lot']} ({r['color']}) - "
            f"{r['distance_miles']} mi ({r['walk_minutes']} min walk) "
            f"-> {r['google_maps']}"
        )
    return chr(10).join(lines)

def get_parking_colors_legend(_=''):
    '''Explain what each parking color means at KU.'''
    return (
        'KU parking color legend:' + chr(10)
        + '- Blue: student commuter permit' + chr(10)
        + '- Yellow: faculty/staff permit' + chr(10)
        + '- Gold: premium permit (close to central campus)' + chr(10)
        + '- Red: restricted/reserved' + chr(10)
        + '- Visitor: open to visitors (often metered or short-term)' + chr(10)
        + '- Park & Ride: free remote lot with shuttle service'
    )
# Register tools for the agent
TOOLS = {
'list_ku_buildings': list_ku_buildings,
'find_parking_near_building': find_parking_near_building,
'get_parking_colors_legend': get_parking_colors_legend,
}
# Quick sanity check that the matcher handles informal names:
for _q in ['ambler gym', 'student gym', 'the phog', 'basketball stadium',
           'school of business', 'the union', 'fitness center']:
    print(f' {_q!r:35s} -> {_match_building(_q)}')
print()
print('Tools registered:', list(TOOLS.keys()))
Test the Tools (no LLM yet)¶
Before wiring up the agent, let’s call the tools directly to make sure they work. This is a critical debugging practice: if the raw tool output is wrong, no amount of prompt engineering on the agent side will fix it. By testing the tools in isolation, we can verify that the fuzzy matcher resolves “business school” correctly, that the distance calculations are sensible, and that the Google Maps URLs point to the right locations. Only after we trust the tools should we hand them to an LLM.
# Call one tool directly
print(find_parking_near_building('business school', top_k=5))
print()
print(find_parking_near_building('Kansas Union', top_k=3))
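Eyeballing printed output is a good start, but the same check can be automated. The sketch below parses text in the tool's output format and verifies that the listed distances never decrease. The helper names are hypothetical, and SAMPLE_REPORT is a made-up string that only mimics the format; its lot names, distances, and coordinates are placeholders, not real results.

```python
import re

# Made-up sample that only mimics the tool's output format; the lot names,
# distances, and coordinates are placeholders, not real results.
SAMPLE_REPORT = "\n".join([
    "Parking lots within 2.0 miles of Example Hall (top 3, ranked by distance):",
    " 1. Lot A (Gold) - 0.0 mi (0 min walk) -> https://www.google.com/maps?q=0.0,0.0",
    " 2. Lot B (Blue) - 0.145 mi (3 min walk) -> https://www.google.com/maps?q=0.0,0.1",
    " 3. Lot C (Yellow) - 0.31 mi (6 min walk) -> https://www.google.com/maps?q=0.1,0.1",
])


def distances_from_report(report):
    """Pull the 'X mi' figures out of a report, in the order they appear."""
    return [float(m) for m in re.findall(r"- (\d+(?:\.\d+)?) mi \(", report)]


def is_sorted_by_distance(report):
    """True when every listed lot is at least as far as the one before it."""
    d = distances_from_report(report)
    return all(a <= b for a, b in zip(d, d[1:]))


print(distances_from_report(SAMPLE_REPORT))
print(is_sorted_by_distance(SAMPLE_REPORT))
```

In the notebook itself, the same helper could be pointed at live output, for example assert is_sorted_by_distance(find_parking_near_building('business school')).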
The Agent (LLM that calls the tools)¶
We use a prompt-based ReAct loop -- no bind_tools, no special API features, works with any LLM. The system prompt describes the three available tools and tells the LLM to output a JSON action like {"tool": "find_parking_near_building", "input": "business school"} when it wants to call one. Our loop parses the JSON, executes the tool, feeds the result back into the conversation, and repeats until the LLM produces a natural-language final answer instead of a tool call.
This is a deliberately simple architecture. Framework-based agents (LangChain, CrewAI) add convenience methods and error handling, but the core pattern is exactly what you see here: prompt the LLM, check if it wants to call a tool, execute the tool if so, feed the result back, repeat. Understanding this loop demystifies what frameworks do under the hood and gives you the confidence to debug agent behavior when something goes wrong.
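The loop described above can be dry-run without any LLM at all. The sketch below is a simplified stand-alone version of that pattern, not the notebook's agent: scripted_llm is a scripted stand-in that first emits a JSON action and then a final answer, and echo_lookup is a stub tool. Both are assumptions made so the control flow can be traced offline.

```python
import json
import re


def parse_tool_call(reply):
    """Return {'tool': ..., 'input': ...} if the reply holds a JSON action, else None."""
    match = re.search(r"\{[^{}]+\}", reply)
    if not match:
        return None
    try:
        parsed = json.loads(match.group())
    except json.JSONDecodeError:
        return None
    return parsed if parsed.keys() >= {"tool", "input"} else None


def run_loop(question, llm_fn, tools, max_steps=5):
    """Prompt, check for a tool call, run it, feed the result back, repeat."""
    transcript = [f"USER: {question}"]
    for _ in range(max_steps):
        reply = llm_fn("\n".join(transcript))
        call = parse_tool_call(reply)
        if call is None:
            return reply  # plain text means this is the final answer
        tool = tools.get(call["tool"])
        result = tool(call["input"]) if tool else f"Unknown tool: {call['tool']}"
        transcript += [f"ASSISTANT: {reply}", f"USER: Tool result: {result}"]
    return "Agent exceeded maximum steps."


# Stand-ins (assumptions): a scripted "LLM" and a stub tool, so no network is needed.
def scripted_llm(prompt):
    if "Tool result:" not in prompt:
        return '{"tool": "echo_lookup", "input": "business school"}'
    return "Final answer based on: " + prompt.splitlines()[-1]


stub_tools = {"echo_lookup": lambda text: f"looked up {text!r}"}

answer = run_loop("Where can I park near the business school?", scripted_llm, stub_tools)
print(answer)
```

Swapping scripted_llm for a real model call and stub_tools for real functions changes nothing about the control flow, which is why a stub like this is handy for debugging the loop separately from the model.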
AGENT_SYSTEM = (
'You are a helpful KU parking assistant. You have these tools:' + chr(10)
+ '- list_ku_buildings(): returns the list of KU buildings available.' + chr(10)
+ '- find_parking_near_building(name): returns the top 5 nearest parking lots to a building, '
+ 'with color, distance, walk time, and Google Maps pin.' + chr(10)
+ '- get_parking_colors_legend(): explains the KU parking color codes.' + chr(10) + chr(10)
+ 'To use a tool, reply with ONLY this JSON (nothing else):' + chr(10)
+ '{"tool": "tool_name", "input": "the input string"}' + chr(10) + chr(10)
+ 'After receiving a tool result, write a helpful final answer for the user that includes '
+ 'the ranked parking list, color meanings, and clickable Google Maps links.' + chr(10) + chr(10)
+ 'IMPORTANT -- find_parking_near_building does APPROXIMATE matching on the building name, '
+ 'so pass the user\'s phrase through as-is instead of trying to guess the canonical name. '
+ 'Examples that all resolve correctly:' + chr(10)
+ ' - "business school" or "school of business" -> Capitol Federal Hall (Business School)' + chr(10)
+ ' - "ambler", "ambler fitness center", "ambler gym", "student gym", "fitness center", '
+ '"the rec", "ku gym" -> Ambler Student Recreation Fitness Center' + chr(10)
+ ' - "the union", "student union", "bookstore" -> Kansas Union' + chr(10)
+ ' - "the phog", "basketball stadium", "phog stadium", "ku basketball" -> Allen Fieldhouse' + chr(10)
+ ' - "football stadium", "ku football" -> Memorial Stadium' + chr(10)
+ ' - "natural history museum" -> Dyche Hall' + chr(10)
+ ' - "art museum" -> Spencer Museum of Art' + chr(10)
+ 'Only call list_ku_buildings if the tool explicitly reports that no building matches.'
)
def run_parking_agent(question, verbose=True):
    '''Simple ReAct-style loop: LLM decides which tool to call, we execute it, feed result back.'''
    messages = [
        {'role': 'system', 'content': AGENT_SYSTEM},
        {'role': 'user', 'content': question},
    ]

    # Flatten into a single prompt with conversation history
    def flatten(msgs):
        parts = []
        for m in msgs:
            parts.append(f"{m['role'].upper()}: {m['content']}")
        return chr(10).join(parts)

    for step in range(5):  # max 5 tool rounds
        prompt = flatten(messages[1:])  # send everything except system (passed separately)
        response = llm.generate(prompt, system_prompt=AGENT_SYSTEM)
        reply = response.text.strip()
        # Try to parse a tool call
        tool_call = None
        m = _re.search(r'\{[^{}]+\}', reply)
        if m:
            try:
                parsed = _json.loads(m.group())
                if 'tool' in parsed and 'input' in parsed:
                    tool_call = parsed
            except _json.JSONDecodeError:
                pass
        if tool_call is None:
            # No tool call -- this is the final answer
            if verbose:
                print(f' [Step {step+1}] Final answer' + chr(10))
            return reply
        # Execute the requested tool
        tool_name = tool_call['tool']
        tool_input = tool_call['input']
        tool_fn = TOOLS.get(tool_name)
        if tool_fn:
            if verbose:
                print(f' [Step {step+1}] Calling: {tool_name}({tool_input})')
            result = tool_fn(tool_input)
            if verbose:
                preview = str(result).replace(chr(10), ' | ')[:150]
                print(f' Result: {preview}...')
        else:
            result = f'Unknown tool: {tool_name}'
        messages.append({'role': 'assistant', 'content': reply})
        messages.append({'role': 'user', 'content': f'Tool result: {result}'})
    return 'Agent exceeded maximum steps.'

print('Parking agent ready.')
Ask the Agent Some Questions¶
Now we let the agent run end-to-end. For each question, the agent should: (1) figure out which tool to call based on the user’s natural language, (2) call find_parking_near_building with the user’s phrase (the fuzzy matcher handles informal names), and (3) format a helpful response with the ranked list, permit colors, and clickable Google Maps links. Try uncommenting the other questions to see how the agent handles visitor-specific queries, requests for the color legend, and different building names.
from IPython.display import display, Markdown
questions = [
'Where can I park near the business school?',
# Approximate/informal names all resolve to the right canonical building:
# 'Where can I park near the Ambler fitness center?',
# 'I want to go to the student gym, where should I park?',
# 'parking near Ambler gym',
# 'I need parking near the student union. What are my options?',
# 'parking close to the phog',
# 'basketball stadium parking',
# 'football stadium parking',
# 'What colors of parking are available at KU and what do they mean?',
# 'I am visiting KU and want to park near Allen Fieldhouse. Where should I go?',
]
for q in questions:
    display(Markdown('---'))
    display(Markdown(f'**Q:** {q}'))
    answer = run_parking_agent(q)
    # Render as Markdown so Google Maps URLs become clickable links.
    # Colab auto-linkifies any URL inside a Markdown-rendered cell.
    display(Markdown(answer))
How to Build the Same Thing in Dify (multi-node workflow with real tools)¶
A proper Dify workflow uses alternating tool and LLM nodes, matching the architecture you see in typical agentic workflow diagrams. Each tool node fetches data, each LLM node reasons about what to do next with that data.
The workflow architecture¶
Start (user types a KU building name)
|
v
[TOOL #1] Knowledge Retrieval - KU_Buildings KB
| (acts as the "access Google Maps" step: returns the building's lat/lng)
v
[LLM #1] Parse the building coordinates
| (extracts a clean {lat, lng} from the retrieved text)
v
[TOOL #2] Knowledge Retrieval - KU_Parking_Lots KB
| (acts as the "access KU map" step: returns all parking lot data)
v
[LLM #2] Compute distances, filter to 2 miles, rank top 5, format output
| (does the math approximately using the coordinates)
v
End (returns formatted parking list with Google Maps pin URLs)
Four processing nodes (2 Knowledge Retrieval tool nodes + 2 LLM nodes) sit between the Start and End nodes. No Code nodes, no custom APIs, no Python.
Why Knowledge Retrieval instead of HTTP Request?¶
In Dify’s workflow builder, a “tool” is any node that fetches external data. Knowledge Retrieval nodes are the easiest kind of tool because:
You upload the data once (buildings + parking lots as markdown files)
Dify chunks, embeds, and indexes it automatically
Retrieval is just a vector search — fast and free
No API keys, no rate limits, no billing
An alternative would be an HTTP Request node calling Nominatim (OpenStreetMap’s free geocoding API), but Nominatim is rate-limited and doesn’t know anything about specific KU building nicknames like “business school” or “the phog”. Our pre-uploaded Knowledge Base is more reliable for a classroom demo.
Approximate building matching in Dify¶
The Colab agent above uses an explicit alias table plus token-overlap scoring. Dify’s Knowledge Retrieval node does not have that — instead, approximate matching comes from vector embeddings: the user’s phrase is embedded and compared against embedded chunks from ku_buildings.md. For informal queries like “Ambler gym” or “the phog” to surface the correct chunk, the aliases must appear in the markdown itself.
That is why Agentic/dify_data/ku_buildings.md embeds each building’s common aliases directly in its Description: line (nicknames, school/department affiliations, department functions). For example, the Allen Fieldhouse entry mentions “basketball stadium”, “the Phog”, and “Phog Allen” explicitly so a vector search against “phog stadium” still retrieves that chunk. If you add a new nickname that users type, edit the markdown, re-upload (or sync) the Knowledge Base, and re-run the workflow.
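A toy illustration of why the aliases need to be written into the chunk: the sketch below scores a query against two versions of a building chunk using plain word-count cosine similarity. This is only a crude lexical stand-in. Real embedding models also capture meaning beyond shared words, and Dify's retrieval is not implemented this way. The chunk texts are hypothetical, since the actual wording of ku_buildings.md is not reproduced here.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word counts as a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical chunk texts; the real ku_buildings.md wording may differ.
bare_chunk = "Allen Fieldhouse. Coordinates: 38.954306, -95.252394."
rich_chunk = bare_chunk + " Description: basketball stadium, the Phog, Phog Allen."

query = bag_of_words("phog stadium")
bare_score = cosine(query, bag_of_words(bare_chunk))  # no shared words at all
rich_score = cosine(query, bag_of_words(rich_chunk))  # aliases give it something to match

print(bare_score, rich_score)
```

The bare chunk shares no words with the query, while the alias-rich chunk does. A real embedding model is more forgiving than this, but the direction of the effect is the same: text that is absent from the chunk cannot help retrieval.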
If you ever find vector retrieval missing a phrase that the Colab agent handles correctly, either:
Add that phrase to the building’s Description line in ku_buildings.md, or
Switch the Dify workflow to call the Supabase MCP endpoint from the sibling ku_parking_mcp.ipynb notebook; that endpoint uses the same explicit alias table as the Colab code cell above.
Step-by-step setup¶
Part A: upload the two data files as Knowledge Bases¶
The notebook directory includes two markdown files at Agentic/dify_data/:
ku_buildings.md - one section per building with coordinates and alias-rich descriptions
ku_parking_lots.md - one section per parking lot with color and coordinates
In Dify:
Go to cloud.dify.ai -> Knowledge -> Create Knowledge
Upload ku_buildings.md. Name the KB KU_Buildings.
On the chunking screen, pick Custom chunking:
Chunk delimiter: \n# (hash + space; this splits on every heading)
Max chunk length: 500 tokens
This ensures each building becomes its own retrievable chunk, and the alias-rich description stays within that chunk.
Indexing: High Quality (vector embeddings).
Click Save & Process. Wait ~30 seconds.
Repeat: create a second Knowledge Base called KU_Parking_Lots and upload ku_parking_lots.md with the same chunking settings.
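To see what the heading delimiter buys you, the sketch below splits a small markdown string on "\n# " so that each building section becomes one chunk. It only mimics the delimiter step. Dify's own chunker also applies the maximum length and other rules, and SAMPLE_MD is a hypothetical layout, not the actual contents of ku_buildings.md.

```python
# Hypothetical file contents; the real ku_buildings.md layout may differ.
SAMPLE_MD = (
    "# Allen Fieldhouse\n"
    "Coordinates: 38.954306, -95.252394\n"
    "Description: basketball arena, the Phog.\n"
    "# Kansas Union\n"
    "Coordinates: 38.959200, -95.243500\n"
    "Description: student union, bookstore.\n"
)


def split_on_headings(markdown_text, delimiter="\n# "):
    """Split on the heading delimiter so each section becomes one chunk."""
    # Prepend a newline so the first heading is matched like the others.
    pieces = ("\n" + markdown_text).split(delimiter)
    return [p.strip() for p in pieces if p.strip()]


chunks = split_on_headings(SAMPLE_MD)
for chunk in chunks:
    print(repr(chunk.splitlines()[0]), "->", len(chunk.splitlines()), "lines")
```

Each chunk keeps a building's name, coordinates, and aliases together, so a retrieval hit on an alias also brings back the coordinates the next node needs.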
Part B: build the workflow¶
Studio -> Create App -> Workflow (not Chatflow).
Name it KU Parking Assistant.
On the canvas, click the Start node:
Add input variable: name = destination, type = Text, label = “Which KU building are you visiting? (informal names like ‘the phog’, ‘Ambler gym’, ‘business school’ all work)”
Add TOOL #1: Knowledge Retrieval (Buildings)
Click + after Start -> Knowledge Retrieval
Query variable: Start.destination
Knowledge: select KU_Buildings
Rename this node to Retrieve Building
Add LLM #1: Parse Building Coordinates
Click + after Retrieve Building -> LLM
Model: your preferred (gpt-4o-mini, gemini-2.5-flash, etc.)
System prompt:
You are a data extractor. You will receive:
1. A user query that may use a nickname or informal name (e.g., "the phog", "Ambler gym", "student union", "business school").
2. One or more retrieved KU building chunks, each containing a canonical building name, its coordinates, and a description that lists common aliases and school/department associations.
Pick the single chunk whose canonical name or description best matches the user's query, and extract its canonical name and coordinates.
If the query uses a nickname, use the alias list in the description to choose the correct building (e.g., "the phog" -> Allen Fieldhouse; "Ambler gym" or "student gym" -> Ambler Student Recreation Fitness Center; "school of business" -> Capitol Federal Hall).
Reply with ONLY this JSON format (nothing else):
{"building": "full canonical name", "lat": 38.9573, "lng": -95.2519}
User prompt:
Retrieved building information: {{#Retrieve_Building.result#}}
User query: {{#Start.destination#}}
Rename this node to Parse Coords
Add TOOL #2: Knowledge Retrieval (Parking Lots)
Click + after Parse Coords -> Knowledge Retrieval
Query variable: Start.destination (or a fixed query like “parking lots”)
Knowledge: select KU_Parking_Lots
In the node settings, set Top K to 15 (to retrieve all lots, since our KB is small)
Rename this node to Retrieve Parking
Add LLM #2: Distance + Filter + Format
Click + after Retrieve Parking -> LLM
System prompt:
You are a parking assistant. You will receive:
1. A destination building with its latitude and longitude (as JSON).
2. A list of KU parking lots with their coordinates and colors.
Your job:
- Compute the approximate great-circle distance in miles from the destination to each parking lot. For Lawrence KS, 0.001 degrees of latitude is about 0.069 miles and 0.001 of longitude is about 0.054 miles.
- Filter to lots within 2 miles of the destination.
- Rank the remaining lots from nearest to farthest.
- Return the top 5 as a numbered list with these fields for each:
  * Rank, lot name, permit color
  * Distance in miles (2 decimals) and approximate walk time (3 mph walking speed)
  * Google Maps pin URL in this exact format: https://www.google.com/maps?q=LAT,LNG
- End with a short note explaining what the permit color of the closest lot means.
- If no lots are within 2 miles, say so and suggest the Park & Ride shuttle.
User prompt:
Destination: {{#Parse_Coords.text#}}
Available parking lots: {{#Retrieve_Parking.result#}}
Give me the top 5 parking lots within 2 miles.
Rename this node to Rank and Format
Configure the End node
Output variable:
Rank_and_Format.text
Publish and click Run to test the approximate matcher end-to-end:
business school / school of business -> Capitol Federal Hall
the phog / basketball stadium / phog stadium -> Allen Fieldhouse
Ambler gym / student gym / fitness center -> Ambler SRFC
the union / student union -> Kansas Union
football stadium / ku football -> Memorial Stadium
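The degree-to-mile constants quoted in the Rank and Format system prompt can be checked with a few lines of arithmetic. This sketch assumes the same 3958.8-mile Earth radius used by the notebook's Haversine helper and an approximate campus latitude of 38.957 degrees, read off the building coordinates.

```python
import math

EARTH_RADIUS_MILES = 3958.8     # same radius the notebook's Haversine helper uses
LAWRENCE_LATITUDE_DEG = 38.957  # approximate campus latitude, from the building data

# One degree of latitude spans the same arc length everywhere on a sphere.
miles_per_degree_lat = EARTH_RADIUS_MILES * math.pi / 180
# One degree of longitude shrinks with the cosine of the latitude.
miles_per_degree_lng = miles_per_degree_lat * math.cos(math.radians(LAWRENCE_LATITUDE_DEG))

lat_step = 0.001 * miles_per_degree_lat
lng_step = 0.001 * miles_per_degree_lng

print(f"0.001 deg latitude  ~ {lat_step:.3f} mi")
print(f"0.001 deg longitude ~ {lng_step:.3f} mi")
```

The results round to 0.069 and 0.054 miles, matching the figures given to the LLM. Hard-coding them in the prompt spares the model the trigonometry, though it still has to do the multiplication and square root itself.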
What each node does and why¶
| Node | Type | Job | Deterministic or LLM? |
|---|---|---|---|
| Start | Input | Collect user destination (any informal phrase) | - |
| Retrieve Building | Tool (Knowledge Retrieval) | Vector search against buildings KB — finds the right chunk from an alias match | Deterministic (vector search) |
| Parse Coords | LLM | Picks the right canonical building from the retrieved chunks and extracts {lat, lng} | LLM (disambiguation + extraction) |
| Retrieve Parking | Tool (Knowledge Retrieval) | Look up parking lots from the Parking KB | Deterministic (vector search) |
| Rank and Format | LLM | Compute distances, filter 2 miles, rank, format | LLM (approximate math + formatting) |
| End | Output | Return the formatted result | - |
Notice the alternation: Tool -> LLM -> Tool -> LLM. Each tool fetches fresh data, each LLM reasons about what to do with it. This alternation is a common pattern in Dify workflows and maps directly onto the workflow architecture diagram above.
Colab (real tools) vs. Dify (workflow tools) - the teaching contrast¶
| | Colab notebook | Dify workflow |
|---|---|---|
| How are tools defined? | Python functions | Dify nodes (Knowledge Retrieval, LLM, HTTP Request) |
| Who decides which tool to call? | The LLM, at each step of the agent loop | Fixed by the workflow diagram (same tools in the same order) |
| Approximate building matching | Explicit alias table + token-overlap scoring | Vector embeddings over alias-rich descriptions + LLM disambiguation |
| Distance math | Exact (Haversine formula) | Approximate (LLM reasoning over coordinates) |
| Data source | Python dicts | Dify Knowledge Bases with vector retrieval |
| Flexibility | High — agent can call tools in any order | Low — flow is fixed, but predictable |
| Best for | Complex questions with branching reasoning | Simple, predictable pipelines with the same steps every time |
When to pick each approach¶
Colab agent (this notebook): when the agent needs to decide dynamically whether to call list_ku_buildings, find_parking_near_building, or get_parking_colors_legend based on the question. Good for conversational or exploratory use cases.
Dify workflow (above): when every query follows the same steps (lookup destination -> lookup parking -> rank). Good for a production app where you want consistent, debuggable behavior without per-query reasoning overhead.
Limitations of the Dify version¶
The LLM’s distance math is approximate. For 15 lots in a small area it usually gets the top 3 right, but lots at very similar distances may be mis-ranked by 1-2 positions.
Approximate building matching depends on the aliases being present in the markdown. A nickname that isn’t written into ku_buildings.md will only match if it’s semantically close enough for the embedding model to score highly, and that is less reliable than the explicit alias table in the Colab agent.
If your Knowledge Base grows beyond a few hundred entries, the vector retrieval step may miss relevant items. At that scale you’d switch to HTTP Request calls to a real database, or to the MCP endpoint in ku_parking_mcp.ipynb.
The 2-mile filter is enforced in the LLM prompt, not in code. A rogue LLM response could include a lot outside 2 miles. In production you’d add a Template Transform or Parameter Extractor node to enforce the filter programmatically.
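For comparison, this is roughly what enforcing the radius in code looks like: recompute each distance deterministically and drop anything outside the limit, rather than trusting figures written by the LLM. The sketch is plain Python and does not use any Dify API. The function names are illustrative, where such a check would run is left open, and "Far Lot" is a hypothetical entry added to show the filter firing.

```python
import math


def haversine_miles(lat1, lng1, lat2, lng2, radius_miles=3958.8):
    """Great-circle distance in miles on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = math.radians(lat2 - lat1), math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_miles * math.asin(math.sqrt(a))


def enforce_radius(destination, lots, max_miles=2.0, top_k=5):
    """Recompute distances in code and keep only lots inside the radius."""
    scored = [
        {**lot, "distance_miles": round(haversine_miles(*destination, lot["lat"], lot["lng"]), 3)}
        for lot in lots
    ]
    inside = [lot for lot in scored if lot["distance_miles"] <= max_miles]
    return sorted(inside, key=lambda lot: lot["distance_miles"])[:top_k]


destination = (38.953505, -95.249740)  # Capitol Federal Hall (from the notebook data)
lots = [
    {"lot": "Lot 70", "lat": 38.953906, "lng": -95.252394},   # from the notebook data
    {"lot": "Far Lot", "lat": 39.053505, "lng": -95.249740},  # hypothetical, ~7 mi north
]

kept = enforce_radius(destination, lots)
for lot in kept:
    print(lot["lot"], lot["distance_miles"])
```

Only Lot 70 survives the filter. The guarantee comes from the comparison in code, so it holds no matter what the model writes.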
Exercises¶
Add a new building (e.g., 'Budig Hall': (lat, lng)) and a new parking lot, then rerun -- notice you don’t need to change any tool code.
Add a fourth tool: find_parking_by_color(color, near_building) that filters to only lots of a specific permit color.
Replace the hardcoded data with a real parse of the KU parking map PDF -- this is where real data engineering begins.
In Dify, try removing the coordinates from the system prompt and just listing lot names. How does answer quality change? This shows why giving the LLM structured numeric data matters.
Compare the Colab and Dify outputs for the same query. How often do they agree on the top-3 ranking?
Key takeaways¶
Tool-using agents combine LLM strengths (fuzzy language understanding) with Python strengths (deterministic math and lookups), letting each side do what it does best.
The Haversine formula gives exact great-circle distances in microseconds, which is far more reliable than asking the LLM to eyeball coordinate differences.
Alias tables plus token-overlap scoring resolve informal phrases like “the phog” or “student gym” to canonical building names before any tool runs.
A prompt-based ReAct loop -- emit JSON action, run the tool, feed the result back -- is all you need to build a working agent without a framework.
Dify workflows are the fixed-flow counterpart: alternating Knowledge Retrieval and LLM nodes give predictable behavior, while the Python agent offers dynamic tool selection.
Run the code¶
To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)
Estimated run time: ~2 minutes (1-3 LLM calls per question)
https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/KU_Parking_Assistant.ipynb