This is the contrast notebook to MultiAgent_DB_multi_human_input.ipynb. It follows the same workflow — upload an Excel file, infer a schema, build a SQLite database, then answer business questions with SQL — but with one critical difference:
One agent persona does all the work. There is no “Data Architect” agent, “SQL Developer” agent, “SQL Query Writer” agent, etc. Instead, a single generalist “Data Analyst” uses the same system prompt for every task: designing the schema, writing DDL, and writing SQL queries.
What we keep from the multi-agent notebook:
Excel file input → sheets inferred as SQL tables
SQLite database is actually created and populated
Phase 1 (one-time setup) → Phase 2 (repeatable business queries)
SQL generation and execution for each query
What we remove:
No separate specialized agents — just one agent wearing multiple “hats”
No RAG approach (Agent 4 in the multi-agent version)
No evaluator / similarity scorer (Agent 5 in the multi-agent version)
One answer per query (from SQL), not two
Why build this contrast? To see whether the multi-agent specialization actually helps, or whether one well-prompted agent can do the same job.
Every LLM call here goes through the llm_cascade wrapper introduced in Chapter 5, so the agent transparently falls back across providers as quotas or keys change.
# 1. INSTALL DEPENDENCIES
!pip install -q gradio pandas openpyxl git+https://github.com/KarAnalytics/llm_cascade.git
The cell above installs the necessary libraries. The llm_cascade package is a lightweight wrapper that automatically detects which LLM API keys you have configured and falls back through providers in priority order -- Gemini first, then Groq, HuggingFace, Cohere, and others. This means the notebook works regardless of which free-tier provider you have access to, and it gracefully handles rate limits by trying the next provider in line.
Single-Agent System: One Agent Handles Everything¶
In the multi-agent notebook, we split the work across five specialized personas: a Data Architect to design the schema, a SQL Developer to write DDL, a SQL Query Writer to answer questions, a RAG Analyst for a second opinion, and an Evaluator to compare the two. It is an elegant architecture, but it raises a natural question: is all that specialization actually necessary? Could one well-prompted generalist agent do the same job, faster and cheaper?
This notebook answers that question by building the exact same workflow -- upload an Excel file, infer a schema, build a SQLite database, then answer business questions with SQL -- but with a single “Data Analyst” agent that wears all the hats. There is one system prompt shared across every task. The agent does not know it is playing multiple roles; it simply follows whatever user-prompt instructions it receives.
The key design insight is that the system prompt defines the agent’s identity, while the user prompt defines the task. By keeping one system prompt and varying the user prompt, we get the simplicity of a single agent without sacrificing the ability to perform diverse tasks. The tradeoff is transparency: with five agents, you can inspect each one’s output independently; with one agent, you see only the final result.
What we keep from the multi-agent notebook:
Excel file input with sheets inferred as SQL tables
SQLite database is actually created and populated
Phase 1 (one-time setup) followed by Phase 2 (repeatable business queries)
What we remove:
No separate specialized agents -- just one agent wearing multiple “hats”
No RAG approach (Agent 4 in the multi-agent version)
No evaluator / similarity scorer (Agent 5 in the multi-agent version)
One answer per query (from SQL), not two
Imports and Setup¶
Same libraries as the multi-agent notebook, minus chromadb (we’re not building a RAG system). We use pandas to read Excel, sqlite3 for the database, and llm_cascade for automatic LLM provider fallback.
import os
import sqlite3
import pandas as pd
from llm_cascade import get_cascade
llm = get_cascade()
DB_NAME = 'single_agent_database.db'
The Single Agent: Data Analyst¶
Role: A generalist Data Analyst that can design schemas, write DDL, and write SQL queries — all tasks previously split across four different specialized agents in the multi-agent notebook.
Key design decision: All three tasks below use the same ANALYST_SYSTEM_PROMPT. The system prompt describes the agent’s overall role; only the user prompt changes per task. This is what makes it a “single agent” — there’s one persona, not five.
Tasks the single agent performs:
| Task | Function | Replaces |
|---|---|---|
| Design logical schema | analyst_design_schema() | Agent 1 (Data Architect) |
| Write SQL DDL | analyst_write_ddl() | Agent 2 (SQL Developer) |
| Write SQL query for a question | analyst_write_sql_query() | Agent 3 (SQL Query Writer) |
Compare the brevity of this section to the five separate agent cells in the multi-agent notebook.
# One system prompt shared across all analyst tasks
ANALYST_SYSTEM_PROMPT = (
'You are a skilled Data Analyst. You can design logical database schemas, '
'write SQL DDL statements, and write SQL queries to answer business questions. '
'Follow the user instructions precisely and output only what is requested.'
)
def read_excel_to_map(file_path):
'''Read an Excel file into a dict of {sheet_name: DataFrame}. Not an LLM call.'''
xl = pd.ExcelFile(file_path)
data_map = {}
for sheet in xl.sheet_names:
data_map[sheet] = pd.read_excel(file_path, sheet_name=sheet)
return data_map
def analyst_design_schema(data_map):
'''Task 1: ask the agent to propose a logical schema from the sheets.'''
sheet_summary = ''
for sheet, df in data_map.items():
cols = ', '.join([f'{c} ({dtype})' for c, dtype in zip(df.columns, df.dtypes)])
sheet_summary += f'Sheet Name: {sheet}' + chr(10) + f'Columns detected: {cols}' + chr(10) + chr(10)
prompt = (
'Analyze these sheets and provide a logical database schema. '
'Use ONLY bullet points. Identify suggested Primary Keys and Foreign Keys:' + chr(10)
+ sheet_summary
)
response = llm.generate(prompt, system_prompt=ANALYST_SYSTEM_PROMPT)
return response.text
def analyst_write_ddl(logical_schema):
'''Task 2: ask the agent to turn the logical schema into SQL CREATE TABLE statements.'''
prompt = (
'Based on this logical schema, write SQLite CREATE TABLE statements. '
'Return ONLY raw SQL. No explanation, no markdown fences.' + chr(10)
+ logical_schema
)
response = llm.generate(prompt, system_prompt=ANALYST_SYSTEM_PROMPT)
return response.text.replace('```sql', '').replace('```', '')
def analyst_write_sql_query(question, logical_schema):
'''Task 3: ask the agent to write a SQL SELECT for a business question.'''
prompt = (
'Schema:' + chr(10) + logical_schema + chr(10)
+ 'Question: ' + question + chr(10)
+ 'Generate ONLY the SQL SELECT statement. Table names might have underscores. '
+ 'No explanation, no markdown fences.'
)
response = llm.generate(prompt, system_prompt=ANALYST_SYSTEM_PROMPT)
return response.text.replace('```sql', '').replace('```', '').strip()
print('Single agent (Data Analyst) ready. One persona, three tasks.')
Interactive App: Upload Excel and Ask Questions¶
The cell below launches a Gradio app that gives you an interactive UI for the single agent:
Step 1: Upload your Excel file and click Build Database. The single agent reads the sheets, designs a logical schema, writes the DDL, and creates a SQLite database.
Step 2: Type a business question and click Submit Query (or press Enter). The same agent writes a SQL query, executes it, and shows the result. Your question and the answer are added to the chat history.
Repeat Step 2 as many times as you want — no need to re-upload the file.
If you do not have an Excel file on hand, download this sample dataset and upload it: SUPPLIER_PARTS.xlsx (click Download raw file on the GitHub page).
Compare with the multi-agent notebook: that notebook runs 3 LLM calls per query (SQL + RAG + Evaluator). This one runs just 1 LLM call per query. You should notice it feels faster, and the generated SQL is often very similar.
import gradio as gr
# Module-level state so the Gradio UI can persist the schema across queries
_app_state = {'logical_schema': None, 'data_map': None, 'db_ready': False}
def ui_build_database(file):
'''Phase 1: run the single agent on the uploaded file to build the DB.'''
if file is None:
return '', '', 'Please upload an Excel file first.'
try:
data_map = read_excel_to_map(file.name)
logical_schema = analyst_design_schema(data_map)
ddl = analyst_write_ddl(logical_schema)
if os.path.exists(DB_NAME):
os.remove(DB_NAME)
conn = sqlite3.connect(DB_NAME)
try:
conn.executescript(ddl)
for table, df in data_map.items():
clean = table.replace(' ', '_')
df.to_sql(clean, conn, if_exists='replace', index=False)
conn.commit()
finally:
conn.close()
_app_state['logical_schema'] = logical_schema
_app_state['data_map'] = data_map
_app_state['db_ready'] = True
return logical_schema, ddl, 'Database ready. You can now ask questions below.'
except Exception as e:
return '', '', f'Error building database: {e}'
def ui_ask_question(question, history):
'''Phase 2: run the single agent on a single question.'''
if not _app_state['db_ready']:
bot_msg = 'Please upload an Excel file and click Build Database first.'
return history + [[question, bot_msg]], ''
if not question or not question.strip():
return history, question
try:
sql_query = analyst_write_sql_query(question, _app_state['logical_schema'])
conn = sqlite3.connect(DB_NAME)
try:
result_df = pd.read_sql_query(sql_query, conn)
sql_result = result_df.to_string()
except Exception as e:
sql_result = f'SQL Execution Error: {e}'
finally:
conn.close()
response = '**Generated SQL:**' + chr(10) + '```sql' + chr(10) + sql_query + chr(10) + '```' + chr(10) + chr(10) + '**Result:**' + chr(10) + '```' + chr(10) + sql_result + chr(10) + '```'
except Exception as e:
response = f'Error: {e}'
return history + [[question, response]], ''
with gr.Blocks(title='Single Agent DB') as demo:
gr.Markdown('# Single Agent: Excel to SQLite to Natural-Language Queries')
gr.Markdown('### Step 1: Upload Excel file and build the database')
with gr.Row():
file_input = gr.File(label='Excel File', file_types=['.xlsx', '.xls'])
build_btn = gr.Button('Build Database', variant='primary')
with gr.Row():
with gr.Column():
out_schema = gr.Markdown(label='Logical Schema (agent output)')
with gr.Column():
out_ddl = gr.Code(label='SQL DDL (agent output)', language='sql')
out_status = gr.Textbox(label='Status', interactive=False)
build_btn.click(
ui_build_database,
inputs=[file_input],
outputs=[out_schema, out_ddl, out_status],
)
gr.Markdown('### Step 2: Ask business questions (repeat as needed)')
chatbot = gr.Chatbot(label='Q&A History', height=400)
query_input = gr.Textbox(
label='Your question',
placeholder='e.g., How many total sales? Who is the top employee?',
)
submit_btn = gr.Button('Submit Query', variant='primary')
submit_btn.click(
ui_ask_question,
inputs=[query_input, chatbot],
outputs=[chatbot, query_input],
)
query_input.submit(
ui_ask_question,
inputs=[query_input, chatbot],
outputs=[chatbot, query_input],
)
demo.launch(debug=False)
Single-Agent vs. Multi-Agent: The Tradeoffs¶
This notebook solves the same problem as MultiAgent_DB_multi_human_input.ipynb but with one agent persona instead of five specialized ones. Here’s the comparison:
| Aspect | Single-Agent (this notebook) | Multi-Agent |
|---|---|---|
| Number of agent personas | 1 (one system prompt) | 5 (five specialized system prompts) |
| LLM calls per Phase 1 setup | 2 (schema + DDL) | 2 (schema + DDL) |
| LLM calls per business query | 1 (SQL only) | 3 (SQL + RAG + Evaluator) |
| SQLite database created? | Yes | Yes |
| RAG approach included? | No | Yes (Agent 4) |
| Cross-check / evaluator? | No | Yes (Agent 5) |
| Code complexity | Lower (one system prompt to maintain) | Higher (five system prompts) |
| Verifiability | Same as multi-agent (SQL is inspectable) | Same |
| Speed per query | ~3x faster (1 call vs. 3) | Slower |
| Cost per query | ~3x cheaper (1 call vs. 3) | Higher |
| Catches LLM mistakes? | No (only one answer) | Yes (Agent 5 compares SQL vs. RAG) |
When is the specialization worth it?¶
Single-agent wins when:
You trust the SQL output (simple queries, good LLM)
Speed and cost matter more than cross-validation
You just want answers, not multiple perspectives
Multi-agent wins when:
You need a sanity check (the evaluator agent catches when SQL and RAG disagree)
Specialization genuinely produces better outputs (e.g., a dedicated “SQL expert” prompt may write better queries than a generalist)
The workflow is complex enough that debugging individual steps is valuable
You want to audit where each answer came from
The honest lesson¶
For this particular problem, the single-agent approach is surprisingly competitive. The multi-agent architecture adds value mainly through cross-checking (Agent 5), not through specialization of the SQL-writing step. In practice, the same LLM with the same prompt writes similar SQL whether you call it “SQL Developer Agent” or “Data Analyst Agent.”
A common pattern: start with one agent; add more agents only when you discover a specific failure mode that specialization (or cross-checking) would fix. Don’t build five agents upfront just because frameworks encourage it.
Exercises¶
Run the same question through both notebooks and compare the generated SQL. Does the specialization help?
Try a question where the LLM writes buggy SQL. Does the multi-agent evaluator catch it?
Time both approaches. What’s the latency difference per query?
Add a second answer path to this notebook (e.g., a RAG approach) so you get the evaluator benefit without splitting into multiple agent personas.
Key takeaways¶
One generalist agent with a single system prompt can handle schema design, DDL generation, and SQL query writing that the multi-agent version split across five personas.
System prompt defines identity, user prompt defines task -- varying the user prompt alone is often enough to switch between data-architect, DBA, and analyst behaviors.
Fewer LLM calls (one per business query instead of three) makes the single-agent version roughly 3x faster and cheaper per question.
What you lose is cross-checking: with only one answer path, there is no evaluator agent to catch buggy SQL or flag disagreements between approaches.
Start simple -- add specialized agents only when you observe a concrete failure mode that specialization or cross-validation would fix.
Run the code¶
To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)
Estimated run time: ~3 minutes (requires Excel file upload)
https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/SingleAgent_DB.ipynb