In every agent notebook so far, a human has decided the workflow. The multi-agent system runs Agents 1 through 5 in a fixed order. The single-agent system runs schema design, then DDL, then queries. Even the tool-calling parking assistant follows a human-designed ReAct loop. The agent may choose which tool to call at each step, but the overall structure -- “answer one question using tools” -- is predefined.
This notebook breaks that pattern. Here, the agent receives only a one-sentence goal (“Validate this business idea”) and figures out everything else on its own: what questions to research, how to answer each one, how to synthesize the findings into a recommendation, and whether that recommendation is actually any good. No human intervenes between phases, and no research steps are hardcoded: the four-phase scaffold is fixed, but the questions inside it are generated fresh for each goal. The agent plans, executes, synthesizes, and self-critiques -- autonomously.
What makes this “autonomous” vs. the previous agents?
| | Tool-calling agent | Multi-agent system | Autonomous workflow (this) |
|---|---|---|---|
| Who decides the task steps? | Agent picks tools at each step | Humans pre-define the pipeline | Agent plans the whole workflow |
| Fixed or dynamic pipeline? | Dynamic (tool-by-tool) | Fixed (Agent 1 -> 2 -> 3...) | Dynamic (plan changes per input) |
| Self-critique? | No | Only if an Evaluator agent is added | Yes, as a built-in phase |
| Typical use case | Answer questions using external tools | Well-defined workflows (ETL, BI) | Open-ended goals (research, writing, validation) |
Why self-critique matters: Without it, the agent would produce a recommendation and stop, with no indication of whether the recommendation is strong or weak. The Reflect phase acts as a built-in skeptic -- it identifies gaps in the reasoning, flags unsupported assumptions, and assigns a confidence score. This gives the human consumer a rough signal to act on: a confidence of 8/10 means “probably act on this”; a confidence of 4/10 means “do more research before deciding.” The score is the model’s own estimate rather than a measured probability, so treat it as a prompt for scrutiny, not a guarantee. In production systems, you could feed a low-confidence critique back into the planner for another iteration, creating a self-improving loop.
The contrast is direct: in Chapter 16’s multi-agent and single-agent notebooks a human fixes the pipeline, while here the LLM plans it from a single-sentence goal.
# 1. INSTALL DEPENDENCIES
!pip install -q git+https://github.com/KarAnalytics/llm_cascade.git
IGNORE THE ABOVE ERRORS AND PROCEED — they are specific to Google Colab’s preinstalled packages and the code will still execute correctly.
Imports and Setup
Minimal dependencies: just llm_cascade for automatic LLM provider fallback. No tools, no databases, no vector stores — this is a pure LLM reasoning workflow.
from llm_cascade import get_cascade
llm = get_cascade()
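For experimenting without API keys, it can help to swap in an offline stand-in. The sketch below assumes only the interface this notebook itself relies on: an object with a generate(prompt, system_prompt=...) method whose return value has a .text attribute. FakeLLM and FakeResponse are hypothetical names and are not part of llm_cascade.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Mimics the single attribute the notebook reads from a response: .text"""
    text: str


class FakeLLM:
    """Hypothetical offline stand-in; NOT part of llm_cascade."""

    def __init__(self, canned_replies):
        # Replies are handed out in order, one per generate() call.
        self._replies = list(canned_replies)
        self.calls = []  # records (prompt, system_prompt) pairs for inspection

    def generate(self, prompt, system_prompt=None):
        self.calls.append((prompt, system_prompt))
        if not self._replies:
            raise RuntimeError('FakeLLM ran out of canned replies')
        return FakeResponse(text=self._replies.pop(0))


fake = FakeLLM(['1. Who is the customer?\n2. What does it cost?'])
reply = fake.generate('Goal: test', system_prompt='You are a planner.')
print(reply.text)
```

Assigning llm = FakeLLM([...]) before defining the phases would let each phase function run deterministically, which is convenient for checking parsing logic.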
The Four Phases of the Autonomous Workflow
Each phase is a single function that makes one LLM call with a phase-specific system prompt. Together, the four phases form the Plan-Execute-Synthesize-Reflect loop -- a pattern that appears in many real-world autonomous agent frameworks (AutoGPT, BabyAGI, LangGraph research agents).
Notice that each phase has a different system prompt, but this is not the same as the multi-agent architecture. In the multi-agent notebook, each agent has a different role (Data Architect, SQL Developer). Here, each phase has a different cognitive task: the Planner needs to think strategically about what questions matter; the Executor needs to think concretely about answering one question at a time; the Synthesizer needs to weigh tradeoffs and make a judgment call; and the Critic needs to be genuinely skeptical, looking for holes in the reasoning. These are four modes of thinking, not four job titles.
Plan -- Given the goal, generate 4-5 specific questions that need to be answered
Execute -- Loop through the plan and answer each question, passing prior answers as accumulating context
Synthesize -- Combine all answers into a final recommendation with a clear go/no-go/conditional verdict
Reflect -- Critique the recommendation, identify weaknesses, and assign a confidence score from 1 to 10
# Phase 1: Planner -- decides WHAT needs to be researched
PLANNER_SYSTEM = (
'You are a strategic business planner. Given a high-level goal, break it down '
'into the 4-5 most critical questions that must be answered to achieve it. '
'Output ONLY a numbered list of questions. No preamble, no conclusion.'
)
def plan(goal):
    '''Phase 1: generate a research plan as a list of questions.'''
    prompt = f'Goal: {goal}' + chr(10) + chr(10) + 'List the 4-5 most critical questions to answer.'
    response = llm.generate(prompt, system_prompt=PLANNER_SYSTEM)
    # Parse the numbered list into a Python list of strings
    steps = []
    for line in response.text.split(chr(10)):
        line = line.strip()
        if not line:
            continue
        # Strip leading '1.', '1)', '-', '*', etc.
        cleaned = line.lstrip('0123456789.)- *').strip()
        if cleaned and len(cleaned) > 5:
            steps.append(cleaned)
    return steps
print('Phase 1 (Planner) ready.')
The Planner’s system prompt instructs it to think like a strategic consultant: given a vague goal, identify the critical questions that will determine success or failure. The output is a numbered list of 4-5 questions -- no preamble, no analysis, just the questions. This constrained output format makes it easy to parse programmatically and feed into the next phase. Try changing the business idea later and notice how the plan changes completely -- an AI housing app generates questions about data sources and student demographics, while a meal kit service generates questions about supply chains and unit economics.
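One caveat about the parsing step in plan: lstrip removes every leading character that appears in its argument, so a question that itself begins with a digit (say, “5G coverage ...”) would lose that digit along with the list marker. A minimal sketch of a stricter alternative follows, assuming the planner emits one question per line. The name parse_questions is hypothetical and the regular expression covers only the '1.', '1)', '-', and '*' markers.

```python
import re

# A single leading list marker: '1.', '2)', '-', or '*', followed by whitespace.
_MARKER = re.compile(r'^\s*(?:\d+[.)]|[-*])\s+')


def parse_questions(text, min_length=6):
    """Split planner output into questions, removing one list marker per line."""
    questions = []
    for line in text.splitlines():
        cleaned = _MARKER.sub('', line, count=1).strip()
        # Same length filter as the notebook's parser (more than 5 characters).
        if len(cleaned) >= min_length:
            questions.append(cleaned)
    return questions


sample = (
    '1. 5G coverage: is it good enough near campus?\n'
    '\n'
    '2) Who are the main competitors?\n'
    '- 3 pricing tiers or one?'
)
for question in parse_questions(sample):
    print(question)
```

Like the original parser, this sketch keeps any non-empty line of sufficient length, so a stray preamble line from the model would still slip through.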
The Executor answers one research question at a time, but with an important twist: it receives all prior answers as accumulated context. This means each successive answer can build on the previous ones, creating a chain of reasoning rather than a collection of isolated answers. The second question’s answer might reference a data source mentioned in the first answer; the fourth might synthesize patterns from the first three. This context accumulation is what makes the execution phase feel coherent rather than fragmented.
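The accumulation pattern can be seen in isolation with a stub in place of the LLM. In this sketch, run_with_context and stub_answer are hypothetical names; the stub simply reports how many earlier answers were visible in the context it received.

```python
def run_with_context(questions, answer_fn):
    """Answer questions in order, passing all earlier Q/A pairs to each call."""
    results = []
    prior_context = ''
    for question in questions:
        answer = answer_fn(question, prior_context)
        results.append((question, answer))
        # Append the new pair so the next call sees everything so far.
        prior_context += f'\nQ: {question}\nA: {answer}'
    return results


def stub_answer(question, prior_context):
    # Hypothetical stand-in for an LLM call: counts the answers already in context.
    seen = prior_context.count('\nA: ')
    return f'answered with {seen} prior answer(s) in context'


for q, a in run_with_context(['Q-one', 'Q-two', 'Q-three'], stub_answer):
    print(q, '->', a)
```

Because the context grows with every step, the prompt for the last question is the longest one; with many steps or long answers, some form of truncation or summarization would likely be needed.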
The Synthesizer plays the role of a senior executive who must make a decision based on the research. Its system prompt asks for a clear verdict -- go, no-go, or conditional -- followed by three bullet points of key reasoning. This forces the model to commit to a position rather than hedging endlessly, which is a common failure mode when LLMs are asked for recommendations without explicit structural constraints.
The Critic is deliberately adversarial. Its system prompt instructs it to be “skeptical” and to identify what is “weak or missing.” This is by design: the Synthesizer is optimistic (it just made a recommendation and wants it to be right), so the Critic provides the necessary counterbalance. In the example output you will see below, the Critic often assigns a much lower confidence score than you might expect from reading the recommendation alone, because it focuses on risks, missing evidence, and competitive threats that the Synthesizer glossed over.
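The Critic returns free text, so using its confidence score programmatically requires pulling the number out. The following is a heuristic sketch, not part of the notebook: extract_confidence is a hypothetical helper that takes the first integer from 1 to 10 appearing shortly after the word “confidence”. It would misread phrasing such as “confidence (1-10): 4”, so asking the critic for a fixed final line would be more reliable.

```python
import re

# 'confidence', then up to 40 non-digit characters, then an integer from 1 to 10.
_CONFIDENCE = re.compile(r'confidence[^0-9\n]{0,40}?(10|[1-9])\b', re.IGNORECASE)


def extract_confidence(critique_text):
    """Return the critic's 1-10 score as an int, or None if no score is found."""
    match = _CONFIDENCE.search(critique_text)
    return int(match.group(1)) if match else None


print(extract_confidence('Weak on unit economics. Confidence: 6/10'))
print(extract_confidence('No score here.'))
```

Returning None rather than a default value leaves the caller to decide how to treat a critique with no recognizable score.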
# Phase 2: Executor -- answers ONE research question at a time
EXECUTOR_SYSTEM = (
'You are a business analyst. Answer the question concisely (2-4 sentences) '
'with concrete reasoning. Use the prior context if helpful, but stay focused '
'on the current question.'
)
def execute_step(question, prior_context):
    '''Phase 2: answer one research question with access to prior answers.'''
    prompt = 'Prior context:' + chr(10) + prior_context + chr(10) + chr(10) + 'Question: ' + question
    response = llm.generate(prompt, system_prompt=EXECUTOR_SYSTEM)
    return response.text
print('Phase 2 (Executor) ready.')
# Phase 3: Synthesizer -- combines findings into a recommendation
SYNTHESIZER_SYSTEM = (
'You are a senior executive. Given research findings, synthesize them into a '
'clear, decisive recommendation. State the recommendation (go / no-go / conditional), '
'followed by 3 bullet points of key reasoning.'
)
def synthesize(goal, research_results):
    '''Phase 3: combine all research answers into a final recommendation.'''
    findings = chr(10).join([f'Q: {q}' + chr(10) + f'A: {a}' for q, a in research_results])
    prompt = (
        'Goal: ' + goal + chr(10) + chr(10)
        + 'Research findings:' + chr(10) + findings + chr(10) + chr(10)
        + 'Synthesize into a final recommendation.'
    )
    response = llm.generate(prompt, system_prompt=SYNTHESIZER_SYSTEM)
    return response.text
print('Phase 3 (Synthesizer) ready.')

# Phase 4: Critic -- self-reflects on the recommendation
CRITIC_SYSTEM = (
'You are a skeptical business critic. Given a recommendation, identify '
'what is weak or missing, what is strong, and assign a confidence level '
'from 1 (very weak) to 10 (very strong). Be concise.'
)
def reflect(recommendation):
    '''Phase 4: critique the recommendation and assign confidence.'''
    prompt = 'Recommendation to critique:' + chr(10) + recommendation
    response = llm.generate(prompt, system_prompt=CRITIC_SYSTEM)
    return response.text
print('Phase 4 (Critic) ready.')
Inspect the Full Trace
The result dictionary holds everything the agent produced across all four phases, so you can examine each phase individually. This is useful for debugging (did the planner ask the right questions? did the executor miss something?) and for teaching (trace how the agent’s reasoning flowed from plan to recommendation to critique).
The trace output reveals something important about autonomous workflows: the quality of the final recommendation depends heavily on the quality of the plan. If the planner asks shallow questions, the executor produces shallow answers, and the synthesizer builds on a weak foundation. This is why the planner’s system prompt matters so much -- it sets the ceiling for everything that follows.
Run the Agent
Edit the business_idea string below and run the cell. The agent will autonomously plan its research, execute each step, synthesize a recommendation, and self-critique -- with no further input from you. The verbose output prints each phase as it completes, so you can watch the agent “think out loud” and trace how the plan’s questions get answered, how those answers feed into the recommendation, and how the critic pokes holes in the final output.
Try different ideas to see how the plan changes. The agent generates a completely different research plan for a fintech startup vs. a coffee shop vs. a SaaS tool -- which is exactly what an autonomous agent should do. The plan is not hardcoded; it emerges from the LLM’s reasoning about what matters for each specific goal.
def run_autonomous_workflow(goal, verbose=True):
    '''The full autonomous workflow: Plan -> Execute -> Synthesize -> Reflect.'''
    if verbose:
        print('=' * 70)
        print('GOAL:', goal)
        print('=' * 70)

    # --- Phase 1: Plan ---
    if verbose:
        print(chr(10) + '[PHASE 1 - PLANNING]')
    steps = plan(goal)
    if verbose:
        print(f'Agent decided to research {len(steps)} questions:')
        for i, s in enumerate(steps, 1):
            print(f' {i}. {s}')

    # --- Phase 2: Execute each step ---
    if verbose:
        print(chr(10) + '[PHASE 2 - EXECUTION]')
    results = []
    prior_context = ''
    for i, step in enumerate(steps, 1):
        if verbose:
            print(chr(10) + f'Step {i}: {step}')
        answer = execute_step(step, prior_context)
        results.append((step, answer))
        prior_context += chr(10) + f'Q: {step}' + chr(10) + f'A: {answer}'
        if verbose:
            print(f' -> {answer}')

    # --- Phase 3: Synthesize ---
    if verbose:
        print(chr(10) + '[PHASE 3 - SYNTHESIS]')
    recommendation = synthesize(goal, results)
    if verbose:
        print(recommendation)

    # --- Phase 4: Reflect ---
    if verbose:
        print(chr(10) + '[PHASE 4 - SELF-CRITIQUE]')
    critique = reflect(recommendation)
    if verbose:
        print(critique)

    return {
        'goal': goal,
        'plan': steps,
        'research': results,
        'recommendation': recommendation,
        'critique': critique,
    }
print('Autonomous workflow function ready.')
business_idea = 'An AI-powered app that helps college students find affordable off-campus housing by predicting rent trends.'
# business_idea = 'A subscription meal kit service for busy professionals that uses local ingredients.'
# business_idea = 'A mobile game that teaches kids basic accounting through story-based puzzles.'
result = run_autonomous_workflow(business_idea)
Inspect the Full Trace
The result dict holds everything the agent did, so you can examine each phase individually. This is useful for debugging or for teaching how the agent’s reasoning flowed from plan to final recommendation.
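One possible way to walk the trace is sketched below. It assumes the dictionary shape returned by run_autonomous_workflow (keys goal, plan, research, recommendation, critique); summarize_trace is a hypothetical helper, and sample_result is made-up illustrative data rather than real agent output.

```python
def summarize_trace(result):
    """Return the trace as a list of printable lines, one section per phase."""
    lines = [f"GOAL: {result['goal']}", '', 'PLAN:']
    lines += [f'  {i}. {q}' for i, q in enumerate(result['plan'], 1)]
    lines += ['', 'RESEARCH:']
    for question, answer in result['research']:
        lines += [f'  Q: {question}', f'  A: {answer}']
    lines += ['', 'RECOMMENDATION:', result['recommendation']]
    lines += ['', 'CRITIQUE:', result['critique']]
    return lines


# Hypothetical sample in the same shape as the workflow's return value.
sample_result = {
    'goal': 'Validate a campus bike-repair kiosk',
    'plan': ['Who are the customers?', 'What are the costs?'],
    'research': [
        ('Who are the customers?', 'Students who commute by bike.'),
        ('What are the costs?', 'Mostly labor and parts.'),
    ],
    'recommendation': 'Conditional go.',
    'critique': 'Demand is unverified. Confidence: 5/10',
}
print('\n'.join(summarize_trace(sample_result)))
```

In the notebook itself, passing the real result variable in place of sample_result would print the trace from the run above.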
Key Takeaways
What makes this workflow “autonomous”:
You provide only a high-level goal, not a list of steps
The agent generates its own research plan based on the goal
Each step is executed with access to the prior context (results accumulate)
The agent self-critiques its output, giving you a confidence signal for free
What’s NOT happening (and why this example is simple):
No external tools (no web search, no database queries)
No iteration: the agent doesn’t re-plan based on the critique
No parallelism: execution is strictly sequential
No human-in-the-loop: the workflow runs end-to-end without interruption
Scaling this up in the real world:
Add tools so execution steps can fetch real data (web search, database, APIs)
Add an iteration loop: if the critic’s confidence is low, feed the critique back to the planner to generate a better plan
Use LangGraph to express this as a stateful graph with conditional edges (see LangGraph_demo.ipynb for the foundation)
Add memory so the agent can remember prior runs and improve over time
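The iteration loop mentioned above could take roughly the following shape. This is a sketch under stated assumptions, not the notebook's implementation: run_until_confident is a hypothetical wrapper, workflow stands for any callable that returns a dictionary with a 'critique' key (such as run_autonomous_workflow), and score is any callable that turns critique text into a number or None. The stubs exist only so the example runs offline.

```python
def run_until_confident(goal, workflow, score, threshold=7, max_rounds=3):
    """Re-run the workflow with the critique appended until the score meets the threshold."""
    current_goal = goal
    history = []
    for _ in range(max_rounds):
        result = workflow(current_goal)
        confidence = score(result['critique'])
        history.append((confidence, result))
        # Stop once the critic is satisfied; a missing score counts as low confidence.
        if confidence is not None and confidence >= threshold:
            break
        current_goal = (
            goal + '\n\nAddress this critique of the previous attempt:\n' + result['critique']
        )
    return history


# Hypothetical stubs standing in for the real workflow and score parser.
def stub_workflow(goal_text):
    improved = 'critique' in goal_text
    return {'goal': goal_text, 'critique': 'Confidence: 8' if improved else 'Confidence: 4'}


def stub_score(critique):
    return int(critique.rsplit(' ', 1)[-1])


rounds = run_until_confident('Validate this business idea', stub_workflow, stub_score)
print([confidence for confidence, _ in rounds])
```

The max_rounds cap matters: each round repeats every LLM call in the workflow, and there is no guarantee the critic's score will ever rise above the threshold.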
Comparison with our other notebooks:
| Notebook | Who decides the steps? |
|---|---|
| LlamaIndex_RAG, LangChain_demo (RAG chain) | Humans -- the pipeline is fixed |
| LangChain_demo (agent with tools) | LLM picks tools at each turn, but the goal is one question |
| MultiAgent_DB | Humans -- 5 specialized agents run in a fixed order |
| SingleAgent_DB | Humans -- one agent runs a fixed workflow |
| This notebook | LLM decides the whole workflow from a one-sentence goal |
Exercises:
Try a business idea the LLM is unlikely to have strong prior knowledge about (e.g., a niche B2B tool). Does the plan still look reasonable?
Modify run_autonomous_workflow to loop: after the critic phase, if confidence < 7, re-run the plan with the critique appended to the goal.
Add a fifth phase that outputs a one-page marketing pitch based on the recommendation.
Replace the PLANNER_SYSTEM prompt with a different persona (e.g., “venture capitalist”) and see how the plan changes.
Run the code
To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)
Estimated run time: ~3 minutes (7-8 LLM calls total)
https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/AutonomousAgent_BusinessValidator.ipynb