This notebook demonstrates how LlamaIndex simplifies building RAG pipelines. Instead of manually chunking documents, computing embeddings, and querying vector stores, LlamaIndex handles the plumbing so you can focus on the application logic.
How it works:
1. Create a small set of synthetic company documents (no external data needed).
2. Use LlamaIndex to ingest, chunk, embed, and index the documents in one step.
3. Given a natural-language question, LlamaIndex retrieves relevant chunks and sends them to an LLM.
4. Compare answers with RAG (LlamaIndex pipeline) vs. without RAG (direct LLM call).
Learning goals:
- Understand what a RAG framework (LlamaIndex) does under the hood
- See how LlamaIndex abstracts document loading, chunking, embedding, and retrieval
- Compare the framework approach to the manual RAG we built in earlier notebooks
- Observe how RAG grounding reduces hallucination on private/synthetic data
Provider setup: This notebook uses the llm_cascade package, which auto-detects your API keys and falls back to the next provider if one is unavailable. Supported providers: OpenAI, Gemini, Ollama, Grok (xAI), Groq, HuggingFace, Cohere, OpenRouter.
Store any of these API keys in Colab Secrets (or a local .env file):
OPENAI_API_KEY, GEMINI_API_KEY, OLLAMA_API_KEY, XAI_API_KEY, GROQ_API_KEY, HF_TOKEN, COHERE_API_KEY, OPENROUTER_API_KEY
Think of LlamaIndex as a convenience wrapper around the seven manual steps you wrote out in Chapter 8 and the ChromaDB/LanceDB/Pinecone pipelines from Chapter 9 — same chunking, same embeddings, same retrieval, just expressed in two or three lines instead of fifty.
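To make concrete what is being abstracted, here is a minimal manual sketch of the chunk, embed, and retrieve steps using only the standard library. It is a toy under stated assumptions: the "embedding" is a bag-of-words count vector rather than the sentence-transformers model used in this notebook, and the helper names (`chunk`, `embed`, `cosine`, `retrieve`) are illustrative, not part of LlamaIndex. The two sample sentences are taken from the Acme documents created later in the notebook.

```python
import math
import re
from collections import Counter

def chunk(text, size=20):
    """Split text into chunks of at most `size` words (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

docs = [
    "Revenue: $12.8 million (Q4) / $42.3 million (FY 2025)",
    "All full-time employees receive 22 days of PTO per year (accrued monthly).",
]
chunks = [c for d in docs for c in chunk(d)]
top = retrieve("How many days of PTO do employees get?", chunks, k=1)
print(top[0])
```

A real pipeline replaces `embed` with a learned embedding model and the sorted list with a vector store, but the control flow is the same.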
!pip install -q -U llama-index llama-index-llms-openai-like llama-index-llms-gemini llama-index-embeddings-huggingface llama-index-readers-github google-genai openai git+https://github.com/KarAnalytics/llm_cascade.git sentence-transformers
1) Imports and Provider Helpers (8-Provider Cascade)
We configure LlamaIndex to use whichever LLM provider you have available. The cascade tries each provider in order and falls through on quota errors, ensuring the notebook works regardless of which API keys you have set up.
For embeddings, we use a local HuggingFace model (all-MiniLM-L6-v2) so that embedding always works regardless of which LLM API key you have. This is a deliberate design choice: by keeping the embedding model local, we avoid the situation where a rate-limited API blocks the entire pipeline. The LLM is the only component that needs an external API call.
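The fall-through behavior described above can be sketched generically. This is not the llm_cascade implementation; `QuotaError`, `call_with_fallback`, and the two stand-in providers are hypothetical names used only to illustrate the pattern of trying providers in order and moving on when one reports a retriable failure.

```python
class QuotaError(Exception):
    """Hypothetical error type standing in for a provider's rate-limit failure."""

def call_with_fallback(providers, prompt):
    """Try each (name, callable) pair in order; return the first successful answer."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except QuotaError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))

def flaky_provider(prompt):
    """Stand-in for a provider that is out of quota."""
    raise QuotaError("quota exceeded")

def working_provider(prompt):
    """Stand-in for a provider that answers normally."""
    return f"echo: {prompt}"

name, answer = call_with_fallback(
    [("primary", flaky_provider), ("backup", working_provider)],
    "hello",
)
print(name, answer)
```

Only the retriable error type is caught here, so a genuine bug in a provider call would still surface instead of being silently skipped.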
from pathlib import Path
from llm_cascade.providers import PROVIDERS, _load_env, _get_key, _is_retriable_error
_load_env()
def get_available_providers():
    return [p for p in PROVIDERS if _get_key(p['key_env'])]

def has_llm_provider():
    return len(get_available_providers()) > 0

# Print status
available = get_available_providers()
if available:
    print('Providers configured (in fallback order):')
    for p in available:
        print(f" + {p['name']:<16} model = {p['default_model']}")
else:
    print('WARNING: No API keys found.')
WARNING: No API keys found.
2) Configure LlamaIndex LLM and Embeddings
LlamaIndex needs two things configured before it can build a RAG pipeline:
- An LLM for generating answers (we pick the first available provider from our cascade)
- An embedding model for converting text to vectors (we use a local HuggingFace model -- no API key needed)
This cell configures LlamaIndex’s global Settings object so all downstream components use our chosen models. The CascadeLLM wrapper you see below is necessary because LlamaIndex expects an LLM object that conforms to its own interface, but our llm_cascade package has its own API. The wrapper bridges the two, translating LlamaIndex’s complete() calls into llm_cascade’s generate() calls. Without this adapter, we would be locked into a single provider and lose the automatic fallback behavior that makes these notebooks robust across different API key configurations.
from llama_index.core import Settings
from llama_index.core.llms import CustomLLM, LLMMetadata, CompletionResponse, CompletionResponseGen
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llm_cascade import get_cascade
from typing import Any
# ---- Embedding model (local, no API key needed) -----------------------------
Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
print("Embedding model: sentence-transformers/all-MiniLM-L6-v2 (local)")
# ---- Custom LlamaIndex LLM that wraps llm_cascade for automatic fallback ----
class CascadeLLM(CustomLLM):
    """LlamaIndex LLM that delegates to llm_cascade (8-provider fallback)."""
    context_window: int = 8000
    num_output: int = 1024
    model_name: str = "llm_cascade"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        cascade = get_cascade(verbose=False)
        response = cascade.generate(prompt)
        return CompletionResponse(text=response.text, additional_kwargs={"provider": response.provider, "model": response.model})

    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Non-streaming fallback
        resp = self.complete(prompt, **kwargs)
        def gen():
            yield resp
        return gen()
Settings.llm = CascadeLLM()
print("LLM: CascadeLLM (auto-fallback across available providers)")
BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key | Status | |
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Embedding model: sentence-transformers/all-MiniLM-L6-v2 (local)
LLM: CascadeLLM (auto-fallback across available providers)
3) Create Synthetic Company Documents
We generate fictional documents for three companies that the LLM has never seen in training. This ensures that correct answers can only come from RAG retrieval, not from parametric memory. Each company’s documents live in a separate directory, simulating real-world data silos.
| Directory | Company | Documents |
|---|---|---|
| acme_docs/ | Acme Analytics Inc. (BI tools, Lawrence KS) | Overview, Products, Financials, Handbook, Case Study |
| globex_docs/ | Globex Cybersecurity Corp. (cybersecurity, Boston) | Overview, Products, Incident Report |
| nextera_docs/ | Nextera Green Solutions (sustainability consulting, Denver) | Overview, Services, Impact Report |
Later, we will use LlamaIndex Data Loaders (SimpleDirectoryReader) to ingest all three directories — demonstrating how LlamaIndex can pull information from multiple, separate data sources and unify them in a single searchable index.
DOCUMENTS = {
"company_overview.txt": """ACME ANALYTICS INC. — COMPANY OVERVIEW
Acme Analytics Inc. was founded in 2019 by Dr. Sarah Chen and Marcus Rivera
in Lawrence, Kansas. The company specializes in AI-powered business intelligence
tools for mid-market companies (100–5,000 employees).
Headquarters: 1420 Jayhawk Boulevard, Lawrence, KS 66045
Employees: 287 (as of January 2026)
Annual Revenue (2025): $42.3 million
Funding: Series B ($18M raised in March 2023 from Midwest Ventures)
Mission: "To democratize data analytics so every business decision is informed
by evidence, not intuition."
The company operates three offices: Lawrence (HQ), Chicago (sales), and
Austin (engineering). CEO: Dr. Sarah Chen. CTO: Marcus Rivera. CFO: Linda Park.""",
"products_and_services.txt": """ACME ANALYTICS — PRODUCTS AND SERVICES
1. InsightBoard Pro (Flagship Product)
- Real-time dashboard platform with natural language query interface
- Pricing: $49/user/month (Standard), $89/user/month (Enterprise)
- Supports PostgreSQL, MySQL, Snowflake, BigQuery, and Redshift
- 1,847 active enterprise customers as of Q4 2025
2. DataPipe ETL
- Automated data pipeline builder with 200+ pre-built connectors
- Pricing: starts at $500/month for up to 10 million records/day
- Launched in September 2024
3. PredictIQ
- ML-powered forecasting add-on for InsightBoard Pro
- Pricing: $29/user/month (requires InsightBoard Pro subscription)
- Uses proprietary time-series model trained on retail and logistics data
- Beta launched March 2025, GA release planned for June 2026
All products include 24/7 email support. Enterprise plans include dedicated
account manager and 99.9% SLA.""",
"q4_2025_financials.txt": """ACME ANALYTICS — Q4 2025 FINANCIAL RESULTS (CONFIDENTIAL)
Period: October 1 – December 31, 2025
Revenue: $12.8 million (Q4) / $42.3 million (FY 2025)
Gross Margin: 78.2%
Operating Profit: $1.9 million (Q4) / $5.1 million (FY 2025)
Net Income: $1.4 million (Q4)
Cash on Hand: $14.7 million
Burn Rate: Company is cash-flow positive since Q2 2025
Key Metrics:
- ARR (Annual Recurring Revenue): $48.6 million (up 34% YoY)
- Net Revenue Retention: 118%
- Customer Acquisition Cost (CAC): $8,200
- Customer Lifetime Value (LTV): $67,400
- LTV/CAC Ratio: 8.2x
Headcount grew from 241 to 287 employees during 2025.
R&D spending: 31% of revenue. Sales & Marketing: 28% of revenue.""",
"employee_handbook_excerpt.txt": """ACME ANALYTICS — EMPLOYEE HANDBOOK (EXCERPT)
PAID TIME OFF (PTO):
- All full-time employees receive 22 days of PTO per year (accrued monthly).
- PTO increases to 27 days after 3 years of service.
- Unused PTO can be carried over (max 5 days) or paid out at year-end.
- Sick leave: 10 days per year (separate from PTO).
REMOTE WORK POLICY:
- Engineering and Data Science teams: fully remote eligible.
- Sales and Customer Success: hybrid (minimum 2 days/week in office).
- All employees may work remotely up to 4 weeks/year from any US location.
- International remote work requires VP approval and tax review.
PROFESSIONAL DEVELOPMENT:
- Annual learning budget: $2,500 per employee.
- Conference attendance: up to 2 conferences per year with manager approval.
- Tuition reimbursement: up to $5,250/year for degree programs.""",
"customer_case_study.txt": """CUSTOMER CASE STUDY: BIGRETAIL CORP
Company: BigRetail Corp (1,200 retail stores across 38 states)
Challenge: Siloed data across POS, inventory, and CRM systems made it
impossible for regional managers to get timely insights.
Solution: Deployed InsightBoard Pro Enterprise + DataPipe ETL
- Connected 14 data sources in 3 weeks using DataPipe
- 340 regional managers now use InsightBoard daily
- Natural language queries replaced manual SQL report requests
Results (after 6 months):
- 62% reduction in time-to-insight (from 4 days to 1.5 days average)
- $3.2 million saved in inventory carrying costs
- 23% increase in regional manager satisfaction scores
- SQL report request backlog eliminated entirely
Quote from BigRetail CIO Janet Torres:
\"InsightBoard Pro transformed how our managers interact with data.
They went from waiting days for a report to asking questions in plain
English and getting answers in seconds.\"""",
}
# Write documents to disk
DOC_DIR = Path("acme_docs")
DOC_DIR.mkdir(exist_ok=True)
for filename, content in DOCUMENTS.items():
    (DOC_DIR / filename).write_text(content, encoding="utf-8")
print(f"Created {len(DOCUMENTS)} documents in '{DOC_DIR}/':")
for f in sorted(DOC_DIR.iterdir()):
    print(f" {f.name} ({f.stat().st_size} bytes)")
Created 5 documents in 'acme_docs/':
company_overview.txt (715 bytes)
customer_case_study.txt (924 bytes)
employee_handbook_excerpt.txt (828 bytes)
products_and_services.txt (915 bytes)
q4_2025_financials.txt (709 bytes)
# --- Company 2: Globex Cybersecurity Corp. → globex_docs/ ---
GLOBEX_DOCUMENTS = {
"globex_overview.txt": """GLOBEX CYBERSECURITY CORP. — COMPANY OVERVIEW
Globex Cybersecurity Corp. was founded in 2017 by former NSA analyst
James Whitfield and AI researcher Dr. Priya Narayanan in Boston, Massachusetts.
The company provides AI-driven cybersecurity solutions for financial services
and healthcare organizations.
Headquarters: 88 Federal Street, Suite 1200, Boston, MA 02110
Employees: 412 (as of January 2026)
Annual Revenue (2025): $78.5 million
Funding: Series C ($45M raised in June 2024 from CyberVentures Capital
and Goldman Sachs Growth Equity)
Mission: "To make enterprise-grade cybersecurity autonomous, adaptive,
and accessible to every regulated industry."
The company operates four offices: Boston (HQ), Washington D.C. (government
relations), Tel Aviv (R&D), and Singapore (APAC sales).
CEO: James Whitfield. CTO: Dr. Priya Narayanan. CISO: Robert Tanaka.""",
"globex_products.txt": """GLOBEX CYBERSECURITY — PRODUCTS AND SERVICES
1. ShieldAI Platform (Flagship Product)
- AI-powered threat detection and automated incident response
- Uses behavioral analysis to detect zero-day attacks in real time
- Pricing: $120/endpoint/year (Standard), $195/endpoint/year (Premium)
- 623 enterprise customers across 18 countries as of Q4 2025
- Average detection time: 2.3 minutes (industry average: 197 days)
2. VaultGuard
- Data encryption and access management platform for regulated industries
- HIPAA, SOX, and PCI-DSS compliant out of the box
- Pricing: $35,000/year (up to 500 users), $60,000/year (unlimited users)
- Launched January 2023
3. ThreatIntel Live
- Real-time threat intelligence feed with dark web monitoring
- Pricing: $8,500/month (includes API access and analyst dashboard)
- Aggregates data from 2,400+ sources updated every 15 minutes
- Integrates with Splunk, CrowdStrike, Palo Alto, and Microsoft Sentinel
All products include a dedicated Security Operations Center (SOC) team for
Premium customers. Globex holds ISO 27001 and SOC 2 Type II certifications.""",
"globex_incident_report.txt": """GLOBEX CYBERSECURITY — 2025 THREAT LANDSCAPE REPORT (PUBLIC)
Published: February 2026
KEY FINDINGS FROM GLOBEX'S CUSTOMER BASE (2025):
Total incidents analyzed: 14,237 across 623 customers
- Ransomware attacks: up 42% YoY (3,891 incidents)
- Phishing campaigns: up 28% YoY (5,102 incidents)
- Supply chain attacks: up 67% YoY (1,245 incidents — fastest-growing category)
- Insider threats: relatively stable at 1,834 incidents
ShieldAI Platform Performance:
- Blocked 97.3% of attacks before any data exfiltration
- Mean time to detect (MTTD): 2.3 minutes
- Mean time to respond (MTTR): 8.7 minutes (automated response)
- False positive rate: 0.4% (down from 1.2% in 2024)
NOTABLE CASE: MERIDIAN HEALTH SYSTEM
- 340-bed hospital network targeted by Medusa ransomware variant
- ShieldAI detected anomalous lateral movement within 47 seconds
- Automated containment isolated 3 affected servers in under 2 minutes
- Zero patient data exfiltrated; zero downtime for clinical systems
- Estimated savings: $12M+ (average healthcare ransomware cost per Ponemon)
INDUSTRY PREDICTIONS FOR 2026:
- AI-generated phishing emails will become indistinguishable from human-written
- Quantum-resistant encryption adoption will begin in financial services
- Average cost of a data breach expected to exceed $5.2 million globally""",
}
GLOBEX_DIR = Path("globex_docs")
GLOBEX_DIR.mkdir(exist_ok=True)
for filename, content in GLOBEX_DOCUMENTS.items():
    (GLOBEX_DIR / filename).write_text(content, encoding="utf-8")
print(f"Created {len(GLOBEX_DOCUMENTS)} documents in '{GLOBEX_DIR}/':")
for f in sorted(GLOBEX_DIR.iterdir()):
    print(f" {f.name} ({f.stat().st_size} bytes)")
Created 3 documents in 'globex_docs/':
globex_incident_report.txt (1321 bytes)
globex_overview.txt (864 bytes)
globex_products.txt (1131 bytes)
# --- Company 3: Nextera Green Solutions → nextera_docs/ ---
NEXTERA_DOCUMENTS = {
"nextera_overview.txt": """NEXTERA GREEN SOLUTIONS — COMPANY OVERVIEW
Nextera Green Solutions was founded in 2020 by climate scientist Dr. Elena
Vasquez and former McKinsey partner David Okafor in Denver, Colorado.
The company provides sustainability consulting and carbon accounting software
to help companies meet ESG (Environmental, Social, Governance) targets.
Headquarters: 1700 Lincoln Street, Suite 3400, Denver, CO 80203
Employees: 156 (as of January 2026)
Annual Revenue (2025): $23.8 million
Funding: Series A ($12M raised in November 2023 from Climate Capital Partners
and Patagonia Ventures)
Mission: "To give every company a clear, data-driven path to net zero —
turning climate commitments into measurable action."
The company operates two offices: Denver (HQ) and London (European operations).
CEO: Dr. Elena Vasquez. COO: David Okafor. VP Engineering: Kenji Tanaka.
Nextera is a certified B Corporation and a member of the UN Global Compact.""",
"nextera_services.txt": """NEXTERA GREEN SOLUTIONS — SERVICES AND PLATFORM
1. CarbonLens Platform (Flagship Product)
- Automated Scope 1, 2, and 3 carbon emissions tracking
- Integrates with ERP systems (SAP, Oracle, NetSuite) to pull real data
- AI-powered recommendations for emission reduction strategies
- Pricing: $2,500/month (Standard, up to $500M revenue companies)
$6,500/month (Enterprise, unlimited + custom reporting)
- 312 active customers as of Q4 2025
2. ESG Reporting Suite
- Automated generation of sustainability reports (GRI, SASB, TCFD, CDP)
- Audit-ready documentation with full data lineage
- Pricing: $18,000/year (bundled with CarbonLens Enterprise)
- Standalone: $24,000/year
- Launched April 2024
3. Sustainability Strategy Consulting
- Science-Based Targets initiative (SBTi) alignment services
- Supply chain decarbonization roadmaps
- Pricing: Project-based ($75,000 – $250,000 per engagement)
- Team of 28 consultants with backgrounds in climate science and industry
4. Carbon Offset Marketplace
- Verified carbon credits from 45 projects across 12 countries
- Blockchain-verified chain of custody for each credit
- Transaction fee: 4.5% per credit purchase
- Beta launched August 2025
All platform customers receive quarterly sustainability benchmarking reports
comparing their performance against industry peers.""",
"nextera_impact_report.txt": """NEXTERA GREEN SOLUTIONS — 2025 IMPACT REPORT
Published: January 2026
COMPANY IMPACT METRICS:
- Total customer carbon emissions tracked: 47.3 million metric tons CO2e
- Verified emission reductions enabled: 6.8 million metric tons CO2e (14.4%)
- Customers who improved their CDP score after using Nextera: 89%
- Average time to produce ESG report reduced from 14 weeks to 3.5 weeks
NOTABLE CLIENT RESULTS:
1. SummitBrew Coffee (specialty coffee chain, 380 locations)
- Reduced Scope 3 supply chain emissions by 31% in 18 months
- Switched 62% of logistics to electric vehicles using Nextera roadmap
- First coffee chain to achieve SBTi-validated near-term target
2. Pacific Northwest Lumber Co. (timber and building materials)
- Used CarbonLens to discover 40% of emissions came from transportation
- Optimized shipping routes: saved 8,200 metric tons CO2e and $1.4M/year
- Achieved carbon-neutral certification for 2025 operations
3. Meridian Financial Group (regional bank, $8B assets)
- Implemented financed emissions tracking (PCAF methodology)
- Identified $2.1B in loan portfolio exposed to high transition risk
- Developed green lending program: $340M in sustainability-linked loans
AWARDS AND RECOGNITION (2025):
- Fast Company "Most Innovative Companies" — Energy category (#7)
- B Corp "Best for the World" — Environment category
- Colorado Governor's Award for Clean Energy Innovation""",
}
NEXTERA_DIR = Path("nextera_docs")
NEXTERA_DIR.mkdir(exist_ok=True)
for filename, content in NEXTERA_DOCUMENTS.items():
    (NEXTERA_DIR / filename).write_text(content, encoding="utf-8")
print(f"Created {len(NEXTERA_DOCUMENTS)} documents in '{NEXTERA_DIR}/':")
for f in sorted(NEXTERA_DIR.iterdir()):
    print(f" {f.name} ({f.stat().st_size} bytes)")
Created 3 documents in 'nextera_docs/':
nextera_impact_report.txt (1434 bytes)
nextera_overview.txt (938 bytes)
nextera_services.txt (1394 bytes)
4) Ingest Documents Using LlamaIndex Data Loaders (via LlamaHub)
A key feature of LlamaIndex is its Data Loaders available via LlamaHub. Data Loaders can pull information from hundreds of sources -- Google Docs, Notion, GitHub, Slack, databases, PDFs, or simple .txt files. Think of LlamaHub as a plugin ecosystem: instead of writing custom ingestion code for each data source, you install a loader and point it at your data.
Here we use SimpleDirectoryReader (the built-in Data Loader for local files) to load documents from three separate company directories into a single searchable index. Under the hood, SimpleDirectoryReader reads each file, wraps it in a LlamaIndex Document object with metadata (file name, path), and passes it to VectorStoreIndex. The index then automatically chunks each document, computes embeddings using our configured HuggingFace model, and stores everything in an in-memory vector store. All of that -- chunking, embedding, indexing -- happens in a single VectorStoreIndex.from_documents() call.
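As a rough picture of the loading step only, the sketch below reads every .txt file in a directory into a record with text and file metadata. It is a simplification under stated assumptions: `load_directory` is a hypothetical helper, the records are plain dicts rather than LlamaIndex Document objects, and the temporary files exist only for the demonstration.

```python
import tempfile
from pathlib import Path

def load_directory(dir_path):
    """Read every .txt file in dir_path into a dict with text and file metadata."""
    records = []
    for path in sorted(Path(dir_path).glob("*.txt")):
        records.append({
            "text": path.read_text(encoding="utf-8"),
            "metadata": {"file_name": path.name, "file_path": str(path)},
        })
    return records

# Demonstrate on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b_note.txt").write_text("second file", encoding="utf-8")
    (Path(tmp) / "a_note.txt").write_text("first file", encoding="utf-8")
    records = load_directory(tmp)

print([r["metadata"]["file_name"] for r in records])
```

Keeping the file name in the metadata is what later makes source attribution possible: each retrieved chunk can report which file it came from.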
Want to try the GitHub Data Loader? LlamaHub also offers GithubRepositoryReader, which reads files directly from a GitHub repository -- no git clone needed. See the commented-out code below to try it yourself. You just need a free GitHub token (any GitHub account, no special permissions needed for public repos): https://
| Loader | Source | Package | Used here? |
|---|---|---|---|
| SimpleDirectoryReader | Local .txt files | llama-index (built-in) | Yes (default) |
| GithubRepositoryReader | GitHub repo | llama-index-readers-github (LlamaHub) | Commented-out example |
| GoogleDriveReader | Google Drive | llama-index-readers-google | Available via LlamaHub |
| NotionPageReader | Notion pages | llama-index-readers-notion | Available via LlamaHub |
| ConfluenceReader | Atlassian Confluence | llama-index-readers-confluence | Available via LlamaHub |
| S3Reader | AWS S3 buckets | llama-index-readers-s3 | Available via LlamaHub |
| ...and 300+ more | | | |
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# ===========================================================================
# DEFAULT: Load documents from local directories using SimpleDirectoryReader
# ===========================================================================
LOCAL_DIRS = {
    "acme_docs": "Acme Analytics (BI tools)",
    "globex_docs": "Globex Cybersecurity (security)",
    "nextera_docs": "Nextera Green Solutions (sustainability)",
}
all_documents = []
for dir_name, label in LOCAL_DIRS.items():
    loader = SimpleDirectoryReader(dir_name)
    docs = loader.load_data()
    print(f" Loaded: '{dir_name}/' → {len(docs)} docs [{label}]")
    all_documents.extend(docs)
print(f"Total documents loaded from {len(LOCAL_DIRS)} directories: {len(all_documents)}")
# ===========================================================================
# OPTIONAL: Try the GitHub Data Loader instead (LlamaHub)
#
# This reads the SAME documents directly from a GitHub repo — no git clone
# needed. To try it:
# 1. Generate a free GitHub token: https://github.com/settings/tokens?type=beta
# (any GitHub account works; no special permissions needed for public repos)
# 2. Add it to your .env file: GITHUB_TOKEN=github_pat_xxxx
# Or in Colab: add a Secret named GITHUB_TOKEN
# 3. Uncomment the code below and comment out the SimpleDirectoryReader above
# ===========================================================================
# from llama_index.readers.github import GithubRepositoryReader, GithubClient
#
# github_client = GithubClient(github_token=_get_key("GITHUB_TOKEN"), verbose=False)
#
# GITHUB_DIRS = {
# "acme_docs": "Acme Analytics (BI tools)",
# "globex_docs": "Globex Cybersecurity (security)",
# "nextera_docs": "Nextera Green Solutions (sustainability)",
# }
#
# all_documents = []
# for dir_path, label in GITHUB_DIRS.items():
#     reader = GithubRepositoryReader(
#         github_client=github_client,
#         owner="KarAnalytics",  # GitHub username
#         repo="llamaindexdata",  # repo name
#         filter_directories=([dir_path], GithubRepositoryReader.FilterType.INCLUDE),
#         verbose=False,
#     )
#     docs = reader.load_data(branch="main")
#     print(f" GitHub Loader: '{dir_path}/' → {len(docs)} docs [{label}]")
#     all_documents.extend(docs)
# ---------------------------------------------------------------------------
# Build a SINGLE vector index across all three companies
# ---------------------------------------------------------------------------
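index = VectorStoreIndex.from_documents(all_documents, show_progress=True)
# Note: len(all_documents) counts loaded Document objects, not index nodes.
# These files are short, so each one likely fits in a single chunk and the two counts coincide here.
print(f"Vector index built with {len(all_documents)} chunks")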
# Create a query engine
query_engine = index.as_query_engine(similarity_top_k=3)
print("Query engine ready — ask questions about ANY of the three companies!")
WARNING:llama_index.core.readers.file.base:`llama-index-readers-file` package not found, some file readers will not be available if not provided by the `file_extractor` parameter.
Loaded: 'acme_docs/' → 5 docs [Acme Analytics (BI tools)]
Loaded: 'globex_docs/' → 3 docs [Globex Cybersecurity (security)]
Loaded: 'nextera_docs/' → 3 docs [Nextera Green Solutions (sustainability)]
Total documents loaded from 3 directories: 11
Vector index built with 11 chunks
Query engine ready — ask questions about ANY of the three companies!
Notice how few lines of code that took. Two calls -- SimpleDirectoryReader to load, VectorStoreIndex.from_documents() to index -- replaced all the manual chunking, embedding, and vector store insertion we did in earlier notebooks. This is the core value proposition of LlamaIndex: it abstracts the RAG plumbing so you can focus on choosing the right data sources and tuning retrieval quality.
5) What Just Happened? Peeking Under the Hood
The two lines of code above -- SimpleDirectoryReader and VectorStoreIndex.from_documents() -- did a remarkable amount of work. They loaded 11 text files, split them into chunks, computed a 384-dimensional embedding for each chunk using our local HuggingFace model, and organized those embeddings into a searchable index. If you built RAG manually in the earlier notebooks, you know that each of those steps required explicit code: reading files, calling a text splitter, batching through an embedding API, and inserting into a vector store.
Let us inspect what LlamaIndex created so we can verify that the abstraction is doing sensible things. The cell below runs a test retrieval query and shows which chunks came back, including their similarity scores and source files. This transparency is important: frameworks are powerful, but you should always be able to look inside and confirm that the right documents are being retrieved for the right reasons.
# Inspect the chunks LlamaIndex created
retriever = index.as_retriever(similarity_top_k=3)
# Test retrieval without LLM — just see what chunks come back
test_query = "How much revenue did Acme make in Q4 2025?"
retrieved_nodes = retriever.retrieve(test_query)
print(f"Query: '{test_query}'")
print(f"Retrieved {len(retrieved_nodes)} chunks:\n")
for i, node in enumerate(retrieved_nodes, 1):
    source = node.metadata.get('file_name', 'unknown')
    score = node.score
    text_preview = node.text[:200].replace('\n', ' ')
    print(f" Chunk #{i} (score={score:.4f}, source={source})")
    print(f" {text_preview}...\n")
Query: 'How much revenue did Acme make in Q4 2025?'
Retrieved 3 chunks:
Chunk #1 (score=0.6737, source=q4_2025_financials.txt)
ACME ANALYTICS — Q4 2025 FINANCIAL RESULTS (CONFIDENTIAL) Period: October 1 – December 31, 2025 Revenue: $12.8 million (Q4) / $42.3 million (FY 2025) Gross Margin: 78.2% Operating Profi...
Chunk #2 (score=0.4163, source=company_overview.txt)
ACME ANALYTICS INC. — COMPANY OVERVIEW Acme Analytics Inc. was founded in 2019 by Dr. Sarah Chen and Marcus Rivera in Lawrence, Kansas. The company specializes in AI-powered business intelligence too...
Chunk #3 (score=0.3867, source=employee_handbook_excerpt.txt)
ACME ANALYTICS — EMPLOYEE HANDBOOK (EXCERPT) PAID TIME OFF (PTO): - All full-time employees receive 22 days of PTO per year (accrued monthly). - PTO increases to 27 days after 3 years of service. - U...
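The scores above drop sharply after the first chunk, which suggests one simple tuning idea: discard weakly matching chunks before they reach the LLM. The sketch below applies a cutoff to the (source, score) pairs copied from the retrieval output above. The `apply_cutoff` helper and the 0.5 threshold are illustrative assumptions, not values from this notebook; LlamaIndex ships node postprocessors for this kind of filtering, so check its documentation for the exact class and parameters.

```python
# (source, score) pairs copied from the retrieval output above.
retrieved = [
    ("q4_2025_financials.txt", 0.6737),
    ("company_overview.txt", 0.4163),
    ("employee_handbook_excerpt.txt", 0.3867),
]

def apply_cutoff(results, cutoff):
    """Keep only results whose similarity score is at least `cutoff`."""
    return [(source, score) for source, score in results if score >= cutoff]

# 0.5 is an arbitrary illustrative threshold, not a recommended default.
kept = apply_cutoff(retrieved, cutoff=0.5)
print(kept)
```

A cutoff trades recall for precision: set it too high and a question whose best match scores modestly gets no context at all.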
6) Query with RAG vs. Without RAG
Now we compare:
- With RAG (LlamaIndex): The query engine retrieves relevant chunks, sends them together with the question to the LLM, and returns a grounded answer.
- Without RAG (Direct LLM): We ask the same question directly to the LLM with no context. Since the data is fictional, the LLM either refuses or hallucinates.
For the “without RAG” call we define generate_text_no_rag(), a thin wrapper around the generate() method of the llm_cascade cascade.
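The only difference between the two calls is what the prompt contains. A minimal sketch of the "with RAG" prompt assembly is shown here; the template wording and the `build_rag_prompt` helper are assumptions for illustration and are not LlamaIndex's built-in prompt template. The sample chunks are lines from the Acme financials document.

```python
def build_rag_prompt(question, chunks):
    """Assemble retrieved chunks and the question into one grounded prompt."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, 1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [
    "Revenue: $12.8 million (Q4) / $42.3 million (FY 2025)",
    "Gross Margin: 78.2%",
]
prompt = build_rag_prompt("What was Acme Analytics' revenue in Q4 2025?", chunks)
print(prompt)
```

The "without RAG" call sends just the question, so the model has nothing but its training data to draw on.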
from llm_cascade import get_cascade
_llm_raw = get_cascade(verbose=False)
def generate_text_no_rag(prompt, system_prompt=None):
    """Direct LLM call via llm_cascade (no RAG context)."""
    response = _llm_raw.generate(prompt, system_prompt=system_prompt)
    return response.text, response.provider
print("Direct LLM function ready (for without-RAG comparison).")
Direct LLM function ready (for without-RAG comparison).
7) Run End-to-End Examples: With RAG vs. Without RAG
We test questions about three fictional companies that no LLM has seen in training. As written, only the first Acme question is active; uncomment the others to cover Globex and Nextera. All three companies share one index and there is no explicit routing step: similarity search alone pulls chunks from the relevant company’s documents. The source attribution shows which files the retrieved chunks came from.
- With RAG: correct, specific answers grounded on the right company’s documents
- Without RAG: refusals, hedging, or hallucinated numbers
questions = [
# Acme Analytics questions
"What was Acme Analytics' revenue in Q4 2025?",
#"How many days of PTO do new employees get at Acme?",
# Globex Cybersecurity questions
#"What is Globex Cybersecurity's flagship product and how fast does it detect threats?",
#"How did Globex help Meridian Health System during a ransomware attack?",
# Nextera Green Solutions questions
#"Who founded Nextera Green Solutions and where is it headquartered?",
#"How much carbon emissions reduction did Nextera enable for its customers in 2025?",
]
def preview(text, max_len=800):
    text = str(text) if text else ""
    return text[:max_len] + ("..." if len(text) > max_len else "")

if not has_llm_provider():
    print("Error: No LLM API key configured. Set at least one API key in Colab Secrets.")
else:
    for i, q in enumerate(questions, start=1):
        print("=" * 80)
        print(f"Q{i}. {q}")
        print("=" * 80)
        # --- WITH RAG (LlamaIndex) ---
        print("--- WITH RAG (LlamaIndex) ---")
        try:
            response = query_engine.query(q)
            print(" Answer:")
            print(preview(response.response))
            sources = set()
            for node in response.source_nodes:
                sources.add(node.metadata.get("file_name", "unknown"))
            print(f" Sources: {', '.join(sources)}")
        except Exception as e:
            print(f" RAG error: {e}")
        # --- WITHOUT RAG ---
        print("--- WITHOUT RAG (LLM knowledge only) ---")
        try:
            answer_direct, prov_direct = generate_text_no_rag(q)
            print(f" Answer (provider: {prov_direct}):")
            print(preview(answer_direct))
        except Exception as e:
            print(f" Direct error: {e}")
        print()
================================================================================
Q1. What was Acme Analytics' revenue in Q4 2025?
================================================================================
--- WITH RAG (LlamaIndex) ---
Answer:
Acme Analytics' revenue in Q4 2025 was $12.8 million.
Sources: q4_2025_financials.txt, products_and_services.txt, company_overview.txt
--- WITHOUT RAG (LLM knowledge only) ---
Answer (provider: Gemini):
I don't have enough information to answer that question. Here's why:
* **Real-time financial data is private:** Companies' specific revenue figures, especially for future quarters like Q4 2025, are proprietary and not publicly available in a general knowledge base like mine.
* **Future events are speculative:** Even if Acme Analytics were a real, publicly traded company, Q4 2025 is in the future. Any revenue projections would be estimates, not definitive figures, and only the company itself would be making those.
**To find revenue information for a real company, you would typically look for:**
* **Official company earnings reports:** These are usually released quarterly and annually.
* **Financial news websites:** Reputable financial news outlets often report on company earnings.
* **In...
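The gap between the two answers above comes down to what the LLM is shown. The sketch below illustrates the idea under simplifying assumptions: `build_rag_prompt` is a hypothetical helper, the template wording is illustrative rather than LlamaIndex's exact prompt, and the chunk text is made up to match the figure reported in the RAG answer.

```python
def build_rag_prompt(question, chunks):
    """Assemble a grounded prompt from retrieved chunks (illustrative template)."""
    context = "\n---\n".join(chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Answer the question using only the context above.\n"
        f"Question: {question}\n"
        "Answer:"
    )

question = "What was Acme Analytics' revenue in Q4 2025?"

# Hypothetical chunk text; in the notebook the real chunk comes from
# q4_2025_financials.txt via the retriever.
chunks = ["Acme Analytics Q4 2025 revenue: $12.8 million."]

rag_prompt = build_rag_prompt(question, chunks)  # what the RAG path sends
direct_prompt = question                         # what the direct call sends

print(rag_prompt)
print()
print(direct_prompt)
```

With the fact present in the prompt, the model only has to read it back; without it, the model can only refuse or guess.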
Checkpoint: Reflection Questions
Hallucination check: Did the without-RAG answers invent specific revenue numbers or employee counts for a fictional company?
Source attribution: LlamaIndex tells us which document the answer came from. Why is this important for trust?
Chunking: How does the chunk size affect retrieval quality? What if a key fact spans two chunks?
Framework vs. manual: Compare this LlamaIndex approach to the manual RAG we built in the VectorDB and DBMS_RAG notebooks. What did the framework handle for us?
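For the chunking question, a small experiment can make the "fact spans two chunks" case concrete. The sketch below is a simplification: it splits on words rather than tokens or sentence boundaries (unlike LlamaIndex's splitters), `chunk_words` is a hypothetical helper, and the sample sentence is invented for illustration.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

# Invented sample sentence; the key fact sits across a chunk boundary.
text = "Acme Analytics reported strong growth and Q4 2025 revenue was $12.8 million overall"
fact = "Q4 2025 revenue was $12.8 million"

no_overlap = chunk_words(text, chunk_size=8, overlap=0)
with_overlap = chunk_words(text, chunk_size=8, overlap=2)

print("No overlap:  ", no_overlap)
print("With overlap:", with_overlap)
print("Fact intact without overlap:", any(fact in c for c in no_overlap))
print("Fact intact with overlap:   ", any(fact in c for c in with_overlap))
```

Without overlap, the quarter ("Q4 2025") lands in one chunk and the amount in the next, so neither chunk alone answers the question. Overlap only helps when it is large enough to cover the part of the fact that was cut off.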
8) Interactive Query (Optional)
Now it is your turn to explore. Change the question in the cell below to anything you want to ask about the three fictional companies. Try questions that span multiple documents (e.g., “Compare Acme’s revenue to Globex’s products”) or questions about specific details buried deep in a single document. Pay attention to the source attributions -- they tell you which file LlamaIndex retrieved each answer from, which is a useful trust signal in any RAG system.
# Change this to any question about the three fictional companies
my_question = "What is Acme's LTV/CAC ratio and what does it mean?"

if has_llm_provider():
    print(f"Question: {my_question}\n")
    print("=" * 60)
    print("WITH RAG (LlamaIndex):")
    print("=" * 60)
    try:
        response = query_engine.query(my_question)
        print(f"\n{response.response}")
        sources = set()
        for node in response.source_nodes:
            sources.add(node.metadata.get('file_name', 'unknown'))
        print(f"\nSources: {', '.join(sources)}")
    except Exception as e:
        print(f"Error: {e}")
    print("\n" + "=" * 60)
    print("WITHOUT RAG:")
    print("=" * 60)
    try:
        answer_direct, prov = generate_text_no_rag(my_question)
        print(f"\n[{prov}] {answer_direct}")
    except Exception as e:
        print(f"Error: {e}")
else:
    print("Set at least one API key first.")
Question: What is Acme's LTV/CAC ratio and what does it mean?
============================================================
WITH RAG (LlamaIndex):
============================================================
Based on the financial results provided, Acme's **LTV/CAC ratio is 8.2x** (or 8.2).
**What this means:**
- **LTV (Lifetime Value)** of $67,400 represents the total revenue Acme expects to generate from a single customer over the entire duration of their relationship.
- **CAC (Customer Acquisition Cost)** of $8,200 represents the sales and marketing spend required to acquire one new customer.
An 8.2x ratio indicates that for every **$1** Acme spends to acquire a customer, the company expects to earn **$8.20** back from that customer over time. This suggests highly efficient customer acquisition economics, where the long-term value generated significantly exceeds the upfront investment required to win the business.
Sources: q4_2025_financials.txt, company_overview.txt, employee_handbook_excerpt.txt
============================================================
WITHOUT RAG:
============================================================
[Gemini] Let's break down Acme's LTV/CAC ratio, what it means, and why it's so important.
First, a quick refresher on the two components:
* **LTV (Lifetime Value):** The total revenue a business can expect to generate from a single customer account over the entire period of their relationship.
* **CAC (Customer Acquisition Cost):** The total cost associated with acquiring a new customer. This includes marketing spend, sales salaries, tools, etc., divided by the number of new customers acquired over a given period.
---
**What is Acme's LTV/CAC ratio?**
To answer this, we need the *actual numbers* for Acme's LTV and CAC. Since I don't have access to Acme's internal financial data, I can't give you a specific numerical ratio.
**However, I can tell you how to calculate it and give you examples of what it *might* be.**
**Formula:**
$$\text{LTV/CAC Ratio} = \frac{\text{Lifetime Value (LTV)}}{\text{Customer Acquisition Cost (CAC)}}$$
**Example Scenarios for Acme:**
Let's imagine some possibilities for Acme:
* **Scenario 1: Acme has an LTV of \$3,000 and a CAC of \$1,000.**
* LTV/CAC Ratio = \$3,000 / \$1,000 = **3:1** (or just 3)
* **Scenario 2: Acme has an LTV of \$500 and a CAC of \$250.**
* LTV/CAC Ratio = \$500 / \$250 = **2:1** (or just 2)
* **Scenario 3: Acme has an LTV of \$1,500 and a CAC of \$2,000.**
* LTV/CAC Ratio = \$1,500 / \$2,000 = **0.75:1** (or just 0.75)
---
**What Does Acme's LTV/CAC Ratio Mean?**
The LTV/CAC ratio is a crucial metric that tells Acme (and its investors) **how much value it gets back for every dollar it spends to acquire a new customer.**
Here's a breakdown of what different ratios generally signify:
1. **LTV/CAC < 1:1 (e.g., 0.75:1)**
* **Meaning for Acme:** Acme is spending more money to acquire a customer than that customer will ever generate in revenue.
* **Implication:** This is a highly problematic and unsustainable business model. Acme is losing money on every customer acquired. It needs to drastically reduce its CAC, increase its LTV, or both, to survive.
2. **LTV/CAC ≈ 1:1 (e.g., 1.1:1)**
* **Meaning for Acme:** Acme is just barely breaking even on its customer acquisition efforts.
* **Implication:** While not losing money, Acme isn't generating much profit from new customers. There's little room for error, and the business likely won't scale profitably. It suggests a need for optimization in either LTV or CAC.
3. **LTV/CAC between 2:1 and 3:1 (e.g., 2.5:1)**
* **Meaning for Acme:** Acme is generating a reasonable return on its acquisition spend. For every dollar spent, it gets back \$2 to \$3 over the customer's lifetime.
* **Implication:** This is generally considered a healthy and sustainable ratio for many businesses, especially for newer companies or those in competitive markets. It indicates a solid foundation for growth.
4. **LTV/CAC > 3:1 (e.g., 4:1, 5:1, or even higher)**
* **Meaning for Acme:** Acme is highly efficient in its customer acquisition, generating significant returns for every dollar invested. For every dollar spent, it gets back \$3 or more.
* **Implication:** This is often considered an excellent ratio. It means Acme has a very profitable customer acquisition machine. It might even suggest that Acme could *afford to spend more* on marketing and sales to accelerate growth, as long as the ratio remains healthy. Sometimes, an extremely high ratio can mean Acme isn't investing enough in growth and could be leaving market share on the table.
**In Summary, for Acme:**
The LTV/CAC ratio is a **key indicator of business health and scalability.**
* It helps Acme understand the **profitability of its customer acquisition channels.**
* It guides decisions on **marketing spend, sales strategy, product development, and customer retention efforts.**
* **A higher ratio is generally better**, but an *extremely* high ratio might mean Acme is under-investing in growth.
* **The ideal ratio can vary by industry** (e.g., SaaS often targets 3:1 or higher due to high margins and recurring revenue, while e-commerce might be happy with 2:1 for repeat purchases).
Without Acme's specific numbers, we can only discuss the implications of the ratio, but knowing *how* to interpret it is crucial for any business.
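Because the grounded answer quotes its inputs, its arithmetic can be checked directly, which is not possible with the invented scenarios in the without-RAG answer. A quick sketch using the LTV and CAC figures from the RAG output above (the 0.05 tolerance is an arbitrary choice to allow for rounding to one decimal place):

```python
ltv = 67_400          # lifetime value in dollars, as quoted in the RAG answer
cac = 8_200           # customer acquisition cost in dollars, as quoted in the RAG answer
reported_ratio = 8.2  # ratio stated in the RAG answer

ratio = ltv / cac
print(f"LTV/CAC = {ratio:.2f}")

# The reported figure is rounded to one decimal place, so allow a small tolerance.
is_consistent = abs(ratio - reported_ratio) < 0.05
print("Matches the reported ratio:", is_consistent)
```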
9) Bonus: Customizing the RAG Pipeline
LlamaIndex is highly configurable, and knowing which knobs to turn is essential for moving from a demo to a production system. Three parameters matter most:
Chunk size -- smaller chunks yield more precise retrieval (the system finds the exact paragraph), while larger chunks give the LLM more surrounding context per chunk. The tradeoff is granularity vs. context.
Top-k -- how many chunks to retrieve. More chunks provide broader coverage but increase the prompt length (and cost). Too few chunks risk missing relevant information.
System prompt -- customize the LLM’s behavior and persona for your specific use case.
The cell below rebuilds the index with smaller chunks and a higher top-k, then runs a new test question and prints each retrieved chunk with its similarity score. The system prompt is left unchanged.
from llama_index.core.node_parser import SentenceSplitter

# Re-index with smaller chunks and more overlap
splitter = SentenceSplitter(chunk_size=256, chunk_overlap=50)
small_chunk_index = VectorStoreIndex.from_documents(
    all_documents,
    transformations=[splitter],
    show_progress=True,
)

# Query with more chunks retrieved
custom_engine = small_chunk_index.as_query_engine(similarity_top_k=5)
test_q = "What is the pricing for InsightBoard Pro?"
print(f"Question: {test_q}\n")

if has_llm_provider():
    try:
        response = custom_engine.query(test_q)
        print(f"Answer (small chunks, top_k=5):")
        print(response.response)
        print(f"\nChunks used: {len(response.source_nodes)}")
        for i, node in enumerate(response.source_nodes, 1):
            print(f" #{i} score={node.score:.4f} | {node.metadata.get('file_name', '?')}")
            print(f" {node.text[:100].replace(chr(10), ' ')}...")
    except Exception as e:
        print(f"Error: {e}")
Question: What is the pricing for InsightBoard Pro?
Answer (small chunks, top_k=5):
InsightBoard Pro has two pricing tiers:
* **Standard:** $49/user/month
* **Enterprise:** $89/user/month
Chunks used: 5
#1 score=0.7414 | products_and_services.txt
ACME ANALYTICS — PRODUCTS AND SERVICES 1. InsightBoard Pro (Flagship Product) - Real-time dashbo...
#2 score=0.4986 | globex_products.txt
GLOBEX CYBERSECURITY — PRODUCTS AND SERVICES 1. ShieldAI Platform (Flagship Product) - AI-powere...
#3 score=0.3897 | globex_products.txt
ThreatIntel Live - Real-time threat intelligence feed with dark web monitoring - Pricing: $8,5...
#4 score=0.3663 | nextera_services.txt
NEXTERA GREEN SOLUTIONS — SERVICES AND PLATFORM 1. CarbonLens Platform (Flagship Product) - Auto...
#5 score=0.3621 | customer_case_study.txt
CUSTOMER CASE STUDY: BIGRETAIL CORP Company: BigRetail Corp (1,200 retail stores across 38 states) ...
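In the output above, only the first chunk is about InsightBoard Pro. The other four score far lower, and three of them come from Globex and Nextera files. One common remedy is to drop chunks below a similarity threshold before they reach the LLM. The sketch below shows the idea in plain Python using the scores printed above; `apply_cutoff` is a hypothetical helper and the 0.6 cutoff is an arbitrary value chosen for illustration, not a recommendation.

```python
# (score, file_name) pairs copied from the output above.
retrieved = [
    (0.7414, "products_and_services.txt"),
    (0.4986, "globex_products.txt"),
    (0.3897, "globex_products.txt"),
    (0.3663, "nextera_services.txt"),
    (0.3621, "customer_case_study.txt"),
]

def apply_cutoff(nodes, cutoff):
    """Keep only the retrieved chunks whose similarity score is at least `cutoff`."""
    return [(score, name) for score, name in nodes if score >= cutoff]

kept = apply_cutoff(retrieved, cutoff=0.6)  # 0.6 is an assumed threshold

print(f"Kept {len(kept)} of {len(retrieved)} chunks")
for score, name in kept:
    print(f"  score={score:.4f} | {name}")
```

LlamaIndex provides a `SimilarityPostprocessor` with a `similarity_cutoff` argument for this purpose; check the documentation for your installed version before relying on it. A good threshold depends on the embedding model, so it is worth tuning against real queries.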
10) Teaching Notes and Exercises
Key takeaways:
LlamaIndex abstracts the RAG pipeline: document loading, chunking, embedding, indexing, retrieval, and LLM prompting — all in a few lines of code.
Under the hood, it does the same thing as our manual RAG notebooks (embed → retrieve → generate), but with less boilerplate.
Source attribution (which document chunk answered the question) comes for free.
The framework is highly configurable: chunk size, overlap, top-k, embedding model, LLM, and prompt templates can all be swapped.
Using synthetic/private data makes the with-vs-without-RAG comparison very clear — the LLM literally cannot answer without retrieval.
LlamaIndex vs. Manual RAG — When to use which:
| Aspect | Manual RAG | LlamaIndex |
|---|---|---|
| Learning | Better for understanding fundamentals | Hides implementation details |
| Speed to prototype | Slower | Much faster |
| Customization | Full control | Configurable, but within framework |
| Production readiness | Build it yourself | Batteries included |
Exercises:
Add more documents (e.g., a product roadmap, a competitor analysis) and see how retrieval adapts.
Experiment with different chunk sizes (128, 512, 1024) and compare answer quality.
Try using a different embedding model (e.g., BAAI/bge-small-en-v1.5) and compare retrieval accuracy.
Add a metadata filter so queries only search specific document types (e.g., only financials).
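As a starting point for the metadata-filter exercise, the underlying idea is to narrow the candidate chunks by metadata before similarity ranking. The sketch below uses plain dictionaries rather than LlamaIndex objects. The `doc_type` field and its values are hypothetical tags invented for illustration (the notebook's nodes expose `file_name`), and `filter_by_metadata` is a hypothetical helper. In practice, LlamaIndex's own metadata filter classes would take its place.

```python
def filter_by_metadata(nodes, key, value):
    """Keep only nodes whose metadata[key] equals value."""
    return [n for n in nodes if n["metadata"].get(key) == value]

# File names appear in the notebook output; the doc_type tags are assumed.
nodes = [
    {"text": "(chunk text)", "metadata": {"file_name": "q4_2025_financials.txt", "doc_type": "financials"}},
    {"text": "(chunk text)", "metadata": {"file_name": "products_and_services.txt", "doc_type": "products"}},
    {"text": "(chunk text)", "metadata": {"file_name": "globex_products.txt", "doc_type": "products"}},
    {"text": "(chunk text)", "metadata": {"file_name": "employee_handbook_excerpt.txt", "doc_type": "hr"}},
]

financial_nodes = filter_by_metadata(nodes, "doc_type", "financials")
financial_files = [n["metadata"]["file_name"] for n in financial_nodes]
print(financial_files)
```

Similarity search then runs only over the filtered subset, so a question about revenue cannot pull in chunks from product sheets or the employee handbook.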
Key takeaways
LlamaIndex collapses the manual RAG pipeline (load, chunk, embed, index, retrieve, prompt) into two or three lines of code.
Data Loaders from LlamaHub let you ingest from local files, GitHub, Google Drive, Notion, S3, and 300+ other sources with a uniform API.
Source attribution comes for free -- each answer carries the file names of the chunks that produced it, which is critical for trust.
Configuration knobs like chunk_size, chunk_overlap, and similarity_top_k let you tune the precision-vs-context tradeoff without rewriting plumbing.
Synthetic private documents make the with-vs-without-RAG contrast vivid: the LLM cannot possibly know these companies from training.
Run the code
To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)
https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/LlamaIndex_RAG.ipynb