LangChain RAG Demo: Document Q&A with a Framework

This notebook demonstrates how LangChain simplifies building RAG pipelines. LangChain is one of the most popular frameworks for building LLM-powered applications, providing modular components for document loading, text splitting, embeddings, vector stores, and retrieval chains.

How it works:

Load synthetic company documents from three separate directories (no external data needed).
Use LangChain to split, embed, and store the documents in a FAISS vector store. Note that FAISS is not a vector database: it is an open-source library for efficient similarity search and clustering of dense vectors. Unlike ChromaDB or Pinecone, which are full vector databases (Pinecone as a managed service), FAISS is a lower-level, high-performance library that keeps its vectors primarily in memory.
Build a retrieval chain that fetches relevant chunks and sends them to an LLM.
Compare answers with RAG (LangChain pipeline) vs. without RAG (direct LLM call).
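
To make the idea of a similarity-search library concrete, here is a minimal sketch of exhaustive nearest-neighbour search in plain Python. It is not how FAISS is implemented (FAISS is optimized native code and also offers approximate indexes); the function names and the toy vectors are assumptions made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(index_vectors, query, k=2):
    """Compare the query to every stored vector and return the k closest
    as (position, distance) pairs, nearest first."""
    scored = [(i, l2_distance(vec, query)) for i, vec in enumerate(index_vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-dimensional "embeddings" (made-up numbers, not real model output)
index_vectors = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]

nearest = flat_search(index_vectors, query, k=2)
print(nearest)
```

Everything lives in an ordinary Python list here, which mirrors the in-memory point above: nothing persists unless you save it yourself.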

Learning goals:

Understand LangChain’s modular architecture: Loaders → Splitters → Embeddings → Vector Stores → Chains
See how LangChain compares to LlamaIndex for building RAG pipelines
Observe how RAG grounding reduces hallucination on private/synthetic data

Provider setup: This notebook uses the llm_cascade package, which auto-detects your API keys and falls back to the next provider if one is unavailable. Supported providers: OpenAI, Gemini, Ollama, Grok (xAI), Groq, HuggingFace, Cohere, OpenRouter.

Store any of these API keys in Colab Secrets (or a local .env file): OPENAI_API_KEY, GEMINI_API_KEY, OLLAMA_API_KEY, XAI_API_KEY, GROQ_API_KEY, HF_TOKEN, COHERE_API_KEY, OPENROUTER_API_KEY
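
As a rough illustration of key auto-detection, the sketch below filters a provider table down to the entries whose environment variable is set. The table `EXAMPLE_PROVIDERS` and the function `detect_providers` are hypothetical; the real llm_cascade package ships its own `PROVIDERS` list and helpers, which may behave differently.

```python
import os

# Hypothetical provider table for illustration only; the real llm_cascade
# package defines its own PROVIDERS list with more fields.
EXAMPLE_PROVIDERS = [
    {"name": "OpenAI", "key_env": "OPENAI_API_KEY"},
    {"name": "Gemini", "key_env": "GEMINI_API_KEY"},
    {"name": "Groq", "key_env": "GROQ_API_KEY"},
]


def detect_providers(providers, environ=None):
    """Return the providers whose API-key variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return [p for p in providers if env.get(p["key_env"], "").strip()]


# Pass an explicit dict so the example does not depend on real secrets.
fake_env = {"GEMINI_API_KEY": "placeholder-value", "GROQ_API_KEY": ""}
found = detect_providers(EXAMPLE_PROVIDERS, fake_env)
print([p["name"] for p in found])
```

The order of the table doubles as the fallback order, so putting a preferred provider first makes it the primary whenever its key is present.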

LangChain is the other major framework in this space — if you already worked through Chapter 12’s LlamaIndex notebooks, think of this as the same problem solved by a more explicit, composable toolkit where you wire each step together yourself.

!pip install -q -U langchain langgraph langchain-community langchain-openai langchain-huggingface langchain-text-splitters faiss-cpu google-genai openai langchain-google-genai git+https://github.com/KarAnalytics/llm_cascade.git sentence-transformers

  Preparing metadata (setup.py) ... done
  Building wheel for llm_cascade (setup.py) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires google-auth==2.47.0, but you have google-auth 2.49.2 which is incompatible.
google-colab 1.0.0 requires requests==2.32.4, but you have requests 2.33.1 which is incompatible.

Imports and Provider Helpers (8-Vendor Cascade)

We configure the same 8-provider cascade used across all our RAG notebooks. Most providers in the cascade expose an OpenAI-compatible endpoint, so for those we point LangChain’s ChatOpenAI class at the provider’s base_url with its API key; Gemini is the exception and uses LangChain’s native ChatGoogleGenerativeAI class. Either way, the rest of the LangChain code stays the same whether you are running against OpenAI, Gemini, Groq, or any other provider in the cascade.

from pathlib import Path
from llm_cascade.providers import PROVIDERS, _load_env, _get_key, _is_retriable_error

_load_env()

def get_available_providers():
    return [p for p in PROVIDERS if _get_key(p['key_env'])]

def has_llm_provider():
    return len(get_available_providers()) > 0

# Print status
available = get_available_providers()
if available:
    print('Providers configured (in fallback order):')
    for p in available:
        print(f"  + {p['name']:<16} model = {p['default_model']}")
else:
    print('WARNING: No API keys found.')

Providers configured (in fallback order):
  + Gemini           model = gemini-2.5-flash
  + Ollama           model = kimi-k2.5:cloud
  + Groq             model = llama-3.3-70b-versatile
  + HuggingFace      model = meta-llama/Llama-3.3-70B-Instruct
  + Cohere           model = command-a-03-2025
  + OpenRouter       model = meta-llama/llama-3.3-70b-instruct:free
  + OpenAI           model = gpt-4o-mini

Configure LangChain LLM and Embeddings

LangChain needs two components before we can build a retrieval chain:

An LLM for generating answers (one chat model per available provider in our cascade, with the first as primary and the rest as runtime fallbacks)
An embedding model for converting text to vectors (local HuggingFace model -- no API key needed)

One thing to notice here: this notebook uses ChatOpenAI as the LLM wrapper even for most non-OpenAI providers. This works because many LLM providers now expose OpenAI-compatible API endpoints, so we simply swap the base_url and api_key to point at Groq, OpenRouter, or whatever provider is available. Gemini is handled separately through ChatGoogleGenerativeAI. The rest of LangChain’s pipeline does not need to know or care which model is behind the API.
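
The cell below also chains the configured models with with_fallbacks. Conceptually, runtime fallback means trying each candidate in order and moving on when one raises an error. The sketch below shows that idea with plain callables; `invoke_with_fallbacks` and the two stand-in "models" are hypothetical, and LangChain's own implementation is more involved.

```python
def invoke_with_fallbacks(candidates, prompt):
    """Try each (name, callable) pair in order and return the first success."""
    errors = []
    for name, call in candidates:
        try:
            return name, call(prompt)
        except Exception as exc:  # a stricter version might catch only retriable errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


# Stand-in callables (hypothetical); real candidates would be chat model objects.
def quota_exhausted(prompt):
    raise RuntimeError("429 quota exceeded")


def echo_model(prompt):
    return f"answer to: {prompt}"


used, reply = invoke_with_fallbacks(
    [("primary", quota_exhausted), ("backup", echo_model)],
    "What is RAG?",
)
print(used, "->", reply)
```

Collecting the error messages instead of discarding them makes it easier to see why every provider failed when the whole cascade is exhausted.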

from langchain_huggingface import HuggingFaceEmbeddings
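# Prefer the maintained langchain-openai package; fall back to the deprecated
# langchain_community class if langchain-openai is not installed.
try:
    from langchain_openai import ChatOpenAI
except ImportError:
    from langchain_community.chat_models import ChatOpenAI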

# ---- Embedding model (local, no API key needed) -----------------------------
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
print("Embedding model: sentence-transformers/all-MiniLM-L6-v2 (local)")

# ---- LLM with runtime fallback (cascades if quota is exhausted) --------------
llm_candidates = []
for provider in get_available_providers():
    api_key = _get_key(provider["key_env"])
    try:
        if provider["style"] == "gemini":
            from langchain_google_genai import ChatGoogleGenerativeAI
            candidate = ChatGoogleGenerativeAI(
                google_api_key=api_key,
                model=provider["default_model"],
            )
        else:
            candidate = ChatOpenAI(
                api_key=api_key,
                base_url=provider["base_url"],
                model=provider["default_model"],
            )
        llm_candidates.append((provider, candidate))
        print(f"  + {provider['name']:<16} ({provider['default_model']})")
    except Exception as e:
        print(f"  - {provider['name']:<16} skipped: {e}")

if llm_candidates:
    primary_provider, primary_llm = llm_candidates[0]
    fallback_llms = [c for _, c in llm_candidates[1:]]

    if fallback_llms:
        llm = primary_llm.with_fallbacks(fallback_llms)
        print(f"\nPrimary LLM: {primary_provider['name']} ({primary_provider['default_model']})")
        print(f"Fallbacks:   {', '.join(p['name'] for p, _ in llm_candidates[1:])}")
        print("(If primary hits quota, automatically tries the next provider)")
    else:
        llm = primary_llm
        print(f"\nLLM provider: {primary_provider['name']} ({primary_provider['default_model']})")
        print("(No fallback providers configured)")

    llm_provider = primary_provider
else:
    llm = None
    primary_llm = None
    llm_provider = None
    print("ERROR: Could not configure any LLM provider. Check your API keys.")

BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.

Embedding model: sentence-transformers/all-MiniLM-L6-v2 (local)
  + Gemini           (gemini-2.5-flash)

/tmp/ipykernel_2361/1873978192.py:20: LangChainDeprecationWarning: The class `ChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 1.0. An updated version of the class exists in the `langchain-openai package and should be used instead. To use it run `pip install -U `langchain-openai` and import as `from `langchain_openai import ChatOpenAI``.
  candidate = ChatOpenAI(

  + Ollama           (kimi-k2.5:cloud)
  + Groq             (llama-3.3-70b-versatile)
  + HuggingFace      (meta-llama/Llama-3.3-70B-Instruct)
  + Cohere           (command-a-03-2025)
  + OpenRouter       (meta-llama/llama-3.3-70b-instruct:free)
  + OpenAI           (gpt-4o-mini)

Primary LLM: Gemini (gemini-2.5-flash)
Fallbacks:   Ollama, Groq, HuggingFace, Cohere, OpenRouter, OpenAI
(If primary hits quota, automatically tries the next provider)

Create Synthetic Company Documents

We generate fictional documents for three companies that the LLM has never seen in training. This ensures that correct answers can only come from RAG retrieval, not from parametric memory. Each company’s documents live in a separate directory, simulating real-world data silos.

Directory	Company	Documents
`acme_docs/`	Acme Analytics Inc. (BI tools, Lawrence KS)	Overview, Products, Financials, Handbook, Case Study
`globex_docs/`	Globex Cybersecurity Corp. (cybersecurity, Boston)	Overview, Products, Incident Report
`nextera_docs/`	Nextera Green Solutions (sustainability consulting, Denver)	Overview, Services, Impact Report

Later, we will use LangChain’s Document Loaders (DirectoryLoader with TextLoader) to ingest all three directories, demonstrating how LangChain can pull information from multiple, separate data sources and unify them in a single searchable index.

DOCUMENTS = {
    "company_overview.txt": """ACME ANALYTICS INC. — COMPANY OVERVIEW

Acme Analytics Inc. was founded in 2019 by Dr. Sarah Chen and Marcus Rivera
in Lawrence, Kansas. The company specializes in AI-powered business intelligence
tools for mid-market companies (100–5,000 employees).

Headquarters: 1420 Jayhawk Boulevard, Lawrence, KS 66045
Employees: 287 (as of January 2026)
Annual Revenue (2025): $42.3 million
Funding: Series B ($18M raised in March 2023 from Midwest Ventures)

Mission: "To democratize data analytics so every business decision is informed
by evidence, not intuition."

The company operates three offices: Lawrence (HQ), Chicago (sales), and
Austin (engineering). CEO: Dr. Sarah Chen. CTO: Marcus Rivera. CFO: Linda Park.""",

    "products_and_services.txt": """ACME ANALYTICS — PRODUCTS AND SERVICES

1. InsightBoard Pro (Flagship Product)
   - Real-time dashboard platform with natural language query interface
   - Pricing: $49/user/month (Standard), $89/user/month (Enterprise)
   - Supports PostgreSQL, MySQL, Snowflake, BigQuery, and Redshift
   - 1,847 active enterprise customers as of Q4 2025

2. DataPipe ETL
   - Automated data pipeline builder with 200+ pre-built connectors
   - Pricing: starts at $500/month for up to 10 million records/day
   - Launched in September 2024

3. PredictIQ
   - ML-powered forecasting add-on for InsightBoard Pro
   - Pricing: $29/user/month (requires InsightBoard Pro subscription)
   - Uses proprietary time-series model trained on retail and logistics data
   - Beta launched March 2025, GA release planned for June 2026

All products include 24/7 email support. Enterprise plans include dedicated
account manager and 99.9% SLA.""",

    "q4_2025_financials.txt": """ACME ANALYTICS — Q4 2025 FINANCIAL RESULTS (CONFIDENTIAL)

Period: October 1 – December 31, 2025

Revenue:          $12.8 million (Q4) / $42.3 million (FY 2025)
Gross Margin:     78.2%
Operating Profit: $1.9 million (Q4) / $5.1 million (FY 2025)
Net Income:       $1.4 million (Q4)
Cash on Hand:     $14.7 million
Burn Rate:        Company is cash-flow positive since Q2 2025

Key Metrics:
- ARR (Annual Recurring Revenue): $48.6 million (up 34% YoY)
- Net Revenue Retention: 118%
- Customer Acquisition Cost (CAC): $8,200
- Customer Lifetime Value (LTV): $67,400
- LTV/CAC Ratio: 8.2x

Headcount grew from 241 to 287 employees during 2025.
R&D spending: 31% of revenue. Sales & Marketing: 28% of revenue.""",

    "employee_handbook_excerpt.txt": """ACME ANALYTICS — EMPLOYEE HANDBOOK (EXCERPT)

PAID TIME OFF (PTO):
- All full-time employees receive 22 days of PTO per year (accrued monthly).
- PTO increases to 27 days after 3 years of service.
- Unused PTO can be carried over (max 5 days) or paid out at year-end.
- Sick leave: 10 days per year (separate from PTO).

REMOTE WORK POLICY:
- Engineering and Data Science teams: fully remote eligible.
- Sales and Customer Success: hybrid (minimum 2 days/week in office).
- All employees may work remotely up to 4 weeks/year from any US location.
- International remote work requires VP approval and tax review.

PROFESSIONAL DEVELOPMENT:
- Annual learning budget: $2,500 per employee.
- Conference attendance: up to 2 conferences per year with manager approval.
- Tuition reimbursement: up to $5,250/year for degree programs.""",

    "customer_case_study.txt": """CUSTOMER CASE STUDY: BIGRETAIL CORP

Company: BigRetail Corp (1,200 retail stores across 38 states)
Challenge: Siloed data across POS, inventory, and CRM systems made it
impossible for regional managers to get timely insights.

Solution: Deployed InsightBoard Pro Enterprise + DataPipe ETL
- Connected 14 data sources in 3 weeks using DataPipe
- 340 regional managers now use InsightBoard daily
- Natural language queries replaced manual SQL report requests

Results (after 6 months):
- 62% reduction in time-to-insight (from 4 days to 1.5 days average)
- $3.2 million saved in inventory carrying costs
- 23% increase in regional manager satisfaction scores
- SQL report request backlog eliminated entirely

Quote from BigRetail CIO Janet Torres:
\"InsightBoard Pro transformed how our managers interact with data.
They went from waiting days for a report to asking questions in plain
English and getting answers in seconds.\"""",
}

# Write documents to disk
DOC_DIR = Path("acme_docs")
DOC_DIR.mkdir(exist_ok=True)

for filename, content in DOCUMENTS.items():
    (DOC_DIR / filename).write_text(content, encoding="utf-8")

print(f"Created {len(DOCUMENTS)} documents in '{DOC_DIR}/':")
for f in sorted(DOC_DIR.iterdir()):
    print(f"  {f.name} ({f.stat().st_size} bytes)")

Created 5 documents in 'acme_docs/':
  company_overview.txt (715 bytes)
  customer_case_study.txt (924 bytes)
  employee_handbook_excerpt.txt (828 bytes)
  products_and_services.txt (915 bytes)
  q4_2025_financials.txt (709 bytes)

# --- Company 2: Globex Cybersecurity Corp. → globex_docs/ ---

GLOBEX_DOCUMENTS = {
    "globex_overview.txt": """GLOBEX CYBERSECURITY CORP. — COMPANY OVERVIEW

Globex Cybersecurity Corp. was founded in 2017 by former NSA analyst
James Whitfield and AI researcher Dr. Priya Narayanan in Boston, Massachusetts.
The company provides AI-driven cybersecurity solutions for financial services
and healthcare organizations.

Headquarters: 88 Federal Street, Suite 1200, Boston, MA 02110
Employees: 412 (as of January 2026)
Annual Revenue (2025): $78.5 million
Funding: Series C ($45M raised in June 2024 from CyberVentures Capital
and Goldman Sachs Growth Equity)

Mission: "To make enterprise-grade cybersecurity autonomous, adaptive,
and accessible to every regulated industry."

The company operates four offices: Boston (HQ), Washington D.C. (government
relations), Tel Aviv (R&D), and Singapore (APAC sales).
CEO: James Whitfield. CTO: Dr. Priya Narayanan. CISO: Robert Tanaka.""",

    "globex_products.txt": """GLOBEX CYBERSECURITY — PRODUCTS AND SERVICES

1. ShieldAI Platform (Flagship Product)
   - AI-powered threat detection and automated incident response
   - Uses behavioral analysis to detect zero-day attacks in real time
   - Pricing: $120/endpoint/year (Standard), $195/endpoint/year (Premium)
   - 623 enterprise customers across 18 countries as of Q4 2025
   - Average detection time: 2.3 minutes (industry average: 197 days)

2. VaultGuard
   - Data encryption and access management platform for regulated industries
   - HIPAA, SOX, and PCI-DSS compliant out of the box
   - Pricing: $35,000/year (up to 500 users), $60,000/year (unlimited users)
   - Launched January 2023

3. ThreatIntel Live
   - Real-time threat intelligence feed with dark web monitoring
   - Pricing: $8,500/month (includes API access and analyst dashboard)
   - Aggregates data from 2,400+ sources updated every 15 minutes
   - Integrates with Splunk, CrowdStrike, Palo Alto, and Microsoft Sentinel

All products include a dedicated Security Operations Center (SOC) team for
Premium customers. Globex holds ISO 27001 and SOC 2 Type II certifications.""",

    "globex_incident_report.txt": """GLOBEX CYBERSECURITY — 2025 THREAT LANDSCAPE REPORT (PUBLIC)

Published: February 2026

KEY FINDINGS FROM GLOBEX'S CUSTOMER BASE (2025):

Total incidents analyzed: 14,237 across 623 customers
- Ransomware attacks: up 42% YoY (3,891 incidents)
- Phishing campaigns: up 28% YoY (5,102 incidents)
- Supply chain attacks: up 67% YoY (1,245 incidents — fastest-growing category)
- Insider threats: relatively stable at 1,834 incidents

ShieldAI Platform Performance:
- Blocked 97.3% of attacks before any data exfiltration
- Mean time to detect (MTTD): 2.3 minutes
- Mean time to respond (MTTR): 8.7 minutes (automated response)
- False positive rate: 0.4% (down from 1.2% in 2024)

NOTABLE CASE: MERIDIAN HEALTH SYSTEM
- 340-bed hospital network targeted by Medusa ransomware variant
- ShieldAI detected anomalous lateral movement within 47 seconds
- Automated containment isolated 3 affected servers in under 2 minutes
- Zero patient data exfiltrated; zero downtime for clinical systems
- Estimated savings: $12M+ (average healthcare ransomware cost per Ponemon)

INDUSTRY PREDICTIONS FOR 2026:
- AI-generated phishing emails will become indistinguishable from human-written
- Quantum-resistant encryption adoption will begin in financial services
- Average cost of a data breach expected to exceed $5.2 million globally""",
}

GLOBEX_DIR = Path("globex_docs")
GLOBEX_DIR.mkdir(exist_ok=True)

for filename, content in GLOBEX_DOCUMENTS.items():
    (GLOBEX_DIR / filename).write_text(content, encoding="utf-8")

print(f"Created {len(GLOBEX_DOCUMENTS)} documents in '{GLOBEX_DIR}/':")
for f in sorted(GLOBEX_DIR.iterdir()):
    print(f"  {f.name} ({f.stat().st_size} bytes)")

Created 3 documents in 'globex_docs/':
  globex_incident_report.txt (1321 bytes)
  globex_overview.txt (864 bytes)
  globex_products.txt (1131 bytes)

# --- Company 3: Nextera Green Solutions → nextera_docs/ ---

NEXTERA_DOCUMENTS = {
    "nextera_overview.txt": """NEXTERA GREEN SOLUTIONS — COMPANY OVERVIEW

Nextera Green Solutions was founded in 2020 by climate scientist Dr. Elena
Vasquez and former McKinsey partner David Okafor in Denver, Colorado.
The company provides sustainability consulting and carbon accounting software
to help companies meet ESG (Environmental, Social, Governance) targets.

Headquarters: 1700 Lincoln Street, Suite 3400, Denver, CO 80203
Employees: 156 (as of January 2026)
Annual Revenue (2025): $23.8 million
Funding: Series A ($12M raised in November 2023 from Climate Capital Partners
and Patagonia Ventures)

Mission: "To give every company a clear, data-driven path to net zero —
turning climate commitments into measurable action."

The company operates two offices: Denver (HQ) and London (European operations).
CEO: Dr. Elena Vasquez. COO: David Okafor. VP Engineering: Kenji Tanaka.
Nextera is a certified B Corporation and a member of the UN Global Compact.""",

    "nextera_services.txt": """NEXTERA GREEN SOLUTIONS — SERVICES AND PLATFORM

1. CarbonLens Platform (Flagship Product)
   - Automated Scope 1, 2, and 3 carbon emissions tracking
   - Integrates with ERP systems (SAP, Oracle, NetSuite) to pull real data
   - AI-powered recommendations for emission reduction strategies
   - Pricing: $2,500/month (Standard, up to $500M revenue companies)
              $6,500/month (Enterprise, unlimited + custom reporting)
   - 312 active customers as of Q4 2025

2. ESG Reporting Suite
   - Automated generation of sustainability reports (GRI, SASB, TCFD, CDP)
   - Audit-ready documentation with full data lineage
   - Pricing: $18,000/year (bundled with CarbonLens Enterprise)
   - Standalone: $24,000/year
   - Launched April 2024

3. Sustainability Strategy Consulting
   - Science-Based Targets initiative (SBTi) alignment services
   - Supply chain decarbonization roadmaps
   - Pricing: Project-based ($75,000 – $250,000 per engagement)
   - Team of 28 consultants with backgrounds in climate science and industry

4. Carbon Offset Marketplace
   - Verified carbon credits from 45 projects across 12 countries
   - Blockchain-verified chain of custody for each credit
   - Transaction fee: 4.5% per credit purchase
   - Beta launched August 2025

All platform customers receive quarterly sustainability benchmarking reports
comparing their performance against industry peers.""",

    "nextera_impact_report.txt": """NEXTERA GREEN SOLUTIONS — 2025 IMPACT REPORT

Published: January 2026

COMPANY IMPACT METRICS:
- Total customer carbon emissions tracked: 47.3 million metric tons CO2e
- Verified emission reductions enabled: 6.8 million metric tons CO2e (14.4%)
- Customers who improved their CDP score after using Nextera: 89%
- Average time to produce ESG report reduced from 14 weeks to 3.5 weeks

NOTABLE CLIENT RESULTS:

1. SummitBrew Coffee (specialty coffee chain, 380 locations)
   - Reduced Scope 3 supply chain emissions by 31% in 18 months
   - Switched 62% of logistics to electric vehicles using Nextera roadmap
   - First coffee chain to achieve SBTi-validated near-term target

2. Pacific Northwest Lumber Co. (timber and building materials)
   - Used CarbonLens to discover 40% of emissions came from transportation
   - Optimized shipping routes: saved 8,200 metric tons CO2e and $1.4M/year
   - Achieved carbon-neutral certification for 2025 operations

3. Meridian Financial Group (regional bank, $8B assets)
   - Implemented financed emissions tracking (PCAF methodology)
   - Identified $2.1B in loan portfolio exposed to high transition risk
   - Developed green lending program: $340M in sustainability-linked loans

AWARDS AND RECOGNITION (2025):
- Fast Company "Most Innovative Companies" — Energy category (#7)
- B Corp "Best for the World" — Environment category
- Colorado Governor's Award for Clean Energy Innovation""",
}

NEXTERA_DIR = Path("nextera_docs")
NEXTERA_DIR.mkdir(exist_ok=True)

for filename, content in NEXTERA_DOCUMENTS.items():
    (NEXTERA_DIR / filename).write_text(content, encoding="utf-8")

print(f"Created {len(NEXTERA_DOCUMENTS)} documents in '{NEXTERA_DIR}/':")
for f in sorted(NEXTERA_DIR.iterdir()):
    print(f"  {f.name} ({f.stat().st_size} bytes)")

Created 3 documents in 'nextera_docs/':
  nextera_impact_report.txt (1434 bytes)
  nextera_overview.txt (938 bytes)
  nextera_services.txt (1394 bytes)

Load Documents from Three Company Directories

LangChain provides Document Loaders -- modular components that load data from various sources. Here we use DirectoryLoader to load .txt files from three separate company directories, each representing a different fictional company. This is the LangChain equivalent of LlamaIndex’s SimpleDirectoryReader, but notice how much more explicit it is: you specify the loader class (TextLoader), the file glob pattern, and the directory path separately. LangChain favors this explicit, composable style over LlamaIndex’s “batteries included” approach.
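
To show what a directory loader boils down to, here is a minimal plain-Python sketch that globs a folder and returns one record per file, with the text and its source path. The function `load_text_dir` and the dict layout are assumptions for illustration; LangChain's loaders return Document objects and handle many more cases.

```python
import tempfile
from pathlib import Path


def load_text_dir(directory, pattern="**/*.txt", encoding="utf-8"):
    """Read every matching file into a dict holding its text and source path."""
    records = []
    for path in sorted(Path(directory).glob(pattern)):
        records.append({
            "page_content": path.read_text(encoding=encoding),
            "metadata": {"source": str(path)},
        })
    return records


# Build a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("first document", encoding="utf-8")
    (Path(tmp) / "b.txt").write_text("second document", encoding="utf-8")
    (Path(tmp) / "notes.md").write_text("ignored by the glob", encoding="utf-8")
    loaded = load_text_dir(tmp)

print(len(loaded), [Path(d["metadata"]["source"]).name for d in loaded])
```

Keeping the source path in the metadata is what later lets you check which file a retrieved chunk came from.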

from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load documents from three separate company directories
DOC_DIRS = {
    "acme_docs":    "Acme Analytics (BI tools)",
    "globex_docs":  "Globex Cybersecurity (security)",
    "nextera_docs": "Nextera Green Solutions (sustainability)",
}

all_docs = []
for dir_name, label in DOC_DIRS.items():
    loader = DirectoryLoader(dir_name, glob="**/*.txt", loader_cls=TextLoader,
                             loader_kwargs={"encoding": "utf-8"})
    docs = loader.load()
    print(f"  Loaded: '{dir_name}/' \u2192 {len(docs)} docs  [{label}]")
    all_docs.extend(docs)

print(f"\nTotal documents loaded: {len(all_docs)}")

  Loaded: 'acme_docs/' → 5 docs  [Acme Analytics (BI tools)]
  Loaded: 'globex_docs/' → 3 docs  [Globex Cybersecurity (security)]
  Loaded: 'nextera_docs/' → 3 docs  [Nextera Green Solutions (sustainability)]

Total documents loaded: 11

Split, Embed, and Index with FAISS

This is where LangChain’s modular architecture becomes visible. Each step in the pipeline is an explicit, swappable component:

Text Splitter (RecursiveCharacterTextSplitter) -- breaks documents into smaller chunks, trying the configured separators in order: section dividers and paragraph breaks first, then single line breaks, and finally spaces
Embeddings (HuggingFaceEmbeddings) -- converts each chunk into a dense vector
Vector Store (FAISS) -- stores and indexes the vectors for fast similarity search

This is the same embed-then-store-then-retrieve pattern as LlamaIndex, but here each step is a separate object you can inspect, swap, or configure independently. Want to switch from FAISS to ChromaDB? Just change one line. Want a different chunking strategy? Swap the splitter. That modularity is LangChain’s defining philosophy.
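
The roles of chunk_size and chunk_overlap are easier to see in a stripped-down version. The sketch below is a fixed-size sliding window that ignores separators entirely, so it is a simplified stand-in and not the algorithm RecursiveCharacterTextSplitter uses; `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size sliding-window chunks; consecutive chunks share
    chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(25))  # 25 characters: 0123456789...
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(repr(piece))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.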

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

# Step 1: Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n---\n", "\n\n", "\n", " "],  # split on section dividers first
)
chunks = text_splitter.split_documents(all_docs)
print(f"Split {len(all_docs)} documents into {len(chunks)} chunks")

# Step 2 & 3: Embed chunks and store in FAISS vector store
vectorstore = FAISS.from_documents(chunks, embeddings)
print(f"FAISS vector store built with {len(chunks)} vectors")

# Create a retriever (top-k=3)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
print("Retriever ready (top_k=3)")

Split 11 documents into 15 chunks
FAISS vector store built with 15 vectors
Retriever ready (top_k=3)

Peek Under the Hood -- Test Retrieval

Before querying the LLM, let us see what chunks the retriever returns for a test question. This is exactly the same kind of inspection we did with LlamaIndex -- verifying that the right documents are being retrieved before we trust the LLM to generate an answer from them. Understanding what the retriever returns is the single most important debugging step in any RAG pipeline, because if retrieval is wrong, no amount of LLM sophistication can save the answer.
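
One way to turn this inspection into a repeatable check is a small hit-rate measure: for each test question, record which file should be retrieved and see whether it shows up in the top-k sources. The helper `hit_rate` and the hand-written cases below are hypothetical and not produced by a real run.

```python
def hit_rate(cases, k=3):
    """Fraction of cases whose expected source appears in the top-k
    retrieved sources."""
    if not cases:
        return 0.0
    hits = sum(1 for expected, retrieved in cases if expected in retrieved[:k])
    return hits / len(cases)


# Each case pairs the file we expect with the sources a retriever returned.
# These lists are hand-written for illustration.
cases = [
    ("acme_docs/q4_2025_financials.txt",
     ["acme_docs/q4_2025_financials.txt", "acme_docs/company_overview.txt"]),
    ("globex_docs/globex_products.txt",
     ["globex_docs/globex_overview.txt", "globex_docs/globex_incident_report.txt"]),
]

score = hit_rate(cases)
print(f"hit rate @3: {score:.2f}")
```

With a real retriever, the retrieved list for each question could be built from the `source` entry in each returned document's metadata.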

# Test retrieval without LLM — just see what chunks come back
test_query = "How much revenue did Acme make in Q4 2025?"
retrieved_docs = retriever.invoke(test_query)

print(f"Query: '{test_query}'")
print(f"Retrieved {len(retrieved_docs)} chunks:\n")
for i, doc in enumerate(retrieved_docs, 1):
    source = doc.metadata.get('source', 'unknown')
    text_preview = doc.page_content[:200].replace('\n', ' ')
    print(f"  Chunk #{i} (source={source})")
    print(f"    {text_preview}...\n")

Query: 'How much revenue did Acme make in Q4 2025?'
Retrieved 3 chunks:

  Chunk #1 (source=acme_docs/q4_2025_financials.txt)
    ACME ANALYTICS — Q4 2025 FINANCIAL RESULTS (CONFIDENTIAL)  Period: October 1 – December 31, 2025  Revenue:          $12.8 million (Q4) / $42.3 million (FY 2025) Gross Margin:     78.2% Operating Profi...

  Chunk #2 (source=acme_docs/company_overview.txt)
    ACME ANALYTICS INC. — COMPANY OVERVIEW  Acme Analytics Inc. was founded in 2019 by Dr. Sarah Chen and Marcus Rivera in Lawrence, Kansas. The company specializes in AI-powered business intelligence too...

  Chunk #3 (source=acme_docs/employee_handbook_excerpt.txt)
    ACME ANALYTICS — EMPLOYEE HANDBOOK (EXCERPT)  PAID TIME OFF (PTO): - All full-time employees receive 22 days of PTO per year (accrued monthly). - PTO increases to 27 days after 3 years of service. - U...

Build a Retrieval Chain (LCEL)

LangChain’s modern approach uses LCEL (LangChain Expression Language) to compose chains with the | pipe operator. If you have used Unix shell pipes (cat file | grep pattern | sort), the mental model is identical: data flows left to right through a series of transformations.

Here is how to read the chain we build below:

The retriever fetches relevant chunks from the FAISS vector store
format_docs concatenates those chunks into a single text block
The prompt template combines the chunks + user question into a structured prompt
The LLM generates an answer from that prompt
StrOutputParser() extracts the text from the LLM’s response object

Each | connects one step’s output to the next step’s input. This is more flexible and composable than the older RetrievalQA chain -- you can insert, remove, or rearrange steps just by editing the pipe expression.
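
For readers curious how a | operator can compose steps at all, here is a toy sketch built on Python's `__or__` method. The `Step` class is a hypothetical stand-in; LangChain's Runnable classes do considerably more (dict inputs, batching, streaming, async).

```python
class Step:
    """Wrap a function so steps can be chained with `|` (a toy stand-in
    for a Runnable)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # self | other builds a new Step that runs self first, then other
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


strip = Step(str.strip)
upper = Step(str.upper)
exclaim = Step(lambda s: s + "!")

pipeline = strip | upper | exclaim
result = pipeline.invoke("  hello lcel  ")
print(result)  # HELLO LCEL!
```

Because each `|` returns another `Step`, a composed pipeline can itself be used as one stage inside a longer pipeline.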

We also set up a direct LLM call function for the “without RAG” comparison.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


def format_docs(docs):
    """Join retrieved documents into a single context string."""
    return "\n\n---\n\n".join(doc.page_content for doc in docs)


# The prompt template is used by the RAG chain below
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based ONLY on the following context.
If the context doesn't contain the answer, say "I don't have enough information."

Context:
{context}

Question: {question}

Answer:"""
)


# =============================================================================
# RAG Chain with LCEL (LangChain Expression Language) -- the modern way
# =============================================================================
# LCEL uses the pipe operator `|` to chain Runnables together, similar to
# Unix pipes (`cat file.txt | grep foo | wc -l`).
#
# HOW TO READ THE PIPE SYNTAX:
#
#   result = input | step1 | step2 | step3
#
# means: input goes into step1, step1's output goes into step2, etc.
# Each `|` is "then pass to".
#
# Reading top-to-bottom when you call chain.invoke("your question"):
#
#   1. The DICT at the top runs TWO things in parallel with the same input:
#      - "context":  retriever | format_docs
#                    (fetches docs, then joins them into a string)
#      - "question": RunnablePassthrough()
#                    (just passes the question through unchanged)
#      Output: {"context": "...retrieved text...", "question": "your question"}
#
#   2. | rag_prompt
#      Fills the {context} and {question} placeholders in the prompt template.
#
#   3. | llm
#      Sends the filled prompt to the LLM. Returns an AIMessage object.
#
#   4. | StrOutputParser()
#      Extracts .content from the AIMessage so you get a plain string back.

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
print("LCEL RAG chain ready: retriever | format_docs | prompt | llm | parser")


# =============================================================================
# ALTERNATIVE: Custom class WITHOUT LCEL (shown for comparison)
# =============================================================================
# Here's what the same RAG pipeline looks like as a plain Python class,
# without using LCEL or LangChain's chaining features. It works, but...
#
# WHY LCEL IS USEFUL COMPARED TO THIS CUSTOM CLASS:
#   - Streaming tokens:   for token in chain.stream(q): ...   (works automatically in LCEL)
#                         The custom class would need to be rewritten to handle streaming.
#   - Batch processing:   chain.batch([q1, q2, q3])            (parallel execution automatic)
#                         The custom class would need threading/async code added manually.
#   - Automatic fallback: chain.with_fallbacks([backup])       (one line)
#                         The custom class would need try/except blocks everywhere.
#   - Tracing/debugging:  works with LangSmith out of the box
#                         The custom class would need manual logging at each step.
#   - Async support:      chain.ainvoke(q)                     (free)
#                         The custom class would need an async version written separately.
#
# Uncomment the block below to use the custom class approach instead of LCEL.
#
# class SimpleRAGChain:
#     def __init__(self, retriever, prompt, llm):
#         self.retriever = retriever
#         self.prompt = prompt
#         self.llm = llm
#
#     def invoke(self, question):
#         # Step 1: retrieve relevant docs from vector store
#         docs = self.retriever.invoke(question)
#         # Step 2: join docs into a single context string
#         context = format_docs(docs)
#         # Step 3: fill the prompt template
#         formatted = self.prompt.format(context=context, question=question)
#         # Step 4: send to the LLM
#         response = self.llm.invoke(formatted)
#         # Step 5: extract the text from the AIMessage object
#         return response.content
#
# chain = SimpleRAGChain(retriever, rag_prompt, llm)

LCEL RAG chain ready: retriever | format_docs | prompt | llm | parser

With RAG vs. Without RAG -- Three Companies

We test questions about three fictional companies that no LLM has seen in training. Notice that the retriever pulls chunks from the correct company’s documents even though all three companies share a single FAISS index and the code contains no explicit routing logic. A question that names Nextera is simply closer in embedding space to Nextera’s chunks than to Acme’s or Globex’s, so semantic similarity does the routing for you. Only one question is left uncommented below; uncomment the others to try all three companies.
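To make the routing idea concrete, here is a minimal sketch that swaps the real embedding model and FAISS index for a toy bag-of-words vector and a brute-force cosine search. The three chunk texts and the `embed`, `cosine`, and `retrieve` helpers are made up for illustration and are not part of the notebook. Real embeddings capture meaning rather than exact word overlap, but the ranking logic has the same shape.

```python
import math
from collections import Counter

# Made-up one-line stand-ins for chunks from the three company folders.
CHUNKS = {
    "acme": "acme analytics sells data dashboards and reports revenue growth",
    "globex": "globex cybersecurity detects ransomware threats for hospitals",
    "nextera": "nextera green solutions was founded in denver to cut carbon emissions",
}


def embed(text):
    """Toy embedding: word counts (a real model returns a dense vector)."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, k=1):
    """Rank every chunk against the question and keep the top k labels."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda name: cosine(q, embed(CHUNKS[name])), reverse=True)
    return ranked[:k]


for question in [
    "Who founded Nextera Green Solutions?",
    "How does Globex detect ransomware?",
    "What is Acme Analytics revenue?",
]:
    print(question, "->", retrieve(question))
```

All three chunks sit in one dictionary, yet each question lands on its own company because that chunk shares the most vocabulary with it. No `if "Nextera" in question` branch is needed.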

questions = [
    #"What was Acme Analytics' revenue in Q4 2025?",
    #"How many days of PTO do new employees get at Acme?",
    #"What is Globex Cybersecurity's flagship product and how fast does it detect threats?",
    #"How did Globex help Meridian Health System during a ransomware attack?",
    "Who founded Nextera Green Solutions and where is it headquartered?",
    #"How much carbon emissions reduction did Nextera enable for its customers in 2025?",
]


def preview(text, max_len=800):
    text = str(text) if text else ""
    return text[:max_len] + ("..." if len(text) > max_len else "")


if not has_llm_provider():
    print("Error: No LLM API key configured. Set at least one API key in Colab Secrets.")
else:
    for i, q in enumerate(questions, start=1):
        print("=" * 80)
        print(f"Q{i}. {q}")
        print("=" * 80)

        try:
            answer = chain.invoke(q)
            print("Answer:")
            print(preview(answer))
            # Show which documents were retrieved. The chain returns only the
            # answer string, so this runs the retriever a second time.
            source_docs = retriever.invoke(q)
            sources = set(doc.metadata.get("source", "unknown") for doc in source_docs)
            print(f"Sources: {', '.join(sources)}")
        except Exception as e:
            if _is_retriable_error(e):
                print("  Rate limited -- skipping")
            else:
                print(f"  Error: {e}")

        print()

================================================================================
Q1. Who founded Nextera Green Solutions and where is it headquartered?
================================================================================
Answer:
Nextera Green Solutions was founded by **Dr. Elena Vasquez** and **David Okafor**. It is headquartered at **1700 Lincoln Street, Suite 3400, Denver, CO 80203** in Denver, Colorado.
Sources: nextera_docs/nextera_services.txt, nextera_docs/nextera_overview.txt, nextera_docs/nextera_impact_report.txt

LangChain vs. LlamaIndex -- Comparison

| Aspect | LangChain | LlamaIndex |
| --- | --- | --- |
| Philosophy | Modular toolkit -- you compose the pipeline | Opinionated framework -- batteries included |
| Document loading | `DirectoryLoader`, `TextLoader`, etc. | `SimpleDirectoryReader` |
| Text splitting | Explicit `RecursiveCharacterTextSplitter` | Automatic (configurable) |
| Vector store | FAISS, Chroma, Pinecone, etc. (you choose) | Built-in or pluggable |
| Retrieval chain | LCEL pipe syntax (`retriever \| prompt \| llm`) | `query_engine` |
| Lines of code | More (explicit control) | Fewer (more abstraction) |
| Best for | Custom pipelines, agents, complex workflows | Quick RAG prototypes, document Q&A |

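Key takeaway: Both frameworks solve the same problem (RAG), but LangChain gives you more explicit control over each step, while LlamaIndex abstracts more away. Reach for LangChain when you need custom pipelines, agents, or complex workflows, and for LlamaIndex when you want a quick RAG prototype or straightforward document Q&A.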

With the RAG pipeline working, let us now see what happens when we give the LLM more autonomy. Instead of a fixed retrieve-then-generate pipeline, what if the model could choose which tools to use based on the question? This is the jump from chains to agents -- and it is where LangChain’s architecture really starts to pay off.

Beyond RAG: LangChain Agents with Tools

RAG is just one use case for LangChain. Its real power lies in Agents -- LLMs that can decide which tools to call based on the user’s question. While a RAG chain always follows the same retrieve-then-generate path, an agent reasons about which action to take next.

In this section, we give the LLM three tools:

Company Lookup -- searches our vector store (reusing the RAG retriever from above)
Calculator -- performs arithmetic
Current Date -- returns today’s date

The LLM reads the question and, if it needs a tool, replies with a JSON action naming the tool and its input. Our code parses that JSON, runs the tool, and feeds the result back to the LLM, which then either requests another tool or writes the final answer. This is a simplified version of the ReAct pattern (Reason + Act), implemented with a plain prompt -- no bind_tools or special API features required. Because it relies only on text in and text out, it works with any LLM provider, which is why we chose this approach over provider-specific tool-calling APIs.

As you read the code below, notice the contrast with the RAG chain: the RAG chain is a fixed pipeline, while the agent is a decision loop where the LLM chooses its own path through the available tools.

# --- Tool definitions (plain functions) ---

def company_lookup(query):
    """Search the company knowledge base."""
    docs = retriever.invoke(query)
    # Return the first 500 characters of each retrieved chunk, one per line.
    return "\n".join(doc.page_content[:500] for doc in docs)


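def calculator(expression):
    """Evaluate a math expression like 42.3 * 1.34"""
    try:
        # Whitelist characters before eval() so names, calls, and attribute
        # access cannot get through. Note that "**" still passes this filter,
        # so a huge power such as 9**9**9 would be evaluated.
        allowed = set('0123456789.+-*/() ')
        if not all(c in allowed for c in expression):
            return 'Error: Only basic arithmetic allowed.'
        # Empty builtins as a second line of defense.
        return str(round(eval(expression, {"__builtins__": {}}, {}), 4))
    except Exception as e:
        return f'Error: {e}'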


def current_date(dummy=''):
    """Returns today's date."""
    from datetime import date
    return date.today().strftime('%B %d, %Y')


TOOLS = {
    'company_lookup': company_lookup,
    'calculator': calculator,
    'current_date': current_date,
}

print('Defined tools:', list(TOOLS.keys()))

Defined tools: ['company_lookup', 'calculator', 'current_date']
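The calculator above guards `eval()` with a character whitelist. A stricter alternative, sketched here as an assumption rather than as the notebook's method, is to parse the expression with the standard-library `ast` module and evaluate only the node types you explicitly allow. The name `safe_calculator` and the helper `_eval_node` are hypothetical; the return format mirrors the original tool so it could be dropped into `TOOLS` unchanged.

```python
import ast
import operator

# Only these operators are evaluated; everything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Recursively evaluate a whitelisted arithmetic AST node."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("Only basic arithmetic allowed.")


def safe_calculator(expression):
    """Evaluate +, -, *, / and parentheses without calling eval()."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return str(round(_eval_node(tree.body), 4))
    except (SyntaxError, ValueError, ZeroDivisionError) as e:
        return f"Error: {e}"


print(safe_calculator("42.3 * 1.34"))
print(safe_calculator("(1 + 2) * 3"))
print(safe_calculator("9 ** 9 ** 9"))       # power operator is not whitelisted
print(safe_calculator("__import__('os')"))  # function calls are not whitelisted
```

The trade-off is a few more lines in exchange for an allow-list of operations instead of an allow-list of characters, which also closes the `**` gap.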

Create and Run the Agent

We implement a ReAct agent loop using LCEL pipes, just like the RAG chain.

The key insight: an agent is a loop around a single step. We express ONE step of the ReAct loop as an LCEL chain:

agent_step = invoke_llm | parse_tool_call | execute_tool

Each step:

invoke_llm — sends the current message history to the LLM and captures the reply
parse_tool_call — checks if the reply contains a JSON tool call
execute_tool — if there’s a tool call, runs the tool and appends the result to the message history; otherwise returns the final answer

Then we call agent_step.invoke(state) in a Python for loop until the agent returns a final answer (or we hit a step limit). The Python loop is needed because an LCEL chain runs its steps once, in order; it has no built-in way to express “repeat until condition”. This is exactly the kind of situation where LangGraph shines (see the LangGraph_demo notebook), since it’s built specifically for stateful graph-based workflows with branches and loops.

Why do we need `RunnableLambda`?

LCEL’s pipe operator | only works between Runnables — objects that have an .invoke() method. Plain Python functions don’t have a | operator defined, so my_func1 | my_func2 would fail with a TypeError.

RunnableLambda(func) wraps a plain function into a Runnable, giving it an .invoke() method and enabling the pipe operator. You only need to wrap ONE function to bootstrap the pipe — once the left side is a Runnable, LangChain auto-coerces any plain functions on the right into Runnables.

That’s why our chain looks like:

agent_step = RunnableLambda(invoke_llm) | parse_tool_call | execute_tool

Only the first function needs explicit wrapping. The same thing happens in the earlier RAG chain: retriever | format_docs works without wrapping format_docs, because retriever is already a Runnable (LangChain auto-wraps format_docs).
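The coercion rule is easier to see in a toy model. The sketch below is not LangChain's implementation; `MiniRunnable` is a hypothetical class that imitates just two behaviors described above: a wrapped object supports `|`, and a plain function next to it gets wrapped automatically.

```python
class MiniRunnable:
    """Toy stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    @staticmethod
    def _coerce(other):
        # Wrap plain callables so they can join the chain.
        return other if isinstance(other, MiniRunnable) else MiniRunnable(other)

    def __or__(self, other):
        # self | other: run self first, then other.
        nxt = self._coerce(other)
        return MiniRunnable(lambda v: nxt.invoke(self.invoke(v)))

    def __ror__(self, other):
        # other | self: Python falls back to this when `other` has no `|`.
        prev = self._coerce(other)
        return MiniRunnable(lambda v: self.invoke(prev.invoke(v)))


def add_one(x):
    return x + 1


def double(x):
    return x * 2


# One explicit wrapper is enough; `double` and `str` are coerced on the way.
toy_chain = MiniRunnable(add_one) | double | str
print(repr(toy_chain.invoke(3)))  # (3 + 1) * 2, then str()

# Two plain functions have no `|` between them.
try:
    add_one | double
except TypeError as e:
    print("TypeError:", e)
```

Because each `|` returns another `MiniRunnable`, the wrapped object keeps moving rightward through the expression, which is why only the leftmost function needs explicit wrapping.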

import json as _json
import re as _re
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.runnables import RunnableLambda

AGENT_SYSTEM = """You are a helpful business analyst. You have these tools:
company_lookup(query): Search knowledge base about Acme, Globex, or Nextera.
calculator(expression): Evaluate math like 42.3 * 1.34
current_date(): Get today's date (pass an empty string as the input).

To use a tool, reply with ONLY this JSON (nothing else):
{"tool": "tool_name", "input": "the input"}

After receiving a tool result, answer the original question.
If you can answer without a tool, just reply directly."""


# ---- Three small functions that become LCEL steps via RunnableLambda ----

def invoke_llm(state):
    """LCEL step 1: send messages to LLM, capture the reply."""
    response = llm.invoke(state["messages"])
    return {**state, "reply": response.content.strip()}


def parse_tool_call(state):
    """LCEL step 2: check if reply contains a tool call JSON."""
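    tool_call = None
    # Grab the first {...} span. This assumes a flat JSON object whose values
    # contain no braces; anything else falls through as a final answer.
    m = _re.search(r"\{[^}]+\}", state["reply"])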
    if m:
        try:
            parsed = _json.loads(m.group())
            if "tool" in parsed and "input" in parsed:
                tool_call = parsed
        except _json.JSONDecodeError:
            pass
    return {**state, "tool_call": tool_call}


def execute_tool(state):
    """LCEL step 3: run the tool (or mark as final answer if none)."""
    if state["tool_call"] is None:
        # No tool call -- this is the final answer
        return {**state, "final": state["reply"]}

    # Execute the requested tool
    tool_name = state["tool_call"]["tool"]
    tool_input = state["tool_call"]["input"]
    tool_fn = TOOLS.get(tool_name)
    if tool_fn:
        result = tool_fn(tool_input)
    else:
        result = f"Unknown tool: {tool_name}"

    # Append assistant reply and tool result to the message history
    new_messages = state["messages"] + [
        AIMessage(content=state["reply"]),
        HumanMessage(content=f"Tool result: {result}"),
    ]
    return {
        **state,
        "messages": new_messages,
        "final": None,
        "last_tool_name": tool_name,
        "last_tool_input": tool_input,
        "last_tool_result": result,
    }


# ---- Build the LCEL agent-step chain: invoke_llm | parse_tool_call | execute_tool ----
# LCEL's pipe (|) only works between Runnables. Plain Python functions don't
# have a | operator, so we wrap the FIRST function as RunnableLambda to start
# the pipe. Once the left side is a Runnable, LangChain auto-coerces any plain
# functions on the right into Runnables. So only ONE RunnableLambda is needed.
agent_step = RunnableLambda(invoke_llm) | parse_tool_call | execute_tool


def run_agent(question, verbose=True):
    """Run the LCEL agent_step in a loop until it returns a final answer."""
    state = {
        "messages": [SystemMessage(content=AGENT_SYSTEM), HumanMessage(content=question)],
        "final": None,
    }

    for step in range(5):
        state = agent_step.invoke(state)

        if state["final"] is not None:
            if verbose:
                print(f"  [Step {step+1}] Final answer")
            return state["final"]

        if verbose:
            print(f"  [Step {step+1}] Calling: {state['last_tool_name']}({state['last_tool_input']})")
            print(f"             Result: {str(state['last_tool_result'])[:150]}...")

    return "Agent exceeded maximum steps."


print("LCEL agent ready: agent_step = invoke_llm | parse_tool_call | execute_tool")

LCEL agent ready: agent_step = invoke_llm | parse_tool_call | execute_tool
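The regex in parse_tool_call stops at the first closing brace, so a tool input that itself contains braces gets cut short and the reply is treated as a final answer. One possible hardening, shown here as a sketch and not as part of the notebook, is to let the standard library's `json.JSONDecoder.raw_decode` find where the object really ends. The function name `extract_tool_call` is hypothetical.

```python
import json
import re


def extract_tool_call(reply):
    """Return the first JSON object with "tool" and "input" keys, else None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one complete JSON value beginning at `start`.
            obj, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "tool" in obj and "input" in obj:
            return obj
        # Not a tool call here; try the next opening brace.
        start = reply.find("{", start + 1)
    return None


reply = 'I will search. {"tool": "company_lookup", "input": "plans {draft} for Acme"}'

# The notebook's regex stops at the first "}", so the JSON is cut short.
print(re.search(r"\{[^}]+\}", reply).group())

# raw_decode parses the whole object instead.
print(extract_tool_call(reply))
```

For the flat, brace-free tool calls the system prompt asks for, both approaches behave the same; the difference only shows up on messier replies.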

agent_questions = [
    #'Who is the CTO of Globex Cybersecurity?',
    'Acme Analytics made $42.3 million in 2025. If they grow 34%, what would revenue be?',
    #"What is today's date?",
    #'What awards did Nextera Green Solutions win in 2025?',
    #'Globex has 412 employees and $78.5M revenue. What is revenue per employee?',
]

if has_llm_provider():
    for i, q in enumerate(agent_questions, start=1):
        print('=' * 80)
        print(f'Agent Q{i}. {q}')
        print('=' * 80)
        try:
            result = run_agent(q)
            print(f'Final Answer: {result}')
        except Exception as e:
            if _is_retriable_error(e):
                print('  Rate limited -- skipping')
            else:
                print(f'  Agent error: {e}')
        print()
else:
    print('Set at least one API key first.')

================================================================================
Agent Q1. Acme Analytics made $42.3 million in 2025. If they grow 34%, what would revenue be?
================================================================================
  [Step 1] Calling: calculator(42.3 * 1.34)
             Result: 56.682...
  [Step 2] Final answer
Final Answer: If Acme Analytics grows 34% from $42.3 million, their revenue would be **$56.68 million** (or $56.7 million rounded).
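The two-step trace above can be reproduced without an API key by replacing the LLM with a scripted stand-in. The sketch below is a simplified, dependency-free rewrite of the loop, not the notebook's code: `ScriptedLLM`, `multiply`, `agent_step_plain`, and `run_plain` are hypothetical names, messages are plain tuples instead of LangChain message objects, and the canned replies are invented to mimic the run shown.

```python
import json
import re


class ScriptedLLM:
    """Stand-in for the real LLM: replays canned replies in order."""

    def __init__(self, replies):
        self._replies = iter(replies)

    def invoke(self, messages):
        return next(self._replies)


def multiply(expression):
    """Tiny stub tool: handles only 'a * b'."""
    left, right = expression.split("*")
    return str(round(float(left) * float(right), 4))


def agent_step_plain(state, llm, tools):
    """One pass: ask the LLM, look for a tool call, run it or finish."""
    reply = llm.invoke(state["messages"])
    call = None
    match = re.search(r"\{[^}]+\}", reply)
    if match:
        try:
            parsed = json.loads(match.group())
            if "tool" in parsed and "input" in parsed:
                call = parsed
        except json.JSONDecodeError:
            pass
    if call is None:
        return {**state, "final": reply}
    tool = tools.get(call["tool"])
    result = tool(call["input"]) if tool else f"Unknown tool: {call['tool']}"
    messages = state["messages"] + [("ai", reply), ("human", f"Tool result: {result}")]
    return {**state, "messages": messages, "final": None}


def run_plain(question, llm, tools, max_steps=5):
    """Repeat the single step until a final answer or the step limit."""
    state = {"messages": [("human", question)], "final": None}
    for step in range(1, max_steps + 1):
        state = agent_step_plain(state, llm, tools)
        if state["final"] is not None:
            return state["final"], step
    return "Agent exceeded maximum steps.", max_steps


fake_llm = ScriptedLLM([
    '{"tool": "multiply", "input": "42.3 * 1.34"}',
    "Revenue would be about $56.68 million.",
])
answer, steps = run_plain("If Acme grows 34% from $42.3M?", fake_llm, {"multiply": multiply})
print(f"Finished in {steps} steps: {answer}")
```

The first reply is a tool call, so the loop runs the tool and goes around again; the second reply has no JSON, so the loop exits. A scripted LLM that never stops requesting tools would instead hit the step limit.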

Key Takeaways: Agents vs. Chains

| | RAG Chain (LCEL) | Agent with Tools |
| --- | --- | --- |
| Decision making | Always retrieves, always generates | LLM chooses which tools to call |
| Multi-step | Single retrieve then generate | Can chain multiple tool calls |
| Math / logic | LLM must do math in its head | Delegates to calculator tool |
| Extensibility | Fixed pipeline | Add any tool (APIs, databases, web search) |
| Provider support | Any LLM | Any LLM (prompt-based, no bind_tools needed) |

When to use which:

RAG Chain — simple document Q&A with a known pipeline
Agent — complex questions requiring reasoning, multiple data sources, or computation

FAQ: Common Questions About LCEL and Agents

If the pipe syntax and RunnableLambda wrappers felt confusing on first read, you are not alone. These are the most common questions students ask after working through this notebook, and understanding the answers will make every future LangChain project easier to read and write.

Why do we need `RunnableLambda` for the agent but not the RAG chain?

LCEL’s pipe operator | only works between Runnables (objects with an .invoke() method).

In the RAG chain:

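chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | rag_prompt | llm | StrOutputParser()

Every pipe here already has a Runnable on at least one side. Inside the dict, retriever is a Runnable (it inherits from LangChain’s BaseRetriever), so when LangChain sees retriever | format_docs, it auto-wraps format_docs (a plain Python function) into a RunnableLambda internally. The dict itself is a plain Python object too, but the object on its right, rag_prompt, is a Runnable, so LangChain coerces the dict into a parallel step. Every subsequent | keeps the chain going.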

In the agent:

agent_step = RunnableLambda(invoke_llm) | parse_tool_call | execute_tool

None of our three functions are Runnables -- they’re plain Python functions we defined. If we wrote invoke_llm | parse_tool_call | execute_tool, Python would try to call function.__or__(function), which doesn’t exist, and raise TypeError. That’s why we wrap the first function explicitly. Once the left side is a Runnable, LangChain auto-coerces the rest.

Rule: Wrap ONE function as RunnableLambda to bootstrap the pipe. The rest are auto-coerced.

Why three steps for the agent? Why not one big function?

Each LCEL step does ONE thing -- the same “small composable pieces” philosophy as the RAG chain. Mapping the steps to what they do:

Step	What it does
`invoke_llm`	Send message history to LLM, get its reply
`parse_tool_call`	Check if the reply is a tool call (JSON) or a final answer
`execute_tool`	If tool call -> run the tool and update history. If final answer -> mark done

You could write all three as ONE big function, but then:

You lose per-step tracing: a tool such as LangSmith would show one opaque step instead of three named ones
Each step can’t be unit-tested or swapped independently
It’s less readable than three small functions with clear names
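As an illustration of the unit-testing point, the step functions take a plain dict and return a plain dict, so they can be checked without an LLM or a vector store. The sketch below copies parse_tool_call from the notebook (using the standard `json` and `re` names instead of the `_json` and `_re` aliases) and exercises it with a few hand-written replies. The test function names are hypothetical.

```python
import json
import re


def parse_tool_call(state):
    """Copy of the notebook's step 2: look for a tool-call JSON in the reply."""
    tool_call = None
    m = re.search(r"\{[^}]+\}", state["reply"])
    if m:
        try:
            parsed = json.loads(m.group())
            if "tool" in parsed and "input" in parsed:
                tool_call = parsed
        except json.JSONDecodeError:
            pass
    return {**state, "tool_call": tool_call}


def test_detects_tool_call():
    out = parse_tool_call({"reply": '{"tool": "calculator", "input": "2 + 2"}'})
    assert out["tool_call"] == {"tool": "calculator", "input": "2 + 2"}


def test_plain_text_is_final_answer():
    out = parse_tool_call({"reply": "Revenue would be $56.68 million."})
    assert out["tool_call"] is None


def test_json_without_tool_keys_is_ignored():
    out = parse_tool_call({"reply": '{"answer": "42"}'})
    assert out["tool_call"] is None


def test_other_state_keys_pass_through():
    out = parse_tool_call({"reply": "done", "messages": ["m1"]})
    assert out["messages"] == ["m1"]


ALL_TESTS = (
    test_detects_tool_call,
    test_plain_text_is_final_answer,
    test_json_without_tool_keys_is_ignored,
    test_other_state_keys_pass_through,
)
for test in ALL_TESTS:
    test()
print(f"{len(ALL_TESTS)} parse_tool_call checks passed")
```

With one big function, the same checks would need a fake LLM and fake tools just to reach the parsing logic.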

Run the code

To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)

https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/LangChain_demo.ipynb