The llm_cascade Package

University of Kansas School of Business

Every notebook in this book uses llm_cascade for automatic LLM provider fallback. Here’s a quick reference.

Diagram showing five provider boxes connected left-to-right by arrows; each arrow is labelled "503 / quota", indicating that a failure from the current provider falls through to the next.

Figure 1: How llm_cascade cascades through providers: the first to succeed returns the response; a 503, 429, or quota-exceeded error on one provider falls through to the next. The cascade tries all 8 providers in order; only the first five are shown here.

As shown in Figure 1, your notebook call never has to know which provider ultimately served the response; response.provider tells you after the fact.
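The fall-through behaviour in Figure 1 can be illustrated with a minimal sketch. This is not the llm_cascade implementation: `ProviderUnavailable`, `Response`, `generate_with_fallback`, and the two toy provider functions are hypothetical names introduced here only to show the pattern of trying providers in order and returning the first success.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error standing in for a 503, 429, or quota-exceeded failure."""


@dataclass
class Response:
    """Hypothetical response record; mirrors only the text and provider fields."""
    text: str
    provider: str


def generate_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> Response:
    """Try each (name, call) pair in order and return the first success."""
    errors = []
    for name, call in providers:
        try:
            return Response(text=call(prompt), provider=name)
        except ProviderUnavailable as exc:
            # Record why this provider was skipped, then fall through to the next.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


def always_down(prompt: str) -> str:
    """Toy provider that always fails, simulating an outage."""
    raise ProviderUnavailable("503 service unavailable")


def echo(prompt: str) -> str:
    """Toy provider that always succeeds."""
    return f"echo: {prompt}"


result = generate_with_fallback(
    "What is Python?",
    [("First", always_down), ("Second", echo)],
)
print(result.provider)  # Second
print(result.text)      # echo: What is Python?
```

The caller only sees the final `Response`; which provider answered is recorded on the object rather than decided by the caller, which is the same idea as reading response.provider after the fact.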

Install

pip install git+https://github.com/KarAnalytics/llm_cascade.git

Quick start

from llm_cascade import get_cascade

llm = get_cascade()                          # auto-detects your API keys
response = llm.generate("What is Python?")   # falls back if one provider is down
print(response.text)                         # the answer
print(response.provider)                     # which provider answered (e.g., "Gemini")
print(response.model)                        # which model was used

Supported providers

| Provider     | Env Variable       | Free Tier            |
|--------------|--------------------|----------------------|
| OpenAI       | OPENAI_API_KEY     | Limited free credits |
| Gemini       | GEMINI_API_KEY     | 500 req/day          |
| Ollama Cloud | OLLAMA_API_KEY     | Free tier            |
| Grok (xAI)   | XAI_API_KEY        | $25/month free       |
| Groq         | GROQ_API_KEY       | 30 req/min           |
| HuggingFace  | HF_TOKEN           | Free inference       |
| Cohere       | COHERE_API_KEY     | 20 req/min           |
| OpenRouter   | OPENROUTER_API_KEY | Free models          |

Set any of these keys in your .env file or in Colab Secrets. The cascade tries the providers in order and falls back automatically on quota errors, auth failures, or server outages.
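Before running a notebook, it can help to check which of these keys are visible to the Python process. The sketch below uses only the standard library and the variable names from the provider table; `PROVIDER_ENV_VARS` and `configured_providers` are hypothetical helpers for illustration and do not necessarily reflect how llm_cascade itself detects keys.

```python
import os

# Environment variable names copied from the provider table.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "Ollama Cloud": "OLLAMA_API_KEY",
    "Grok (xAI)": "XAI_API_KEY",
    "Groq": "GROQ_API_KEY",
    "HuggingFace": "HF_TOKEN",
    "Cohere": "COHERE_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None):
    """Return provider names whose key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


# Lists whichever providers have a key in the current environment.
print(configured_providers())
```

An empty list means no key is visible to the process, which usually points to a .env file that was not loaded or a Colab Secret that was not granted notebook access.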

Override models per notebook

llm = get_cascade(models={"Gemini": "gemini-2.5-pro", "OpenAI": "gpt-4o"})
# or after creation:
llm.set_model("Gemini", "gemini-2.5-pro")

Embeddings (local, no API key needed)

embedding = llm.get_embedding("some text")  # uses all-MiniLM-L6-v2 locally
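A common use of embeddings is comparing two texts by cosine similarity. The sketch below assumes that get_embedding returns a flat sequence of floats (the source does not state the return type) and uses short toy vectors in place of real embeddings; `cosine_similarity` is a hypothetical helper, not part of llm_cascade.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


# Toy vectors stand in for the output of llm.get_embedding(...).
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

With real embeddings, the two arguments would be the results of two get_embedding calls, and values closer to 1 indicate more similar texts.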

Source: github.com/KarAnalytics/llm_cascade