Calling the Groq Cloud API - AI for Business

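Groq serves open-weight LLMs on its custom LPU (Language Processing Unit) hardware, with very low inference latency. This notebook shows how to authenticate with the Groq Python SDK and send streamed chat completion requests. The first example is a single-turn request with a system prompt, and the second is a multi-turn conversation. Both examples use the open-weight openai/gpt-oss-120b model; you can try other Groq-hosted models by changing the model argument. Groq is a strong choice when low latency is critical.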

Setup and authentication

Install the groq Python SDK and read your API key from Colab’s secret store. You can get a free key at console.groq.com -- the free tier is generous enough for coursework and prototyping. Store it in Colab as a user secret named GROQ_KEY (left panel, key icon), and the cells below will read it into the GROQ_API_KEY environment variable that the SDK expects.

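!pip install -q groq

import os
from groq import Groq
from google.colab import userdata  # Colab's built-in secret store

# Copy the Colab secret GROQ_KEY into GROQ_API_KEY, the variable the Groq() client reads by default
os.environ["GROQ_API_KEY"] = userdata.get("GROQ_KEY")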
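The cell above only works inside Colab, because google.colab cannot be imported anywhere else. The sketch below is a portable alternative. It assumes the same secret name, GROQ_KEY, tries Colab's secret store first, and falls back to a GROQ_API_KEY variable you have already exported in your shell. The helper name load_groq_key is hypothetical and is not part of the Groq SDK.

```python
import os


def load_groq_key(secret_name: str = "GROQ_KEY") -> str:
    """Return a Groq API key from Colab secrets, else from the environment.

    Hypothetical helper: tries google.colab.userdata first, then falls back
    to an existing GROQ_API_KEY environment variable.
    """
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(secret_name)
    except Exception:
        # Not running in Colab, or the secret is missing / not shared with this notebook
        key = None
    key = key or os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            f"No API key found: add a Colab secret named {secret_name!r} "
            "or set the GROQ_API_KEY environment variable."
        )
    # The Groq() client reads GROQ_API_KEY from the environment by default
    os.environ["GROQ_API_KEY"] = key
    return key
```

Call load_groq_key() once before creating Groq(). The client then picks up the key from the environment, so you do not need to pass an api_key argument.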

Example 1: Structured evaluation with a system prompt

The Groq SDK mirrors OpenAI's chat-completions schema: you pass a messages list in which each entry has a role (system, user, or assistant) and a content string. The system message sets the model's persona and rubric. Here, the model plays an expert research reviewer that scores proposals from 1 to 100 on four transformativeness criteria. Passing stream=True makes create() return an iterator of chunks instead of a single finished response. The loop after the call prints each chunk's text as soon as it arrives, rather than waiting for the whole answer.

client = Groq()
completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
      {
        "role": "system",
        "content": "You are an expert reviewer at a leading national research agency. You read research proposals and give a score of 1 to 100 in terms of how transformative they are, in terms of the following criteria:\n\n1) Transdisciplinary\n2) Riskiness of the proposal\n3) Extends existing knowledge significantly (i.e., disruption) instead of extending it conservatively. \n4) Idea is novel and unique and has not been proposed before\n\nTo evaluate transformativeness of a proposal, consider the research done before the year it has been submitted.  "
      },
      {
        "role": "user",
        "content": (
            "Proposal:  \nEconometrics has been distinguished from simply statistics as applied to economics as being\nprincipally concerned with establishing causes or, what is frequently taken by economists to be\nthe same thing, structural explanation. Economics has itself developed as a science of models.\nThe proposed research is a systematic philosophical investigation of models and causal structure\nin econometrics, and is part of a larger project on the logic of econometric inference and its place\nin the explanatory practices of economics. While the philosophy of science has recently devoted\nmuch attention to models, when it comes to the case of economics, most of the attention has been\npaid to purely theoretical models. The proposed research aims to redress the balance by\ninvestigating models of data (measurement and statistical) and especially how they relate to\ntheoretical models.\nThe intellectual merit of the proposed research rests in this: Econometrics is the core\nmethodology of applied economics, yet its deepest foundations have rarely been examined. "
            "It is\nimportant to examine its methods philosophically – that is, at a higher level of generality – in\norder a) to better understand the key issues confronting econometrics, which may provide a\nframework in which students and professional economists can more clearly understand the\nconceptual structure in which econometric methods function as an aid to learning and more\neffective application; b) to better understand the relationship of different approaches within\neconometrics, raising the possibility that current internal disputes can be resolved better from a\nmore detached perspective and so provide some guidance on how best to practice econometrics;\nc) to better understand the import for econometrics of the philosophical analysis of scientific\ninference generally, which may aid in improving those methods or relating them to methods of\nother sciences with the possibility of improving econometrics itself; and d) (the converse of this\nlast point) to better understand the import of econometric methods for the problems of scientific\ninference generally, which may then provide a new resource to the philosophy of science, at\npresent overwhelmingly informed by the problems of the natural sciences, and which may be of\nuse in understanding (and possibly improving) the methods of other areas of science. \n\nYear: 2010\n"
        )
      }
    ],
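    temperature=1,                # sampling temperature; higher values give more varied text
    max_completion_tokens=8192,   # upper bound on the number of tokens the model may generate
    top_p=1,                      # nucleus sampling over the full distribution (no truncation)
    reasoning_effort="medium",    # how much the GPT-OSS model deliberates: "low", "medium", or "high"
    stream=True,                  # return an iterator of chunks instead of one finished response
    stop=None                     # no custom stop sequences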
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
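The loop above prints each chunk and then discards it. You may also want the complete reply (for example, to record the reviewer's score) and a rough measure of latency. In that case, collect the deltas while printing them. The sketch below assumes only what the loop above already uses: each chunk exposes chunk.choices[0].delta.content. The helper name collect_stream is hypothetical. Its timings are wall-clock measurements taken on your machine, including network time, not server-reported metrics. The first timing marks the first non-empty content chunk, so for reasoning models it also includes any deliberation that happens before visible text appears.

```python
import time
from typing import Iterable, Optional, Tuple


def collect_stream(
    chunks: Iterable,
    echo: bool = True,
    start: Optional[float] = None,
) -> Tuple[str, float, float]:
    """Consume a streamed chat completion.

    Returns (full_text, seconds_to_first_text, total_seconds). Hypothetical helper:
    it only assumes each chunk looks like chunk.choices[0].delta.content.
    Pass start=time.perf_counter() taken just before create() so the timings
    include the request itself; otherwise they start when iteration begins.
    """
    if start is None:
        start = time.perf_counter()
    first_text_at: Optional[float] = None
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices at all
            continue
        piece = chunk.choices[0].delta.content or ""
        if piece and first_text_at is None:
            first_text_at = time.perf_counter() - start
        parts.append(piece)
        if echo:
            print(piece, end="", flush=True)
    total = time.perf_counter() - start
    if first_text_at is None:  # the stream produced no text at all
        first_text_at = total
    return "".join(parts), first_text_at, total


# Usage in the notebook (needs `client` from the setup cell; `messages` is your message list):
# start = time.perf_counter()
# stream = client.chat.completions.create(
#     model="openai/gpt-oss-120b", messages=messages, stream=True
# )
# reply, ttft, total = collect_stream(stream, start=start)
# print(f"\nfirst text after {ttft:.2f}s, done after {total:.2f}s")
```

Because the helper accepts any iterable, you can also check it offline with stand-in chunk objects before spending API quota.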

Example 2: Multi-turn conversation

You can prime the model with a prior turn by including an assistant message in the messages list. The model treats this as conversation history and responds in that context. This is how chat apps build up a running dialogue: append each new user and assistant turn to the list, then ask for the next completion. Unlike ChatGPT’s web UI, Groq’s API does not persist history on the server -- your client is responsible for sending the full conversation each call.

client = Groq()
completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
      {
        "role": "user",
        "content": "Are chat messages not saved in groq?\n"
      },
      {
        "role": "assistant",
        "content": (
            "## TL;DR  \n**GROQ itself never stores data** – it’s only a *read‑only* query language for the data that lives in Sanity’s Content Lake (or any other backend that implements the GROQ spec).  \nIf you’re not seeing your chat messages when you run a GROQ query, the messages simply haven’t been **written** (or have been written to the wrong document type / dataset) rather than “lost” by GROQ.\n\nBelow is a quick primer on how chat messages are typically persisted in Sanity, common pitfalls that make it look like “messages aren’t saved”, and a checklist you can run through to debug the issue.\n\n---\n\n## 1. What GROQ *does* and *doesn’t* do\n\n| Feature | GROQ | Sanity client (or other API) |\n|---------|------|------------------------------|\n| **Read data** | ✅ `*[_type == \"message\"]{…}` | ✅ |\n| **Write / mutate data** | ❌ (no `INSERT`, `UPDATE`, `DELETE`) | ✅ `client.create`, `client.patch`, `client.delete` |\n| **Filter, sort, project, reference‑resolve** | ✅ | ✅ |\n| **Live preview / realtime updates** | ✅ (via `listen` on the client) | ✅ |\n| **Persist data** | ❌ (needs a separate write operation) | ✅ |\n\n> **Bottom line:** If you can query the data with GROQ, it **must** already exist in the Content Lake. If it doesn’t, the problem is on the “write” side.\n\n---\n\n## 2. "
            "Typical Sanity schema for a chat app\n\n```js\n// schemas/message.js\nexport default {\n  name: \"message\",\n  title: \"Chat Message\",\n  type: \"document\",\n  fields: [\n    {\n      name: \"text\",\n      title: \"Message Text\",\n      type: \"text\",\n    },\n    {\n      name: \"sender\",\n      title: \"Sender\",\n      type: \"reference\",\n      to: [{type: \"user\"}],\n    },\n    {\n      name: \"sentAt\",\n      title: \"Sent At\",\n      type: \"datetime\",\n      initialValue: () => new Date().toISOString(),\n    },\n    {\n      name: \"room\",\n      title: \"Chat Room\",\n      type: \"reference\",\n      to: [{type: \"room\"}],\n    },\n  ],\n};\n```\n\nYou’ll also have a `user` and a `room` document type (or you can embed those fields directly if you prefer a flatter model).\n\n---\n\n## 3. Writing a message – the “save” part\n\n```js\nimport sanityClient from '@sanity/client'\n\nconst client = sanityClient({\n  projectId: 'yourProjectId',\n  dataset: 'production',\n  token: 'sanity-write-token',   // must have write permissions\n  useCdn: false,                 // false = always hit the API (important for fresh writes)\n})\n\n// Example payload from a UI form\nconst newMessage = {\n  _type: 'message',\n  text: 'Hey, how are you?',\n  sender: {_type: 'reference', _ref: userId},\n  room:   {_type: 'reference', _ref: roomId},\n  sentAt: new Date().toISOString(),\n}\n\n// Persist\nclient\n  .create(newMessage)\n  .then(res => console.log('Message saved ✅', res))\n  .catch(err => console.error('❌ Save failed', err))\n```\n\n> **Key points**  \n> * The client **must** be instantiated with a **write‑capable token**. If you use the public CDN token (or `useCdn: true` without a token), writes are silently ignored.  \n> * `useCdn: false` forces the request to go to the API instead of the edge cache, ensuring you read the freshly written document right after creating it.  "
            "\n> * If you’re using the **Sanity Studio** UI for testing, make sure the “Create” button actually hits the same dataset you’re querying in your front‑end.\n\n---\n\n## 4. Common reasons messages “don’t appear” in GROQ\n\n| Symptom | Likely Cause | Fix |\n|---------|--------------|-----|\n| **Empty result** (`[]`) when you know you sent a message | a) Write never happened (missing token, `useCdn:true`) <br> b) Message saved to a *different* dataset (`development` vs `production`) <br> c) Wrong `_type` name (`Message` vs `message`) | Verify token scopes (`write`), double‑check dataset name, run `client.fetch('*[_type == \"message\"]')` in the same environment you used to write. |\n| **Message appears only after a refresh** | Using the CDN cache (`useCdn:true`) for reads – CDN may still hold the pre‑write snapshot. | Set `useCdn:false` for reads that need real‑time consistency, or use the **listener** API (`client.listen`) to get push updates. |\n| **Only the most recent message shows up** | Over‑writing the same document ID (e.g., you used `client.createOrReplace({_id: 'temp', …})` each time). | Use `client.create` (auto‑generates a unique `_id`) or generate a UUID per message. |\n| **Message shows up, but fields are missing** | Schema mismatch (e.g., you omitted a required field, Sanity silently dropped it) | Check the schema’s `validation` rules; run `client.create` with the full payload and inspect the returned document. |\n| **`listen` never fires** | Listening to the wrong dataset or using a read‑only token | Use the same project/dataset and a token with `read` scope. |\n\n---\n\n## 5. Quick debugging checklist\n\n1. **Confirm you have a write token**  \n   ```bash\n   curl -H \"Authorization: Bearer <token>\" \\\n        \"https://<projectId>.api.sanity.io/v2021-06-07/data/mutate/production?returnIds=true\" \\\n        -d '{\"mutations\":[{\"create\":{\"_type\":\"message\",\"text\":\"test\"}}]}'\n   ```\n   You should get a `201` with the new document ID. "
            "If you get `401` or `403`, the token is not authorized.\n\n2. **Inspect the dataset directly**  \n   - In **Sanity Studio**: open the “Message” list → you should see the rows.  \n   - Via **API**: `client.fetch('*[_type==\"message\"]{_id,text}')` → should return an array.\n\n3. **Check the network tab** (browser dev tools) when you click “Send”.  \n   - **POST** to `https://<projectId>.api.sanity.io/v2021-06-07/data/mutate/...`  \n   - Response payload includes `createdId` or an error.\n\n4. **Validate the GROQ query**  \n   ```js\n   // Simple sanity check\n   client.fetch(`\n     *[_type == \"message\" && room._ref == $roomId] | order(sentAt desc)[0...10]{\n       _id,\n       text,\n       \"senderName\": sender->name,\n       sentAt\n     }\n   `, {roomId})\n   ```\n   If this still returns `[]`, the data really isn’t there.\n\n5. **Look at the dataset name**  \n   - In `sanity.json` you may have `\"api\": { \"projectId\": \"...\", \"dataset\": \"development\" }`.  \n   - Your front‑end code might be pointing at `\"production\"` – mismatch = no results.\n\n---\n\n## 6. "
            "Example end‑to‑end flow (React + Sanity)\n\n```tsx\n// ChatInput.tsx\nimport {useState} from 'react'\nimport {client} from './sanityClient'   // pre‑configured client\n\nexport default function ChatInput({roomId, userId}) {\n  const [msg, setMsg] = useState('')\n\n  const send = async () => {\n    if (!msg.trim()) return\n    try {\n      await client.create({\n        _type: 'message',\n        text: msg,\n        sender: {_type: 'reference', _ref: userId},\n        room:   {_type: 'reference', _ref: roomId},\n        sentAt: new Date().toISOString(),\n      })\n      setMsg('')               // clear UI\n    } catch (e) {\n      console.error('💥 Could not save message', e)\n    }\n  }\n\n  return (\n    <div>\n      <input value={msg} onChange={e => setMsg(e.target.value)} />\n      <button onClick={send}>Send</button>\n    </div>\n  )\n}\n```\n\n```tsx\n// ChatMessages.tsx\nimport {useEffect, useState} from 'react'\nimport {client} from './sanityClient'\n\nexport default function ChatMessages({roomId}) {\n  const [messages, setMessages] = useState([])\n\n  // Initial fetch\n  useEffect(() => {\n    client\n      .fetch(`\n        *[_type==\"message\" && room._ref == $roomId] | order(sentAt desc)[0...50]{\n          _id, text, sentAt,\n          \"senderName\": sender->name\n        }\n      `, {roomId})\n      .then(setMessages)\n  }, [roomId])\n\n  // Real‑time updates\n  useEffect(() => {\n    const subscription = client\n      .listen(`*[_type==\"message\" && room._ref == $roomId]`, {roomId})\n      .subscribe(event => {\n        if (event.transition === 'appear') {\n          setMessages(prev => [event.result, ...prev])\n        }\n      })\n    return () => subscription.unsubscribe()\n  }, [roomId])\n\n  return (\n    <ul>\n      {messages.map(m => (\n        <li key={m._id}>\n          <strong>{m.senderName}:</strong> {m.text}\n        </li>\n      ))}\n    </ul>\n  )\n}\n```\n\n*If the above code works in a fresh project, then your issue is almost "
            "certainly a configuration mismatch rather than a GROQ limitation.*\n\n---\n\n## 7. TL;DR Answer to Your Question\n\n> **“Are chat messages not saved in GROQ?”**  \n\nNo. GROQ doesn’t *save* anything. It can only **read** what has already been stored in Sanity (or any GROQ‑compatible backend). If your chat messages aren’t appearing when you query them, the problem lies in the **write path** – token permissions, dataset selection, schema name, or using the CDN cache for reads.\n\nFix the write path, and then any standard GROQ query (as shown above) will retrieve the messages instantly.\n\n---\n\n### Need more help?\n\n* Which environment are you testing (Sanity Studio, custom front‑end, Next.js, etc.)?  \n* Are you using the default public read‑only token or a custom write token?  \n* Could you share the exact GROQ query you’re running and the code you use to create a message?  \n\nDrop those details and I can point out the precise line that’s tripping you up. Happy chatting! 🚀"
        )
      }
    ],
    temperature=1,
    max_completion_tokens=8192,
    top_p=1,
    reasoning_effort="medium",
    stream=True,
    stop=None
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
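Because Groq's API keeps no server-side history, a chat loop has to carry the transcript itself. It appends each user turn, sends the whole list, and appends the reply. The sketch below shows that pattern and assumes the same client.chat.completions.create call used above, here without streaming. The names Conversation, ask, and make_groq_complete are hypothetical. The completion function is passed in, so you can exercise the history logic without an API key.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
CompleteFn = Callable[[List[Message]], str]


class Conversation:
    """Keep chat history on the client and resend all of it on every turn."""

    def __init__(self, complete: CompleteFn, system: Optional[str] = None) -> None:
        # `complete` receives the full message list and returns the assistant's text
        self.complete = complete
        self.messages: List[Message] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        try:
            reply = self.complete(list(self.messages))  # send a copy of the whole transcript
        except Exception:
            self.messages.pop()  # drop the unanswered user turn so history stays consistent
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def make_groq_complete(model: str = "openai/gpt-oss-120b") -> CompleteFn:
    """Non-streaming Groq backend; needs the groq package and GROQ_API_KEY."""
    from groq import Groq  # lazy import so Conversation works without the SDK

    client = Groq()

    def complete(messages: List[Message]) -> str:
        resp = client.chat.completions.create(model=model, messages=messages)
        return resp.choices[0].message.content or ""

    return complete


# Usage in the notebook:
# chat = Conversation(make_groq_complete(), system="You are a concise assistant.")
# print(chat.ask("Does Groq's API store my chat history?"))
# print(chat.ask("So what does my client need to send on the next call?"))
```

Every turn resends the entire transcript, so long conversations eventually approach the model's context window. Real chat apps typically trim or summarize older turns at that point.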

Key takeaways

Groq Cloud runs open-weight models (such as Llama and GPT-OSS) on custom LPU hardware, and its inference latency is among the lowest of the mainstream API providers.
The groq SDK mirrors OpenAI’s chat-completions schema, so migrating prompts between the two providers requires almost no code changes.
Streaming with stream=True lets you print tokens as soon as they are produced, which showcases Groq’s speed advantage.
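A generous free tier (subject to per-minute and per-day request and token limits) makes Groq well suited to prototyping, including agentic workflows, before you commit to paid usage.
reasoning_effort controls how much internal deliberation reasoning-capable models perform before emitting the final answer. The GPT-OSS models accept "low", "medium", or "high", and higher settings generally trade extra tokens and latency for more careful answers.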
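Both examples call openai/gpt-oss-120b. To compare models yourself, send the same prompt to several model IDs and record the wall-clock latency next to each answer. The sketch below makes some assumptions. The model IDs listed reflect Groq's catalog at the time of writing and may have been renamed or retired since, so check the current list at console.groq.com. The names compare_models and make_groq_ask are hypothetical. The request function is injected, so the timing logic also runs without an API key.

```python
import time
from typing import Callable, Dict, List

# Assumed Groq model IDs at the time of writing; check the current model list
# at console.groq.com, since models are periodically renamed or retired.
CANDIDATE_MODELS = [
    "openai/gpt-oss-120b",
    "openai/gpt-oss-20b",
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
]

AskFn = Callable[[str, str], str]  # ask(model, prompt) -> reply text


def compare_models(prompt: str, models: List[str], ask: AskFn) -> List[Dict[str, object]]:
    """Send one prompt to each model and time the full round trip.

    Errors (e.g. a retired model ID or a rate limit) are recorded per model
    instead of aborting the run. Successful calls are sorted fastest-first,
    followed by failures.
    """
    rows: List[Dict[str, object]] = []
    for model in models:
        start = time.perf_counter()
        try:
            reply, error = ask(model, prompt), None
        except Exception as exc:
            reply, error = "", repr(exc)
        rows.append({
            "model": model,
            "seconds": time.perf_counter() - start,
            "chars": len(reply),
            "error": error,
        })
    return sorted(rows, key=lambda r: (r["error"] is not None, r["seconds"]))


def make_groq_ask() -> AskFn:
    """Live backend for compare_models; needs the groq package and GROQ_API_KEY."""
    from groq import Groq  # lazy import: only needed for real API calls

    client = Groq()

    def ask(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content or ""

    return ask


# Usage in the notebook:
# for row in compare_models("Explain an LPU in two sentences.", CANDIDATE_MODELS, make_groq_ask()):
#     print(f"{row['model']:<26} {row['seconds']:6.2f}s  {row['chars']:5d} chars  {row['error'] or ''}")
```

Treat single timings as noisy, because network conditions and server load vary; repeat each call a few times before drawing conclusions. Reasoning models such as GPT-OSS also spend part of the measured time on internal deliberation, which reasoning_effort influences.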

Run the code

To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)

Estimated run time: ~2 minutes (requires API key)

https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/Groq_Cloud_API_Call.ipynb