Calling the Google Gemini API

Google’s Gemini family of models is accessible through the google-genai SDK. This notebook walks through installation, authentication, and making both streaming and non-streaming requests. We also show how to list available models so you can choose the right one for your task.

Setup¶

Before running anything, import a Gemini API key from AI Studio into Colab. You can do this from the Secrets tab in the left sidebar. The code below expects the secret to be named GOOGLE_API_KEY.

After doing so, please run the setup cell below.

Installing the SDK¶

We install the latest google-genai package. Keeping it up to date ensures access to the newest model releases and API features.

!pip install -U -q "google-genai"

Configuring the API Key¶

The Gemini API key is stored as a Colab secret and loaded into an environment variable. This pattern keeps credentials out of version control while making them available to the SDK.

import os
from google.colab import userdata
from google.colab import drive
from google import genai
from google.genai import types

#os.environ["GEMINI_API_KEY"] = userdata.get("FREEGOOGLE_API_KEY")
os.environ["GEMINI_API_KEY"] = userdata.get("GOOGLE_API_KEY")
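
Outside Colab, google.colab cannot be imported, so the cell above will fail. The following is a minimal sketch of a fallback, assuming the key may already be exported as GEMINI_API_KEY in the shell. The helper name resolve_api_key is hypothetical and not part of any SDK.

```python
import os


def resolve_api_key(secret_name="GOOGLE_API_KEY", env_name="GEMINI_API_KEY"):
    """Return the API key from a Colab secret, else from the environment.

    resolve_api_key is a hypothetical helper, not part of any SDK.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Not in Colab: fall back to an environment variable.
        key = os.environ.get(env_name)
    else:
        key = userdata.get(secret_name)
    if not key:
        raise RuntimeError(
            f"No API key found in Colab secret {secret_name!r} or ${env_name}"
        )
    return key
```

Calling os.environ["GEMINI_API_KEY"] = resolve_api_key() would then work in both environments, and a missing key fails early with a readable error instead of surfacing later as an authentication failure.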

Mounting Google Drive¶

If you have supporting files stored in Google Drive, mounting it provides direct filesystem access. This step is optional and only needed when your prompts reference uploaded documents.

drive.mount("/content/drive")
# Please ensure that uploaded files are available in the AI Studio folder or change the working folder.
os.chdir("/content/drive/MyDrive/Google AI Studio")

Generated Code¶

Streaming a Response with Thinking¶

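The generate_content_stream method returns chunks as they are produced, which is useful for long answers. We also set the model’s built-in thinking/reasoning capability to a medium level, which controls how much the model reasons before it answers. The reasoning itself is not part of the printed output unless thought summaries are requested (for example with include_thoughts=True in ThinkingConfig).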

def generate():
    client = genai.Client(
        api_key=os.environ.get("GEMINI_API_KEY"),
    )

    model = "gemini-3-flash-preview"
    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text="""What is 7*8"""),
            ],
        ),
    ]
    generate_content_config = types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level="MEDIUM",
        ),
    )

    for chunk in client.models.generate_content_stream(
        model=model,
        contents=contents,
        config=generate_content_config,
    ):
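        # Some chunks carry no text (for example, metadata only); skip them
        # so that "None" is never printed.
        if chunk.text:
            print(chunk.text, end="")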

if __name__ == "__main__":
    generate()
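
Streamed chunks can also be accumulated into one string instead of being printed as they arrive. The following is a minimal sketch, assuming only that each chunk exposes a text attribute that may be None. collect_text is a hypothetical helper, and the SimpleNamespace objects stand in for real chunks so the snippet runs without an API key.

```python
from types import SimpleNamespace


def collect_text(chunks):
    """Join the text of streamed chunks, skipping chunks with no text."""
    pieces = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if text:  # chunks carrying only metadata have no text
            pieces.append(text)
    return "".join(pieces)


# Stand-in chunks; a real call would pass the iterator returned by
# client.models.generate_content_stream(...) instead.
fake_stream = [
    SimpleNamespace(text="7 * 8 "),
    SimpleNamespace(text=None),
    SimpleNamespace(text="= 56"),
]
print(collect_text(fake_stream))  # prints: 7 * 8 = 56
```

Collecting the text this way trades the live display for a value that can be logged, post-processed, or compared against the output of another model.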

Listing Available Models¶

Google frequently adds new models. Calling client.models.list() returns every model your API key has access to, helping you stay current with the latest releases.

from google import genai

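# With no arguments, the client reads the API key from the environment
# (GEMINI_API_KEY was set in the configuration cell).
client = genai.Client()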

for model in client.models.list():
    print(model.name)
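
The full list includes models that cannot generate text, such as embedding models. The following is a minimal sketch of narrowing the list, assuming each entry exposes a supported_actions list of strings such as "generateContent". names_supporting is a hypothetical helper, and the stand-in entries (including the made-up embedding model name) let the snippet run offline.

```python
from types import SimpleNamespace


def names_supporting(models, action="generateContent"):
    """Return names of models whose supported_actions include `action`."""
    return [
        m.name
        for m in models
        if action in (getattr(m, "supported_actions", None) or [])
    ]


# Stand-in entries so the sketch runs offline; with a real client,
# pass client.models.list() instead.
fake_models = [
    SimpleNamespace(
        name="models/gemini-2.5-flash",
        supported_actions=["generateContent", "countTokens"],
    ),
    SimpleNamespace(
        name="models/example-embedding-model",  # hypothetical name
        supported_actions=["embedContent"],
    ),
]
print(names_supporting(fake_models))  # prints: ['models/gemini-2.5-flash']
```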

A Simple Non-Streaming Call¶

For short answers, a single blocking call to generate_content is simpler than streaming. The response object contains the generated text, token counts, and safety ratings.

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Explain how AI works in a few words",
)

print(response.text)
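
Token counts are useful for estimating cost and staying under quota. The following is a minimal sketch of reading them, assuming the response carries a usage_metadata object with prompt_token_count, candidates_token_count, and total_token_count fields. token_summary is a hypothetical helper, and the stand-in response uses invented numbers so the snippet runs without an API call.

```python
from types import SimpleNamespace


def token_summary(response):
    """Return prompt, output, and total token counts from a response."""
    usage = response.usage_metadata
    return {
        "prompt": usage.prompt_token_count,
        "output": usage.candidates_token_count,
        "total": usage.total_token_count,
    }


# Stand-in response with invented counts; with a real client, pass the
# object returned by client.models.generate_content(...) instead.
fake_response = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=8,
        candidates_token_count=12,
        total_token_count=20,
    )
)
print(token_summary(fake_response))
```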

Switching Models and API Keys¶

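You can swap models or API keys mid-session to compare outputs or manage rate limits. A client keeps the key it was constructed with, so changing the environment variable alone has no effect on an existing client; create a new client after updating the key. Here we switch to a different key and an older Gemini model, then send it two prompts.

os.environ["GEMINI_API_KEY"] = userdata.get("FREEGOOGLE_API_KEY")
# Recreate the client so that it uses the new key.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])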

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Are you slower than gemini-3?",
)

print(response.text)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Which model is being called?",
)

print(response.text)
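
When comparing keys side by side, it can be clearer to hold one client per key than to overwrite a single environment variable. The following is a minimal sketch of that pattern. build_clients and FakeClient are hypothetical names, and FakeClient is only an offline stand-in for genai.Client.

```python
def build_clients(keys, client_factory):
    """Return {label: client}, constructing each client with its own key.

    keys maps a label such as "paid" to an API key string.
    client_factory is any callable accepting api_key=..., e.g. genai.Client.
    """
    return {label: client_factory(api_key=key) for label, key in keys.items()}


class FakeClient:
    """Offline stand-in for genai.Client that just records its key."""

    def __init__(self, api_key):
        self.api_key = api_key


clients = build_clients({"paid": "key-a", "free": "key-b"}, FakeClient)
print(clients["free"].api_key)  # prints: key-b
```

In the notebook, the same helper could be called with genai.Client as the factory and the two Colab secrets as the keys, after which clients["free"].models.generate_content(...) and clients["paid"].models.generate_content(...) can be compared directly.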

Key takeaways¶

The google-genai SDK talks to Gemini models and reads its credentials from the GEMINI_API_KEY environment variable, typically populated from a Colab secret.
Streaming vs. blocking: use generate_content_stream to print tokens as they arrive, or generate_content for a single short response.
Thinking depth is set via ThinkingConfig (here thinking_level="MEDIUM"). It controls how much Gemini reasons before the final answer; the reasoning itself is not returned unless thought summaries are requested.
client.models.list() enumerates every model your key can call, making it easy to switch between Gemini 2.5 and 3.x variants mid-session.
Swapping API keys at runtime lets you compare free-tier and paid-tier quotas or A/B-test different model versions, but only a newly created client picks up the new key.

Run the code¶

To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)

Estimated run time: ~2 minutes (requires API key)

https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/Google_Studio_API_call.ipynb