Prompt Engineering with Google AI Studio

University of Kansas School of Business

Every interaction with a large language model begins with a prompt — the text you send in. The quality of that prompt often makes the difference between a useful answer and a generic one, or between hitting the right output format on the first try and iterating three times to get there. Modern models are forgiving enough that a one-line question usually works, but a little discipline about how you ask pays disproportionate dividends — especially when the output will be consumed by a downstream system rather than read by a human.

This chapter is deliberately hands-on rather than theoretical. We work through the core prompting techniques using Google AI Studio (https://aistudio.google.com/), a free browser-based playground that gives direct access to Gemini models without writing code. Everything here transfers directly to the API-based notebooks in the chapters that follow: when you call llm.generate(prompt) in Python, prompt is just the string you learned to construct in AI Studio. Settle into the playground first; move the patterns into code later.

Zero-Shot, Few-Shot, and the System Prompt

The simplest prompt is zero-shot: you describe the task, provide the input, and rely on the model’s pre-trained knowledge to produce the output.

Classify the text into neutral, negative, or positive.
Text: I think the food was okay.
Sentiment:

Zero-shot prompts work well for anything the model has seen many similar examples of during pre-training — sentiment classification, summarization, translation, simple Q&A. They fail when the task is domain-specific, when the output format matters, or when your notion of “correct” does not match the model’s defaults.

Few-shot prompting addresses the second and third problems by embedding a handful of input-output examples directly in the prompt, so the model can see what a good response looks like before generating its own.

Classify the text into neutral, negative, or positive.

Text: The food was awesome.
Sentiment: positive

Text: I think the vacation is okay.
Sentiment:

Research has surfaced a counterintuitive finding about why few-shot prompting works: much of the benefit comes from the format of the examples, not from the labels themselves. Min et al. (2022) showed that replacing the labels in few-shot examples with random ones barely hurts accuracy on many classification tasks, while the structural consistency of the examples matters enormously. Wei et al. (2023) added a caveat: sufficiently large models do pick up on deliberately flipped labels and can override their prior knowledge to follow them, so labels are not irrelevant. The practical implication for you is to put your energy into keeping the example format clean and consistent, rather than agonizing over which particular examples are most “informative.”
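One way to keep the example format consistent is to generate the prompt from data instead of typing it by hand. The sketch below is a minimal illustration: the function name `build_few_shot_prompt` and the `Text`/`Sentiment` labels are our own choices, not part of any library.

```python
def build_few_shot_prompt(task, examples, query,
                          input_label="Text", output_label="Sentiment"):
    """Assemble a few-shot prompt in which every example shares one layout."""
    blocks = [task]
    for text, label in examples:
        blocks.append(f"{input_label}: {text}\n{output_label}: {label}")
    # The final block repeats the same layout but leaves the label empty,
    # so the model completes it.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(
    task="Classify the text into neutral, negative, or positive.",
    examples=[("The food was awesome.", "positive")],
    query="I think the vacation is okay.",
)
print(prompt)
```

Because every example passes through the same f-string, the labels and spacing cannot drift between the examples and the final query.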

A third prompting scaffold is the system prompt — persistent context that applies to every user turn in a conversation. In AI Studio you set the system prompt in a dedicated field before starting a chat; in the Python API you pass it as a separate system_instruction argument. System prompts are where to put the persona, guardrails, and formatting requirements that should survive across an entire session, so they do not have to be repeated in every user turn.
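To make the separation concrete, the sketch below builds a request body in which the system prompt lives in its own field while the conversation turns live in another. The field names (`system_instruction`, `contents`, `parts`) follow the shape of the Gemini REST `generateContent` request as we understand it; treat them as an assumption and check the current API reference before relying on them. The helper `build_request` is hypothetical, and nothing is sent over the network.

```python
import json


def build_request(system_prompt, turns):
    """Build a request body with the system prompt kept apart from the turns.

    `turns` is a list of (role, text) pairs, where role is "user" or "model".
    """
    return {
        # Stated once; applies to every turn below.
        "system_instruction": {"parts": [{"text": system_prompt}]},
        "contents": [
            {"role": role, "parts": [{"text": text}]} for role, text in turns
        ],
    }


body = build_request(
    "You are a skeptical data analyst who always asks for supporting evidence.",
    [
        ("user", "Sales rose 20% last quarter. Is the new campaign working?"),
        ("model", "What was the trend before the campaign started?"),
        ("user", "Roughly flat for the previous year."),
    ],
)
print(json.dumps(body, indent=2))
```

The persona appears exactly once, however many turns the conversation accumulates.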

The PICFAT Framework

For prompts that go beyond a one-line question, six components together cover almost everything you might need to specify. We call this checklist the PICFAT framework:

A six-row diagram listing the PICFAT components — Persona, Instruction, Context, Format, Audience, Tone — with an example sentence for each, showing how together they describe a prompt asking an LLM expert to summarize a paper for busy researchers.

Figure 1: The PICFAT framework: Persona, Instruction, Context, Format, Audience, Tone. Only Instruction is strictly required; the other five are optional but each becomes valuable as the task grows in specificity.

If your task requires data (a document to summarize, a table to analyze, a code snippet to review), that data goes in a seventh slot, typically fenced off with a code block or a clear delimiter so the model does not confuse the data with the instructions.

In practice, Persona and Format are the two you will add most often. Audience and Tone matter more the closer the output sits to content that will be read by humans, versus consumed by another system. The six-component prompt below is a useful reference template — note how each section maps cleanly to one of the PICFAT slots, with Markdown headings making the structure explicit:

# Persona
You are an expert in large language models. You excel at breaking down complex papers into digestible summaries.

# Instruction
Summarize the key findings of the paper provided.

# Context
The summary should extract the most crucial points that help researchers quickly understand the vital information.

# Format
Create a bullet-point summary that outlines the method, followed by a concise paragraph that encapsulates the main results.

# Audience
Busy researchers who need to grasp the latest trends in LLMs.

# Tone
Professional and clear.

---

[Paste paper text here]
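If you reuse this template often, it can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper `build_picfat_prompt` that renders whichever components you supply in the fixed PICFAT order, requires only the Instruction, and places the data after a `---` delimiter.

```python
PICFAT_ORDER = ("Persona", "Instruction", "Context", "Format", "Audience", "Tone")


def build_picfat_prompt(instruction, data=None, **optional):
    """Render PICFAT components as Markdown sections, skipping missing ones."""
    if not instruction.strip():
        raise ValueError("Instruction is the one required component")
    parts = {"Instruction": instruction}
    for name, value in optional.items():
        key = name.capitalize()
        if key not in PICFAT_ORDER:
            raise ValueError(f"Unknown PICFAT component: {name}")
        parts[key] = value
    sections = [
        f"# {name}\n{parts[name]}" for name in PICFAT_ORDER if parts.get(name)
    ]
    prompt = "\n\n".join(sections)
    if data is not None:
        # Keep the data apart from the instructions with a clear delimiter.
        prompt += "\n\n---\n\n" + data
    return prompt


prompt = build_picfat_prompt(
    "Summarize the key findings of the paper provided.",
    data="[Paste paper text here]",
    persona="You are an expert in large language models.",
    format="A bullet-point summary of the method, then one paragraph of results.",
)
print(prompt)
```

Components you leave out simply produce no section, which matches the idea that only Instruction is required.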

Prompting Reasoning Models

The PICFAT framework was shaped by what worked on earlier instruction-tuned models like GPT-3.5, Llama 3, and the original Gemini. A newer class of models — OpenAI’s o1, Gemini 2.5 with thinking enabled, DeepSeek-R1, and others — is trained to generate internal reasoning chains before the final answer. These reasoning models do not need you to tell them to “think step by step” — they have been explicitly rewarded for doing so during training. What they benefit from instead is a tighter specification of what counts as a correct answer and what the response must avoid.

A template for prompting reasoning models with six labelled slots — Role, Task, Background, Constraints, Style, Output — each containing a short bracketed placeholder such as "Expert [Job Title] with [X] years of experience" or "Do not include [X]; prioritize [Y]".

Figure 2: The reasoning-model template: Role, Task, Background, Constraints, Style, Output. Compared to PICFAT, the emphasis shifts away from “how should you think” toward “what does a correct answer look like, and what must it avoid?”

A concrete reasoning-model prompt that exercises all six slots:

Role: Senior AI architect with a talent for clear, analogical teaching.

Task: Deconstruct and explain the mechanism and value proposition of Retrieval-Augmented Generation (RAG).

Background: Contrast RAG against "parametric knowledge" (what the model learned during training) to show why RAG is necessary for reducing hallucinations and using private data.

Constraints: Use a "first principles" reasoning approach. Do not just define the acronym; explain the flow of data from query to vector database to augmented prompt.

Style: Analytical yet accessible, written for a product manager who understands tech but does not code.

Output: A structured breakdown including a "library vs. encyclopedia" analogy, a three-step technical workflow, and a brief "when to use RAG vs. fine-tuning" table.

A related research finding worth knowing: Kojima et al. (2022) discovered that simply appending “Let’s think step by step” to an ordinary prompt — with no examples at all — substantially improves performance on reasoning benchmarks when used with non-reasoning models. This “zero-shot chain-of-thought” trick revealed that step-by-step reasoning was already latent in pre-trained models and just needed the right surface trigger. Reasoning models effectively bake this trigger in at the training stage, which is why you no longer have to say it out loud.
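In code, the distinction amounts to a single conditional. The sketch below is illustrative only: `with_zero_shot_cot` and the `is_reasoning_model` flag are hypothetical names, and deciding which models count as reasoning models is left to the caller.

```python
COT_TRIGGER = "Let's think step by step."


def with_zero_shot_cot(prompt, is_reasoning_model):
    """Append the zero-shot chain-of-thought trigger for non-reasoning models only."""
    if is_reasoning_model:
        # Reasoning models already produce a reasoning chain before answering.
        return prompt
    return f"{prompt.rstrip()}\n\n{COT_TRIGGER}"


question = "A store sells pens in packs of 12. How many packs are needed for 150 pens?"
print(with_zero_shot_cot(question, is_reasoning_model=False))
```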

Sandwich Prompting and Markdown Structure

When a prompt grows long — many paragraphs of context, multiple few-shot examples, or a large data block — the model’s attention tends to weigh the start and end of the prompt more heavily than the middle, a phenomenon often called “lost in the middle.” Sandwich prompting is the practical countermeasure: state the critical instruction at the beginning and repeat a condensed version of it at the end, sandwiching the long context in between.

Instruction (what to do, in one sentence).

[Long context / data / examples]

Reminder: (one-sentence restatement of the instruction).

This is especially helpful for extraction tasks where you are asking the model to pull specific fields out of a long document. Without the reminder, the model often drifts into summarizing or paraphrasing by the time it reaches the end of the context.
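The sandwich layout is easy to wrap in a helper so that the reminder is never forgotten. This is a minimal sketch; the function name `sandwich_prompt` and the invoice example are made up for illustration.

```python
def sandwich_prompt(instruction, context, reminder=None):
    """Place the instruction before the long context and a reminder after it."""
    # Fall back to repeating the instruction when no condensed reminder is given.
    reminder = reminder or instruction
    return f"{instruction}\n\n{context}\n\nReminder: {reminder}"


# Stand-in for a long document; real context would go here.
long_document = "\n".join(f"Paragraph {i}: ..." for i in range(1, 201))

prompt = sandwich_prompt(
    "Extract the invoice number, total, and due date as JSON.",
    long_document,
    reminder="Return only the three fields as JSON; do not summarize.",
)
print(prompt[:80])
print(prompt[-70:])
```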

Related to structural discipline: Markdown inside your prompt helps the model distinguish between instructions, context, and data. Models have seen enough Markdown during pre-training that they respect its semantics.

| Element | Syntax | Best use |
| --- | --- | --- |
| Heading | ## Section | Defining distinct blocks (Task, Context, Constraints) |
| Bold | **Text** | Non-negotiable rules |
| Code block | ``` | Isolating data or “don’t touch” text |
| Blockquote | > Text | Few-shot example inputs or outputs |
| Lists | 1. or * | Step-by-step instructions or checklist-style requirements |

When in doubt, over-structure rather than under-structure. The cost of extra Markdown is a handful of tokens; the benefit is often a dramatically cleaner response.
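One practical wrinkle when isolating data in a code block is that the data may itself contain backticks, which would close the fence early. The sketch below, built on hypothetical helpers `fence_data` and `build_structured_prompt`, picks a fence longer than any backtick run in the data and combines headings, bold rules, and the fenced data into one prompt.

```python
import re


def fence_data(data, label="text"):
    """Wrap data in a code fence longer than any backtick run inside it."""
    longest_run = max((len(run) for run in re.findall(r"`+", data)), default=0)
    fence = "`" * max(3, longest_run + 1)
    return f"{fence}{label}\n{data}\n{fence}"


def build_structured_prompt(task, constraints, data):
    """Use headings for blocks, bold for rules, and a fence for the data."""
    rules = "\n".join(f"- **{rule}**" for rule in constraints)
    return (
        f"## Task\n{task}\n\n"
        f"## Constraints\n{rules}\n\n"
        f"## Data\n{fence_data(data)}"
    )


prompt = build_structured_prompt(
    "Extract every email address.",
    ["Return one address per line.", "Do not modify the data."],
    "Contact: ana@example.com, raj@example.org",
)
print(prompt)
```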

Controlling Behavior: Temperature and Friends

Beyond the prompt itself, every LLM exposes a handful of generation parameters that shape the output. In AI Studio these live in the right-hand panel.

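Temperature (usually 0.0–1.0, sometimes up to 2.0) controls how random the model’s sampling is. Low values concentrate sampling on the most likely tokens, so outputs are close to deterministic and suit factual or extraction tasks; at 0 the model effectively picks the most likely token at every step. High values flatten the token distribution and encourage exploration (creative, diverse). Note that a low temperature makes outputs repeatable, not more accurate.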

| Range | Use case | Effect |
| --- | --- | --- |
| 0.0–0.2 | Factual Q&A, data extraction, code generation | Highly deterministic: the same prompt produces nearly identical outputs |
| 0.3–0.5 | General question answering, summarization | Balanced: mostly consistent with small variation |
| 0.6–0.8 | Creative writing, brainstorming, ideation | Noticeably diverse outputs run to run |
| 0.9–1.0+ | Maximum creativity experiments | Unpredictable; can drift into incoherence |

Start at 0.2 for anything where correctness matters, and raise the temperature only when you specifically want variety.
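To see why temperature has this effect, it helps to look at the textbook temperature-scaled softmax, which divides each token score by the temperature before normalizing. The sketch below uses made-up scores for three candidate tokens; providers may implement sampling differently, so read it as an illustration of the mechanism rather than a description of Gemini’s internals.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat zero as greedy decoding: all probability on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability lands on the top token; as the temperature rises, the lower-ranked tokens receive a growing share, which is where the run-to-run variety comes from.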

The remaining parameters matter less often, but are worth knowing about:

Even the best prompt cannot cure hallucination on topics the model has no training data for — recent events, private documents, or niche facts. Grounding is the general name for giving the model a real source of truth to consult alongside the prompt. In AI Studio, you can enable Gemini’s built-in Google Search grounding with a single toggle, and Gemini will quietly issue searches, read the retrieved snippets, and cite its sources in the response.

Chapters 8 through 11 of this book teach how to build grounding pipelines yourself using retrieval-augmented generation (RAG) over documents, databases, and images. AI Studio’s Google Search toggle is the easiest possible version of the same idea — the model retrieves, the model generates, but the retrieval happens server-side rather than in your own code. Treat it as a preview of what you will build from scratch later.

Hands-On Exercise in Google AI Studio

Open https://aistudio.google.com/ (a Google account is all you need — no API key required for the playground itself) and work through these five steps:

  1. Zero-shot. Pick a model — Gemini 2.5 Flash is a good default — and type a plain question. Note how long the response is and how confident the tone feels.

  2. Add a system prompt. Set a Persona in the system-instruction field (e.g., “You are a skeptical data analyst who always asks for supporting evidence”) and rerun the same question. Notice how the response shifts.

  3. Few-shot. Add two or three examples of input-output pairs at the top of your user prompt. Rerun. Does the model follow your format?

  4. Temperature slider. Move the temperature from 0.1 to 0.9 and rerun a creative prompt (e.g., “Write one tagline for an electric-bike startup”). You should see the variance in outputs change dramatically.

  5. Search grounding. Toggle on Google Search grounding and ask a question about an event from the last month. The response will include citation links — click through to verify the claims actually appear in the sources.

Once the patterns feel natural in AI Studio, move to the Python notebooks in Chapter 5 (LLM Providers and Fallback) and onward. Every system prompt, every few-shot example, every temperature value you set in the playground maps directly to an argument in llm_cascade’s generate() method.

Key Takeaways

Exercises

Easy: Take one of your own past prompts from ChatGPT or Gemini and rewrite it using the PICFAT framework. Run both versions in AI Studio and compare the responses.

Easy: Run the same creative prompt (“Write a short tagline for [your company]”) three times at temperatures 0.0, 0.5, and 1.0. Save the outputs and describe in a short paragraph what changes between them.

Medium: Construct a few-shot prompt for a classification task with at least three examples, then deliberately flip the labels in those examples (e.g., swap “positive” and “negative”). Observe whether the model still learns the task from your format, or whether it learns the inverted labels. Relate your finding to Wei et al. (2023).

Challenge: Write two prompts for the same task — one using PICFAT for a standard model and one using the Role/Task/Background/Constraints/Style/Output framework for a reasoning model. Run each against Gemini 2.5 Flash and Gemini 2.5 Pro (with thinking enabled) in AI Studio. Where are the outputs different in kind, not just in quality?

References and Further Reading