Master zero-shot, few-shot, chain-of-thought, and other techniques that dramatically improve output quality.
Zero-shot prompting means asking the model directly without providing examples. You describe what you want, and the model generates a response based on its training. It works well for straightforward tasks but can struggle with complex, nuanced, or specialized requests.
When Zero-Shot Works Well: common, well-defined tasks such as summarization, simple Q&A, or basic extraction, where the instruction alone tells the model everything it needs.
When Zero-Shot Fails: tasks that depend on a specific format, style, or domain logic the model cannot infer from the instruction alone.
A zero-shot attempt:

```
Extract the product name and price from this text: "Buy the TechPro X5 at $299.99!"
```

The same task with one example added:

```
Extract product name and price.
Format: JSON

Example:
"Get the UltraBook 3000 for $1299"
→ {"name": "UltraBook 3000", "price": 1299}

Now extract from:
"Buy the TechPro X5 at $299.99!"
```
Always try zero-shot first. If results aren't good enough, then add examples. This lets you understand what the model can do naturally vs. what needs guidance.
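That escalation path can be sketched as a small prompt builder: start zero-shot, and attach examples only once you have seen where the model needs guidance. The function and the Input/Output format below are illustrative, not a standard API:

```python
def build_prompt(task, query, examples=None):
    """Return a zero-shot prompt, or a few-shot prompt if examples are given."""
    parts = [task]
    for example_input, example_output in examples or []:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

task = "Extract the product name and price as JSON."
query = 'Buy the TechPro X5 at $299.99!'

# Zero-shot: task + query only
zero_shot = build_prompt(task, query)

# Few-shot: same call, now with one example pinning down the exact format
few_shot = build_prompt(task, query, examples=[
    ('Get the UltraBook 3000 for $1299',
     '{"name": "UltraBook 3000", "price": 1299}'),
])
```

Because both variants come from the same builder, upgrading from zero-shot to few-shot is a one-argument change rather than a prompt rewrite.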
Few-shot prompting means providing one or more examples of the task you want done. The model learns the pattern from your examples and applies it to new inputs. This is powerful for teaching the model your specific style, format, or logic.
One example: good for straightforward tasks where a single example is enough to show the pattern. Minimal tokens used.
A few examples (roughly 2-5): the sweet spot for most use cases. Shows variety and teaches the pattern robustly.
Many examples: use when the task is complex, or you need very consistent output across edge cases.
Beyond 5-10 examples, improvements plateau and token costs rise. More examples ≠ better output.
Few-Shot Example: Sentiment Classification
```
# System:
You are a sentiment classifier. Classify sentiment as: Positive, Negative, or Neutral.

# Examples (few-shot):
Text: "I love this product!"
Sentiment: Positive

Text: "This is okay, nothing special."
Sentiment: Neutral

Text: "Terrible experience, never again."
Sentiment: Negative

# Now classify:
Text: "The movie was pretty good actually."
Sentiment: ?
```
Few-Shot in Python API Calls
```python
import anthropic

client = anthropic.Anthropic()

messages = [
    {
        "role": "user",
        "content": """Classify sentiment: Positive, Negative, Neutral.

Example:
"Love it!" → Positive
"It's fine" → Neutral
"Awful" → Negative

Now classify: "Pretty good!"
Answer in one word only.""",
    }
]

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=100,
    messages=messages,
)
print(response.content[0].text)
```
Diversity matters. Your examples should cover edge cases and variations. Don't give 3 examples that are all nearly identical — show the model the range of inputs it might see.
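With a chat API you can also encode the examples as prior conversation turns instead of one long string: each example becomes a user message followed by the "correct" assistant reply, and the real query comes last. A minimal sketch (the helper name is made up):

```python
def few_shot_messages(examples, query):
    """Encode few-shot examples as alternating user/assistant turns."""
    messages = []
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    # The real query is the final user turn
    messages.append({"role": "user", "content": query})
    return messages

messages = few_shot_messages(
    [("I love this product!", "Positive"),
     ("This is okay, nothing special.", "Neutral"),
     ("Terrible experience, never again.", "Negative")],
    "The movie was pretty good actually.",
)
# `messages` can be passed directly as messages= to client.messages.create(...)
```

Encoding examples as turns also makes it easy to add or remove examples programmatically without string surgery.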
Chain-of-thought (CoT) prompting often comes down to one magic phrase: "Let's think step by step." This simple instruction tells the model to show its reasoning before giving the final answer, and it dramatically improves performance on math, logic, and reasoning tasks.
Without CoT:

```
Q: If a train travels at 60 mph for 2.5 hours, how far does it go?
A: 150 miles
```

With CoT:

```
Q: If a train travels at 60 mph for 2.5 hours, how far does it go?
Let's think step by step:
1. Distance = Speed × Time
2. Speed = 60 mph
3. Time = 2.5 hours
4. 60 × 2.5 = 150 miles
A: 150 miles
```
Without CoT, the model might make arithmetic errors by rushing. With CoT, it forces the model to decompose the problem and show each step, making errors easier to catch.
Why CoT Works: writing the intermediate steps into the output means each step is computed from the visible steps before it, instead of everything happening implicitly in one pass, and the chain can be checked afterwards.
```
Q: Alice has 5 apples. She buys 3 more. Then she gives half to Bob. How many does Alice have left?

Let's solve this step by step:
Step 1: ...
Step 2: ...
Step 3: ...
Final answer: ...
```
CoT is best for reasoning, math, and logic. For simple retrieval or classification, it adds unnecessary tokens. Use it when the task requires thinking, not for every task.
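One practical detail: a CoT response mixes reasoning with the result, so downstream code usually parses out just the final answer. A regex sketch, assuming your prompt asks for an "A:" or "Final answer:" line:

```python
import re

def extract_final_answer(text):
    """Return the text after 'A:' or 'Final answer:'; fall back to the last line."""
    match = re.search(r"^(?:Final answer|A):\s*(.+)$", text, re.MULTILINE)
    if match:
        return match.group(1).strip()
    return text.strip().splitlines()[-1]

cot_output = """Let's think step by step:
1. Distance = Speed x Time
2. 60 x 2.5 = 150 miles
A: 150 miles"""
```

Here `extract_final_answer(cot_output)` yields `"150 miles"`; the fallback keeps the function usable even when the model ignores the requested marker.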
Self-consistency is a technique where you run the same prompt multiple times with different CoT paths, then pick the most common answer. It's like asking the model the question several ways and letting the majority vote decide. This dramatically improves reasoning accuracy.
A single juror might make a mistake. But if 10 jurors independently vote, the majority decision is usually correct. Self-consistency applies the same principle — run multiple inference passes and trust the consensus.
How Self-Consistency Works:
```python
import re
from collections import Counter

import anthropic

client = anthropic.Anthropic()

def self_consistency_reasoning(question, num_runs=5):
    """Run reasoning multiple times and pick the majority answer."""
    answers = []
    for _ in range(num_runs):
        response = client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=500,
            temperature=1.0,  # Higher temp = more varied reasoning paths
            messages=[{
                "role": "user",
                "content": f"{question}\nThink step by step, then end with 'Final answer: <answer>'.",
            }],
        )
        text = response.content[0].text
        # Vote on the extracted answer, not the full reasoning text:
        # the reasoning differs on every run, so raw texts would never agree.
        match = re.search(r"Final answer:\s*(.+)", text)
        answers.append(match.group(1).strip() if match else text.strip())

    # Majority voting on the extracted answers
    most_common = Counter(answers).most_common(1)[0][0]
    return most_common

answer = self_consistency_reasoning("5 + 3 * 2 = ?", num_runs=5)
print(f"Most consistent answer: {answer}")
```
Use self-consistency for high-stakes reasoning tasks where accuracy matters: math problems, logic puzzles, code generation, complex analysis. Skip it for simple tasks or when you're rate-limited.
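The voting step itself is tiny; returning the agreement ratio alongside the winner also gives you a confidence signal, so low-consensus answers can be flagged for review instead of trusted blindly. This helper is a sketch, not part of any library:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer and the fraction of runs that agreed."""
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)

# Five runs of "5 + 3 * 2 = ?": one run ignored operator precedence
answer, agreement = majority_vote(["11", "11", "16", "11", "11"])
# answer == "11", agreement == 0.8
```

A reasonable policy is to accept the answer automatically above some agreement threshold (say 0.8) and route anything below it to a retry or a human.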
Role prompting means assigning the model an identity or expertise before asking your question. "You are a senior surgeon" produces different answers than "You are a medical student." The role shapes knowledge depth, terminology, and reasoning style.
Example: Same Content, Different Expertise
```
"You are a junior Python developer with 1 year of experience."

Q: How do I make my code faster?
A: Use for loops instead of while loops. Or maybe try list comprehensions?
```

```
"You are a senior Python architect at a Fortune 500 company."

Q: How do I make my code faster?
A: Profile with cProfile to identify bottlenecks. Consider algorithmic
   complexity. Evaluate caching, vectorization, or compiled extensions
   (Cython, numba).
```
More experienced roles produce deeper reasoning, better terminology, and higher quality output. Don't just say "developer" — say "senior backend engineer with 10 years of AWS and Kubernetes experience." Specificity in the role improves specificity in the response.
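With the Anthropic Messages API, the role typically goes in the top-level `system` parameter rather than the user message. A sketch of a builder that bakes in that specificity (the helper and its wording are illustrative):

```python
def role_system_prompt(title, years, specialties):
    """Compose a specific role description; vague roles produce vague answers."""
    return (
        f"You are a {title} with {years} years of experience. "
        f"You specialize in {', '.join(specialties)}. "
        "Answer with the depth and terminology expected of that role."
    )

system = role_system_prompt("senior backend engineer", 10, ["AWS", "Kubernetes"])

# Then pass it as the system prompt in the API call:
# client.messages.create(model="claude-sonnet-4-5-20250929", max_tokens=500,
#                        system=system,
#                        messages=[{"role": "user", "content": "How do I make my code faster?"}])
```

Keeping the role in `system` also means it persists across every turn of a multi-turn conversation without being repeated in each user message.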
The real power comes from combining techniques. You might use role prompting + few-shot + CoT together. The key is knowing when each technique helps.
Decision Guide: Which Techniques to Use?
Simple extraction or classification: zero-shot usually works. Add few-shot if needed for edge cases.
Math, logic, or multi-step analysis: always use CoT. Add self-consistency for high stakes.
Creative or stylistic writing: use strong role prompting + few-shot to set tone and style.
Complex structured tasks: role + few-shot + structured format (from Topic 4) = best results.
Real Example: Combined Techniques
```
# 1. ROLE PROMPTING
You are a senior data scientist evaluating business proposals.

# 2. FEW-SHOT EXAMPLES
Example proposal: "Social media app for dog lovers"
Assessment: High market saturation, unclear monetization. RISK: High

Example proposal: "AI tool for invoice processing"
Assessment: Growing demand, B2B SaaS model proven. RISK: Medium

# 3. CHAIN-OF-THOUGHT INSTRUCTION
Now evaluate this proposal. Think through:
1. Market size and competition
2. Technical feasibility
3. Monetization clarity
4. Top 3 risks

Proposal: "Healthcare AI startup for automated diagnosis"

# 4. OUTPUT FORMAT
Provide your assessment in JSON:
{
  "summary": "...",
  "feasibility": "High/Medium/Low",
  "risks": ["risk1", "risk2", "risk3"]
}
```
Start simple (zero-shot), then layer in techniques as needed: add few-shot, add CoT, add role. Each layer adds tokens and complexity, so use only what's necessary. Measure results.
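"Measure results" can be as lightweight as scoring each prompt variant against a small labeled set and keeping the simplest variant that clears your bar. A sketch with made-up variant names and data:

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the expected labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

gold = ["Positive", "Neutral", "Negative", "Positive"]

# Hypothetical model outputs from three prompt variants on the same test set
results = {
    "zero_shot":     ["Positive", "Positive", "Negative", "Positive"],
    "few_shot":      ["Positive", "Neutral", "Negative", "Positive"],
    "few_shot_role": ["Positive", "Neutral", "Negative", "Positive"],
}

scores = {variant: accuracy(preds, gold) for variant, preds in results.items()}

# Keep the simplest (earliest-listed) variant that meets the target
variants = list(results)
best = min((v for v in variants if scores[v] >= 0.9), key=variants.index)
```

Ordering the dictionary from simplest to most complex variant makes "use only what's necessary" an explicit selection rule rather than a judgment call.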
1. What is zero-shot prompting?
2. When is chain-of-thought (CoT) most valuable?
3. What does self-consistency do?
4. How does role prompting affect the model's responses?
Here's what you've learned:
Zero-shot is your baseline — ask directly, no examples.
Few-shot teaches the model with examples (1-5 is usually best).
Chain-of-thought forces step-by-step reasoning, especially powerful for math and logic.
Self-consistency runs multiple times and votes on the answer, dramatically improving accuracy.
Role prompting assigns expertise, shaping output depth and style.
Combining techniques multiplies their power — but keep it simple.
Next up → Topic 4: Advanced Prompting
Learn XML tags, structured output, negative prompting, and production-grade patterns.