
Hands-On: Building with LLM APIs

Write real Python code to call Claude and OpenAI APIs — streaming, error handling, and practical patterns.

1. Setting Up Your Environment

Before you can use Claude or OpenAI APIs, you need to set up your local environment. This is a one-time setup that takes 5 minutes.

Step 1: Install the SDKs

Bash — Install SDKs
# Install Anthropic SDK (Claude)
pip install anthropic

# Install OpenAI SDK
pip install openai

Step 2: Get API Keys

Create a key in the Anthropic Console (console.anthropic.com) and the OpenAI dashboard (platform.openai.com). Each key is displayed only once, so copy it somewhere safe immediately.

Step 3: Set Environment Variables

Bash — Environment Setup
# On macOS/Linux: Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"

# On Windows (Command Prompt): use System Variables, or `set` without quotes
# (with `set`, quotes become part of the value)
set ANTHROPIC_API_KEY=your-key-here
set OPENAI_API_KEY=your-key-here

# Then reload your shell (macOS/Linux)
source ~/.bashrc
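Before moving on, it's worth confirming that the variables are actually visible to Python. A quick check like this helps (a sketch; `check_api_keys` is a helper written for this lesson, not part of either SDK):

Python — Key Check (sketch)
```python
import os

def check_api_keys(required=("ANTHROPIC_API_KEY", "OPENAI_API_KEY")):
    """Return a dict mapping each variable name to True if it is set and non-empty."""
    return {name: bool(os.environ.get(name)) for name in required}

for name, ok in check_api_keys().items():
    print(f"{name}: {'set' if ok else 'MISSING'}")
```

If a key prints as MISSING, re-check the export line and remember to open a fresh shell (or `source` your rc file) after editing it.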

Never Hardcode API Keys

Always use environment variables. If you hardcode a key and push it to GitHub, automated scanners will find and abuse it within minutes of exposure. Use environment variables, .env files, or a secrets manager.
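If you go the .env-file route, the python-dotenv package is the usual production choice. To show what it does under the hood, here is a deliberately minimal loader (a sketch written for this lesson; it handles only simple `KEY=value` lines and `#` comments):

Python — Minimal .env Loader (sketch)
```python
import os

def load_env_file(path=".env"):
    """Load KEY=value lines from a .env file into os.environ.
    Existing environment variables are not overwritten."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Remember to add `.env` to your `.gitignore` so the file never reaches version control.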

2. Anthropic SDK Deep Dive

The Anthropic SDK makes calling Claude APIs simple and intuitive. Let's cover the core methods and patterns you'll use every day.

Basic API Call

Python — Basic Call
import anthropic

client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY env var

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.content[0].text)

Multi-Turn Conversation

Python — Multi-Turn Conversation
import anthropic

client = anthropic.Anthropic()

conversation_history = []

def chat(user_message):
    """Add user message and get response."""
    conversation_history.append({
        "role": "user",
        "content": user_message
    })

    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        system="You are a helpful code assistant.",
        messages=conversation_history
    )

    assistant_message = response.content[0].text
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })

    return assistant_message

# Conversation
print(chat("What's a closure in Python?"))
print(chat("Can you give me an example?"))
print(chat("How is that different from a class?"))

Key Parameters:

- model: Which Claude model to use. This lesson uses "claude-sonnet-4-5-20250929".
- max_tokens: Maximum tokens in the response. Higher allows longer responses at higher cost.
- system: System prompt that defines the model's behavior.
- messages: List of alternating user/assistant messages, i.e. the conversation history.

💡

Temperature & Other Parameters

temperature (0-1 for Claude): controls randomness. 0 is near-deterministic; 1 is the most varied. top_p: nucleus sampling; the model samples from the smallest set of tokens whose probabilities sum to p. top_k: samples only from the k most likely tokens. For extraction, classification, and code tasks, use temperature=0; for creative tasks, around 0.7.
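One way to keep those choices consistent across a codebase is to centralize them. This sketch (the `build_request` helper and the profile names are inventions for this lesson, not SDK features) assembles the keyword arguments for `client.messages.create()` without sending anything:

Python — Parameter Profiles (sketch)
```python
# Parameter profiles matching the guidance above
DETERMINISTIC = {"temperature": 0.0}   # extraction, classification, code
CREATIVE = {"temperature": 0.7}        # brainstorming, writing

def build_request(prompt, *, profile=DETERMINISTIC,
                  model="claude-sonnet-4-5-20250929", max_tokens=1024):
    """Assemble kwargs for client.messages.create() from a task profile."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        **profile,
    }
```

Calling code then reads `client.messages.create(**build_request("Summarize this", profile=DETERMINISTIC))`, and tuning a profile changes every call site at once.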

3. OpenAI SDK Comparison

The OpenAI SDK has a similar structure to Anthropic's, and learning both makes the shared patterns obvious. The main difference: OpenAI has no separate system parameter; the system prompt goes into the messages list as a message with role "system".

OpenAI Basic Call

Python — OpenAI Basic Call
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Side-by-Side Comparison

Anthropic (Claude)
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5...",
    max_tokens=1024,
    system="...",
    messages=[...]
)

text = response.content[0].text
OpenAI (GPT-4)
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "..."},
        ...
    ]
)

text = response.choices[0].message.content

When to Use Each

Claude: best for long-context tasks, reasoning, and code analysis. GPT-4: best for creative writing, multimodal (image) tasks, and speed. Many teams use both: Claude for heavy reasoning, GPT-4 for quick tasks.
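That split can live in code as a small routing table. This sketch is purely illustrative: the task-type names and the `pick_model` helper are inventions for this lesson, and you would adjust the table to your own workload:

Python — Model Routing (sketch)
```python
# Hypothetical routing table based on the guidance above
MODEL_FOR_TASK = {
    "long_context": "claude-sonnet-4-5-20250929",
    "code_analysis": "claude-sonnet-4-5-20250929",
    "creative": "gpt-4o",
    "multimodal": "gpt-4o",
}

def pick_model(task_type, default="claude-sonnet-4-5-20250929"):
    """Return the model name for a task type, falling back to a default."""
    return MODEL_FOR_TASK.get(task_type, default)
```

Keeping the mapping in one place means swapping providers for a task type is a one-line change instead of a hunt through call sites.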

4. Streaming Responses

Streaming is crucial for UX. Instead of waiting for the full response, you get tokens as they arrive. Users see text appearing in real-time.

Streaming with Anthropic

Python — Anthropic Streaming
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a haiku"}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # Print without newline

Streaming with OpenAI

Python — OpenAI Streaming
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1024,
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku"}
    ]
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
💡

When to Stream

Use streaming for: chat interfaces, real-time UIs, long responses (code generation). Don't stream for: batch processing, APIs where you need the full response at once.

5. Error Handling & Retries

APIs fail. Rate limits, timeouts, network errors. Production code must handle these gracefully.

Production-Grade Error Handling

Python — Error Handling with Retries
import time
import anthropic

client = anthropic.Anthropic()

def call_claude_with_retry(prompt, max_retries=3):
    """Call Claude with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-5-20250929",
                max_tokens=1024,
                messages=[{
                    "role": "user",
                    "content": prompt
                }]
            )
            return response.content[0].text

        except anthropic.RateLimitError as e:
            print(f"Rate limited on attempt {attempt + 1}")
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Waiting {wait_time}s before retry...")
                time.sleep(wait_time)
            else:
                raise

        except anthropic.APIStatusError as e:
            print(f"API error: {e.status_code} {e.message}")
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt
                time.sleep(wait_time)
            else:
                raise

        except Exception as e:
            print(f"Unexpected error: {e}")
            raise

    return None

# Usage
try:
    result = call_claude_with_retry("Hello Claude")
    print(result)
except Exception as e:
    print(f"Failed after retries: {e}")
⚠️

Exponential Backoff is Critical

Don't retry immediately. Use exponential backoff: wait 1s, then 2s, then 4s. This gives the API time to recover and prevents cascading failures.
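The 1s/2s/4s schedule above can be factored out into a pure function, which also makes it easy to add two common production refinements: a cap on the maximum wait, and random jitter so many clients don't all retry at the same instant (a sketch; `backoff_schedule` is a helper written for this lesson):

Python — Backoff Schedule (sketch)
```python
import random

def backoff_schedule(max_retries=3, base=1.0, cap=30.0, jitter=True):
    """Wait times for each retry attempt: base * 2**attempt, capped at `cap`,
    with optional random jitter to desynchronize competing clients."""
    waits = []
    for attempt in range(max_retries):
        wait = min(cap, base * (2 ** attempt))
        if jitter:
            # "Full jitter": pick uniformly between 0 and the computed wait
            wait = random.uniform(0, wait)
        waits.append(wait)
    return waits
```

With `jitter=False` and the defaults this reproduces the 1s, 2s, 4s progression used in the retry loop above.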

6. Cost Optimization

LLM APIs charge by the token, so track usage to manage costs. Claude Sonnet runs about $3 per million input tokens and $15 per million output tokens (the rates used in the code below); OpenAI's GPT-4o is in a similar range. Small optimizations add up.

Tracking Token Usage

Python — Cost Tracking
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Hello"
    }]
)

# Token usage
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens
total_tokens = input_tokens + output_tokens

# Cost estimation (Claude Sonnet pricing)
input_cost = (input_tokens / 1_000_000) * 3      # $3 per 1M input tokens
output_cost = (output_tokens / 1_000_000) * 15     # $15 per 1M output tokens
total_cost = input_cost + output_cost

print(f"Input: {input_tokens}, Output: {output_tokens}")
print(f"Cost: ${total_cost:.6f}")

Cost Optimization Strategies:

Batch API

For tasks that don't need immediate results (overnight processing), Anthropic's Batch API offers a 50% discount. Great for data-processing pipelines.
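Combining the per-token rates from the tracking example with the batch discount gives a reusable estimator (a sketch; `estimate_cost` is a helper written for this lesson, and you should verify current pricing before relying on the numbers):

Python — Cost Estimator (sketch)
```python
# $ per 1M tokens, matching the Claude Sonnet rates used above (verify current pricing)
SONNET_PRICING = {"input": 3.00, "output": 15.00}

def estimate_cost(input_tokens, output_tokens, pricing=SONNET_PRICING, batch=False):
    """Estimate a request's dollar cost; batch jobs get the 50% discount described above."""
    cost = (input_tokens / 1_000_000) * pricing["input"] \
         + (output_tokens / 1_000_000) * pricing["output"]
    return cost * 0.5 if batch else cost
```

For example, a job with one million input tokens and no output costs $3.00 in real time but $1.50 through the batch route.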

7. Mini Project: Terminal Chat App

Let's build a complete terminal chat application with conversation history, streaming, and error handling. This is production-grade code you can use as a template.

Python — Complete Chat App
import os
import json
import anthropic

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

HISTORY_FILE = "chat_history.json"

def load_history():
    """Load conversation history from file."""
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE, "r") as f:
            return json.load(f)
    return []

def save_history(history):
    """Save conversation history to file."""
    with open(HISTORY_FILE, "w") as f:
        json.dump(history, f, indent=2)

def chat(user_input, history):
    """Send message and stream response."""
    history.append({"role": "user", "content": user_input})

    print("\nAssistant: ", end="", flush=True)
    full_response = ""

    try:
        with client.messages.stream(
            model="claude-sonnet-4-5-20250929",
            max_tokens=1024,
            system="You are a helpful AI assistant.",
            messages=history
        ) as stream:
            for text in stream.text_stream:
                print(text, end="", flush=True)
                full_response += text
    except anthropic.APIError as e:
        print(f"\nError: {e}")
        return history

    print("\n")
    history.append({"role": "assistant", "content": full_response})
    return history

def main():
    """Main chat loop."""
    history = load_history()

    print("✨ Claude Terminal Chat (type 'quit' to exit, 'clear' to reset)")
    print("━" * 50)

    while True:
        user_input = input("\nYou: ").strip()

        if not user_input:
            continue
        if user_input.lower() == "quit":
            print("Goodbye!")
            break
        if user_input.lower() == "clear":
            history = []
            print("History cleared.")
            continue

        history = chat(user_input, history)
        save_history(history)

if __name__ == "__main__":
    main()

To Run:

Bash
# Set your API key
export ANTHROPIC_API_KEY="your-key"

# Run the app
python chat_app.py

What You've Built

This chat app has: conversation persistence (saves to disk), streaming for real-time UX, error handling for API failures, and a clean CLI interface. This is a foundation you can extend for production use.
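One natural extension: as the saved history grows, every request resends the whole conversation, so input-token costs climb steadily. A simple fix is to trim the history before each call, taking care to start on a user turn so the API still receives a valid alternating conversation (a sketch; `trim_history` is a helper written for this lesson):

Python — History Trimming (sketch)
```python
def trim_history(history, max_messages=20):
    """Keep only the most recent messages, dropping leading assistant turns
    so the trimmed conversation still starts with a user message."""
    trimmed = history[-max_messages:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed
```

In the chat app above you would call it as `messages=trim_history(history)` inside `chat()`, keeping the full transcript on disk while bounding what each request sends.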

Check Your Understanding

Quick Quiz — 4 Questions

1. Where should you store API keys?

2. Why use streaming for chat interfaces?

3. What's the best strategy for handling API rate limits?

4. How can you reduce LLM API costs?

🎉 Phase 1 Complete!

You've mastered the foundations of Prompt Engineering & AI Agents.

You've learned structure, techniques, advanced patterns, iteration, and hands-on coding.

You're ready to build production AI applications.

Topic 6 & Phase 1 Summary

Topic 6 covered: Environment setup, Anthropic SDK patterns, OpenAI SDK for comparison, streaming for UX, production-grade error handling with exponential backoff, cost optimization via token tracking, and a complete mini-project (terminal chat app).

Phase 1 Completed: You've journeyed from "How do LLMs work?" through structure, techniques, advanced patterns, iteration & debugging, and real hands-on API coding.

Next Phase (Topics 7-20): Build AI Agents, RAG systems, multi-agent architectures, and domain-specific applications.
You've unlocked the ability to go beyond simple prompts into autonomous systems.

Topic 7 of 23 — Phase 1 Complete! Next: RAG Concepts