Get reliable JSON, tables, and typed data from LLMs — using schemas, validation, and parsing strategies.
LLMs naturally produce prose — flowing, conversational text. But real-world applications need structured data: JSON for APIs, rows for databases, typed objects for application logic. The gap between free-text output and machine-readable data is where structured output techniques come in.
Without structured output, you're stuck writing fragile string-parsing code that breaks whenever the model changes its phrasing. With the right approach, you get predictable, validated data every time.
Return data as valid JSON objects that your application can parse and use directly without string manipulation.
Pull structured fields (names, dates, amounts) from unstructured text like emails, documents, or reports.
Generate output that matches your API schemas so LLM responses can flow directly into your system.
Transform natural language into structured records ready for database insertion with proper types and constraints.
Think of structured output as giving the LLM a form to fill out instead of a blank page. A blank page invites rambling prose. A form with labeled fields, dropdowns, and checkboxes gets you exactly the data you need, in exactly the format you expect.
Different LLM providers offer different mechanisms for requesting structured output. Some have native JSON modes built into their APIs, while others rely on careful prompting. Here's how the major providers handle it:
OpenAI — native JSON mode via response_format:
```python
from openai import OpenAI
import json

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Return JSON with keys: title, summary, rating"},
        {"role": "user", "content": "Review the movie Inception"}
    ]
)

# response.choices[0].message.content is guaranteed valid JSON
data = json.loads(response.choices[0].message.content)
print(data["title"])  # "Inception"
```
Anthropic — structured output via system prompts + XML tags or tool use:
```python
import anthropic
import json
import re

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system="""Return your response as a JSON object inside <json> tags.
The JSON must have these exact keys: title, summary, rating.
Example: <json>{"title": "...", "summary": "...", "rating": 8.5}</json>""",
    messages=[
        {"role": "user", "content": "Review the movie Inception"}
    ]
)

# Extract JSON from the XML tags
match = re.search(r'<json>([\s\S]*?)</json>', message.content[0].text)
data = json.loads(match.group(1))
```
Google Gemini — schema-based generation with response MIME type:
```python
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Review the movie Inception",
    generation_config=genai.types.GenerationConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "summary": {"type": "string"},
                "rating": {"type": "number"}
            },
            "required": ["title", "summary", "rating"]
        }
    )
)
```
Not all models have native JSON mode — but you can achieve reliable JSON from any model with the right prompting. The key is to be explicit about the expected structure, provide an example, and use delimiters (like XML tags) so you can extract the JSON programmatically.
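Putting those three rules together, here is a minimal prompt-only sketch. The prompt wording, tag names, and the simulated reply are illustrative assumptions, not any provider's API; the point is the pattern of explicit keys, an example, and delimiters you can extract against.

```python
import json
import re

# Illustrative prompt for a model with no native JSON mode:
# explicit keys, an inline example, and <json> delimiters.
PROMPT = """Review the movie Inception.
Respond with ONLY a JSON object inside <json> tags, no other text.
The object must have exactly these keys: title, summary, rating.
Example: <json>{"title": "...", "summary": "...", "rating": 8.5}</json>"""

def parse_tagged_json(model_output: str) -> dict:
    """Pull the JSON payload out of the <json> delimiters."""
    match = re.search(r'<json>([\s\S]*?)</json>', model_output)
    if match is None:
        raise ValueError("No <json> block found in model output")
    return json.loads(match.group(1))

# Simulated model reply, to show the extraction step working even
# when the model adds chatter around the payload:
reply = 'Sure! <json>{"title": "Inception", "summary": "A heist in dreams.", "rating": 9}</json>'
print(parse_tagged_json(reply)["title"])  # Inception
```

Because the delimiters are fixed, the extraction code stays the same no matter how the model phrases its surrounding text.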
The most reliable approach to structured output is schema-driven generation — defining the exact shape of your data with types, constraints, and descriptions, then letting a library enforce that schema against the LLM's output. This is where tools like Pydantic and Instructor shine.
First, define your data model with Pydantic:
```python
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: float = Field(ge=1, le=10, description="Rating from 1-10")
    sentiment: str = Field(description="positive, negative, or neutral")
    key_themes: list[str] = Field(description="Main themes")
    summary: str = Field(max_length=200)
```
Then use the Instructor library to get typed, validated responses automatically:
```python
# Install: pip install instructor openai
import instructor
from openai import OpenAI

# Patch the OpenAI client with Instructor
client = instructor.from_openai(OpenAI())

# Get a typed, validated response — no manual parsing!
review = client.chat.completions.create(
    model="gpt-4o",
    response_model=MovieReview,
    messages=[
        {"role": "user", "content": "Review the movie Inception"}
    ]
)

# review is a fully typed MovieReview object
print(review.title)       # "Inception"
print(review.rating)      # 9.2
print(review.sentiment)   # "positive"
print(review.key_themes)  # ["dreams", "reality", "loss"]

# Validation is automatic — rating must be 1-10, summary ≤ 200 chars
# If the LLM returns invalid data, Instructor retries automatically
```
The Pydantic model serves triple duty: it documents the expected format, constrains the LLM's output, and validates the result. If the LLM returns a rating of 15, Pydantic catches the violation and Instructor automatically retries with corrective feedback.
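To see the constraint-checking half of that in isolation, here is a minimal sketch (a trimmed-down MovieReview, no LLM involved) of Pydantic rejecting an out-of-range value before it reaches application logic:

```python
from pydantic import BaseModel, Field, ValidationError

class MovieReview(BaseModel):
    title: str
    rating: float = Field(ge=1, le=10)  # constraint: must be 1-10
    summary: str = Field(max_length=200)

# A rating of 15 violates the ge/le constraint and raises immediately
try:
    MovieReview(title="Inception", rating=15, summary="A heist in dreams.")
except ValidationError as e:
    print(e)  # reports that rating must be less than or equal to 10
```

This is the same error report Instructor feeds back to the model on retry.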
In production, LLM responses aren't always perfectly formatted. The model might wrap JSON in markdown code blocks, include explanatory text before the JSON, or produce slightly malformed output. A robust parsing strategy handles all these cases gracefully.
Here's a battle-tested parsing function that handles the most common edge cases:
```python
import json
import re

def extract_json(text: str) -> dict:
    """Extract JSON from LLM response, handling markdown code blocks."""
    # Try direct parse first
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Try extracting from a markdown code block
    match = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
    if match:
        return json.loads(match.group(1).strip())

    # Try finding a JSON-like substring
    match = re.search(r'\{[\s\S]*\}', text)
    if match:
        return json.loads(match.group(0))

    raise ValueError("No valid JSON found in response")
```
For even more resilience, add a retry-with-feedback loop that sends parsing errors back to the model:
```python
import json

def get_json_with_retry(client, messages, max_retries=3):
    """Get valid JSON from LLM with automatic retry on parse failure."""
    for attempt in range(max_retries):
        response = client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=1024,
            messages=messages
        )
        text = response.content[0].text
        try:
            return extract_json(text)
        except (json.JSONDecodeError, ValueError) as e:
            # Send the error back to the model for correction
            messages.append({"role": "assistant", "content": text})
            messages.append({
                "role": "user",
                "content": f"That response failed to parse: {e}\n"
                           f"Please return ONLY valid JSON, no other text."
            })
    raise ValueError(f"Failed to get valid JSON after {max_retries} attempts")
```
Always validate parsed output against your expected schema — LLMs can produce valid JSON that doesn't match your structure. A response like {"result": "success"} is valid JSON but useless if you expected {"title": "...", "rating": 9}. Use Pydantic or JSON Schema validation to catch structural mismatches, not just syntax errors.
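As a sketch of that structural check (using a trimmed-down MovieReview model for illustration), Pydantic's model_validate rejects a wrongly shaped payload even though it parsed as JSON:

```python
import json
from pydantic import BaseModel, ValidationError

class MovieReview(BaseModel):
    title: str
    rating: float

good = json.loads('{"title": "Inception", "rating": 9}')
bad = json.loads('{"result": "success"}')  # valid JSON, wrong shape

review = MovieReview.model_validate(good)  # passes: right keys, right types

try:
    MovieReview.model_validate(bad)        # fails: missing title and rating
except ValidationError:
    print("structural mismatch caught")
```

Syntax validation (json.loads) and structural validation (model_validate) are separate steps, and production code needs both.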
Streaming LLM responses is great for UX — users see text appearing in real time. But streaming structured data introduces a challenge: you can't parse incomplete JSON. A partial response like {"title": "Incep will throw a parse error.
There are two main strategies for streaming structured output:
Each line is a complete JSON object. Parse line-by-line as they arrive. Ideal for lists of items or log-style output.
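A minimal sketch of the newline-delimited approach, with the streamed lines simulated rather than coming from a real API call:

```python
import json

# Simulated stream: each line the model emits is one complete JSON object
streamed_lines = [
    '{"name": "Inception", "year": 2010}',
    '{"name": "Interstellar", "year": 2014}',
    '{"name": "Tenet", "year": 2020}',
]

records = []
for line in streamed_lines:
    line = line.strip()
    if not line:
        continue  # skip blank keep-alive lines
    records.append(json.loads(line))  # each line parses on its own

print(records[0]["name"])  # Inception
```

Because every line is self-contained, a dropped connection mid-stream loses at most one record instead of corrupting the whole payload.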
Use libraries that can extract completed fields from incomplete JSON, updating the UI as each field becomes available.
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str

client = instructor.from_openai(OpenAI())

# Stream partial objects — fields populate as they're generated
review_stream = client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=MovieReview,
    messages=[
        {"role": "user", "content": "Review the movie Inception"}
    ]
)

for partial_review in review_stream:
    # partial_review has fields filled in as they stream
    print(f"Title: {partial_review.title or '...'}")
    print(f"Rating: {partial_review.rating or '...'}")
    print(f"Summary: {partial_review.summary or '...'}")
    print("---")
```
Streaming structured output is most valuable for real-time UIs where users watch results populate (dashboards, search results, data tables) and for large outputs where waiting for the complete response would cause unacceptable delays. For backend pipelines, batch parsing after completion is simpler and preferred.
Follow these principles to build reliable structured output pipelines that work in production.
Explicit schemas get better results than hoping for the right format. Use Pydantic models, JSON Schema, or at minimum a clear example in your prompt.
Always validate output against your schema before using it. Valid JSON does not mean correct structure. Check types, required fields, and value constraints.
If parsing fails, send the error back to the model for correction. Most models self-correct reliably when shown their mistake and the expected format.
Begin with flat structures; add nesting only when needed. Deeply nested schemas are harder for models to get right and harder for you to validate.
1. What's the most reliable way to get JSON output from an LLM?
2. Why should you validate LLM output even when using JSON mode?
3. What should you do when JSON parsing fails on an LLM response?
Here's what you've learned:
Structured output bridges the gap between LLM prose and machine-readable data. Use JSON mode or schema-based generation (Pydantic + Instructor) for the most reliable results. Build robust parsers that handle markdown code blocks and malformed output. Always validate against your schema — valid JSON doesn't mean correct structure. When parsing fails, retry with feedback rather than giving up immediately.
Next up → Topic 12: Evaluating Prompt Quality
You'll learn how to measure, score, and systematically improve your prompts with evaluation frameworks and metrics.