Get reliable JSON, tables, and typed data from LLMs — using schemas, validation, and parsing strategies.
LLMs naturally produce prose — flowing, conversational text. But real-world applications need structured data: JSON for APIs, rows for databases, typed objects for application logic. The gap between free-text output and machine-readable data is where structured output techniques come in.
Without structured output, you're stuck writing fragile string-parsing code that breaks whenever the model changes its phrasing. With the right approach, you get predictable, validated data every time.
Return data as valid JSON objects that your application can parse and use directly without string manipulation.
Pull structured fields (names, dates, amounts) from unstructured text like emails, documents, or reports.
Generate output that matches your API schemas so LLM responses can flow directly into your system.
Transform natural language into structured records ready for database insertion with proper types and constraints.
Think of structured output as giving the LLM a form to fill out instead of a blank page. A blank page invites rambling prose. A form with labeled fields, dropdowns, and checkboxes gets you exactly the data you need, in exactly the format you expect.
Different LLM providers offer different mechanisms for requesting structured output. Some have native JSON modes built into their APIs, while others rely on careful prompting. Here's how the major providers handle it:
OpenAI — native JSON mode via response_format:
```python
from openai import OpenAI
import json

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Return JSON with keys: title, summary, rating"},
        {"role": "user", "content": "Review the movie Inception"}
    ]
)

# response.choices[0].message.content is guaranteed valid JSON
data = json.loads(response.choices[0].message.content)
print(data["title"])  # "Inception"
```
Anthropic — structured output via system prompts + XML tags or tool use:
```python
import anthropic
import json
import re

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system="""Return your response as a JSON object inside <json> tags.
The JSON must have these exact keys: title, summary, rating.
Example: <json>{"title": "...", "summary": "...", "rating": 8.5}</json>""",
    messages=[
        {"role": "user", "content": "Review the movie Inception"}
    ]
)

# Extract JSON from the XML tags
match = re.search(r'<json>([\s\S]*?)</json>', message.content[0].text)
data = json.loads(match.group(1))
```
Google Gemini — schema-based generation with response MIME type:
```python
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Review the movie Inception",
    generation_config=genai.types.GenerationConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "summary": {"type": "string"},
                "rating": {"type": "number"}
            },
            "required": ["title", "summary", "rating"]
        }
    )
)
```
Not all models have native JSON mode — but you can achieve reliable JSON from any model with the right prompting. The key is to be explicit about the expected structure, provide an example, and use delimiters (like XML tags) so you can extract the JSON programmatically.
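Putting those three rules together, here is a minimal prompt-only sketch. The prompt wording, tag names, and the simulated reply are illustrative assumptions, not any provider's API; the point is the pattern of explicit keys, an example, and delimiters you can extract against.

```python
import json
import re

# Illustrative prompt for a model with no native JSON mode:
# explicit keys, an inline example, and <json> delimiters.
PROMPT = """Review the movie Inception.
Respond with ONLY a JSON object inside <json> tags, no other text.
The object must have exactly these keys: title, summary, rating.
Example: <json>{"title": "...", "summary": "...", "rating": 8.5}</json>"""

def parse_tagged_json(model_output: str) -> dict:
    """Pull the JSON payload out of the <json> delimiters."""
    match = re.search(r'<json>([\s\S]*?)</json>', model_output)
    if match is None:
        raise ValueError("No <json> block found in model output")
    return json.loads(match.group(1))

# Simulated model reply, to show the extraction step working even
# when the model adds chatter around the payload:
reply = 'Sure! <json>{"title": "Inception", "summary": "A heist in dreams.", "rating": 9}</json>'
print(parse_tagged_json(reply)["title"])  # Inception
```

Because the delimiters are fixed, the extraction code stays the same no matter how the model phrases its surrounding text.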
The most reliable approach to structured output is schema-driven generation — defining the exact shape of your data with types, constraints, and descriptions, then letting a library enforce that schema against the LLM's output. This is where tools like Pydantic and Instructor shine.
First, define your data model with Pydantic:
```python
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: float = Field(ge=1, le=10, description="Rating from 1-10")
    sentiment: str = Field(description="positive, negative, or neutral")
    key_themes: list[str] = Field(description="Main themes")
    summary: str = Field(max_length=200)
```
Then use the Instructor library to get typed, validated responses automatically:
```python
# Install: pip install instructor openai
import instructor
from openai import OpenAI

# Patch the OpenAI client with Instructor
client = instructor.from_openai(OpenAI())

# Get a typed, validated response — no manual parsing!
review = client.chat.completions.create(
    model="gpt-4o",
    response_model=MovieReview,
    messages=[
        {"role": "user", "content": "Review the movie Inception"}
    ]
)

# review is a fully typed MovieReview object
print(review.title)       # "Inception"
print(review.rating)      # 9.2
print(review.sentiment)   # "positive"
print(review.key_themes)  # ["dreams", "reality", "loss"]

# Validation is automatic — rating must be 1-10, summary ≤ 200 chars
# If the LLM returns invalid data, Instructor retries automatically
```
The Pydantic model serves triple duty: it documents the expected format, constrains the LLM's output, and validates the result. If the LLM returns a rating of 15, Pydantic catches the violation and Instructor automatically retries with corrective feedback.
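To see the constraint-checking half of that in isolation, here is a minimal sketch (a trimmed-down MovieReview, no LLM involved) of Pydantic rejecting an out-of-range value before it reaches application logic:

```python
from pydantic import BaseModel, Field, ValidationError

class MovieReview(BaseModel):
    title: str
    rating: float = Field(ge=1, le=10)  # constraint: must be 1-10
    summary: str = Field(max_length=200)

# A rating of 15 violates the ge/le constraint and raises immediately
try:
    MovieReview(title="Inception", rating=15, summary="A heist in dreams.")
except ValidationError as e:
    print(e)  # reports that rating must be less than or equal to 10
```

This is the same error report Instructor feeds back to the model on retry.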
In production, LLM responses aren't always perfectly formatted. The model might wrap JSON in markdown code blocks, include explanatory text before the JSON, or produce slightly malformed output. A robust parsing strategy handles all these cases gracefully.
Here's a battle-tested parsing function that handles the most common edge cases:
```python
import json
import re

def extract_json(text: str) -> dict:
    """Extract JSON from LLM response, handling markdown code blocks."""
    # Try direct parse first
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Try extracting from a markdown code block
    match = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
    if match:
        return json.loads(match.group(1).strip())

    # Try finding a JSON-like substring
    match = re.search(r'\{[\s\S]*\}', text)
    if match:
        return json.loads(match.group(0))

    raise ValueError("No valid JSON found in response")
```
For even more resilience, add a retry-with-feedback loop that sends parsing errors back to the model:
```python
import json

def get_json_with_retry(client, messages, max_retries=3):
    """Get valid JSON from LLM with automatic retry on parse failure."""
    for attempt in range(max_retries):
        response = client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=1024,
            messages=messages
        )
        text = response.content[0].text
        try:
            return extract_json(text)
        except (json.JSONDecodeError, ValueError) as e:
            # Send the error back to the model for correction
            messages.append({"role": "assistant", "content": text})
            messages.append({
                "role": "user",
                "content": f"That response failed to parse: {e}\n"
                           f"Please return ONLY valid JSON, no other text."
            })
    raise ValueError(f"Failed to get valid JSON after {max_retries} attempts")
```
Always validate parsed output against your expected schema — LLMs can produce valid JSON that doesn't match your structure. A response like {"result": "success"} is valid JSON but useless if you expected {"title": "...", "rating": 9}. Use Pydantic or JSON Schema validation to catch structural mismatches, not just syntax errors.
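As a sketch of that structural check (using a trimmed-down MovieReview model for illustration), Pydantic's model_validate rejects a wrongly shaped payload even though it parsed as JSON:

```python
import json
from pydantic import BaseModel, ValidationError

class MovieReview(BaseModel):
    title: str
    rating: float

good = json.loads('{"title": "Inception", "rating": 9}')
bad = json.loads('{"result": "success"}')  # valid JSON, wrong shape

review = MovieReview.model_validate(good)  # passes: right keys, right types

try:
    MovieReview.model_validate(bad)        # fails: missing title and rating
except ValidationError:
    print("structural mismatch caught")
```

Syntax validation (json.loads) and structural validation (model_validate) are separate steps, and production code needs both.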
Streaming LLM responses is great for UX — users see text appearing in real time. But streaming structured data introduces a challenge: you can't parse incomplete JSON. A partial response like {"title": "Incep will throw a parse error.
There are two main strategies for streaming structured output:
Each line is a complete JSON object. Parse line-by-line as they arrive. Ideal for lists of items or log-style output.
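A minimal sketch of the newline-delimited approach, with the streamed lines simulated rather than coming from a real API call:

```python
import json

# Simulated stream: each line the model emits is one complete JSON object
streamed_lines = [
    '{"name": "Inception", "year": 2010}',
    '{"name": "Interstellar", "year": 2014}',
    '{"name": "Tenet", "year": 2020}',
]

records = []
for line in streamed_lines:
    line = line.strip()
    if not line:
        continue  # skip blank keep-alive lines
    records.append(json.loads(line))  # each line parses on its own

print(records[0]["name"])  # Inception
```

Because every line is self-contained, a dropped connection mid-stream loses at most one record instead of corrupting the whole payload.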
Use libraries that can extract completed fields from incomplete JSON, updating the UI as each field becomes available.
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str

client = instructor.from_openai(OpenAI())

# Stream partial objects — fields populate as they're generated
review_stream = client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=MovieReview,
    messages=[
        {"role": "user", "content": "Review the movie Inception"}
    ]
)

for partial_review in review_stream:
    # partial_review has fields filled in as they stream
    print(f"Title: {partial_review.title or '...'}")
    print(f"Rating: {partial_review.rating or '...'}")
    print(f"Summary: {partial_review.summary or '...'}")
    print("---")
```
Streaming structured output is most valuable for real-time UIs where users watch results populate (dashboards, search results, data tables) and for large outputs where waiting for the complete response would cause unacceptable delays. For backend pipelines, batch parsing after completion is simpler and preferred.
Follow these principles to build reliable structured output pipelines that work in production.
Explicit schemas get better results than hoping for the right format. Use Pydantic models, JSON Schema, or at minimum a clear example in your prompt.
Always validate output against your schema before using it. Valid JSON does not mean correct structure. Check types, required fields, and value constraints.
If parsing fails, send the error back to the model for correction. Most models self-correct reliably when shown their mistake and the expected format.
Begin with flat structures; add nesting only when needed. Deeply nested schemas are harder for models to get right and harder for you to validate.
1. What's the most reliable way to get JSON output from an LLM?
2. Why should you validate LLM output even when using JSON mode?
3. What should you do when JSON parsing fails on an LLM response?
Here's what you've learned:
Structured output bridges the gap between LLM prose and machine-readable data. Use JSON mode or schema-based generation (Pydantic + Instructor) for the most reliable results. Build robust parsers that handle markdown code blocks and malformed output. Always validate against your schema — valid JSON doesn't mean correct structure. When parsing fails, retry with feedback rather than giving up immediately.
Next up → Topic 12: Evaluating Prompt Quality
You'll learn how to measure, score, and systematically improve your prompts with evaluation frameworks and metrics.