
Pydantic v2: Data Validation in Python

Author: Pyland

Published: 30.06.2026

Level: Medium

#basics #llm

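Pydantic is a library for data validation using type annotations. In version 2 the core validation logic was rewritten in Rust (the pydantic-core package), and the Pydantic team reports that it runs 5–50x faster than version 1, depending on the workload. It is the de facto standard in FastAPI, LangChain, and LLM applications.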

Installation

uv add pydantic

BaseModel — the base model

from pydantic import BaseModel

class TextAnalysis(BaseModel):
    sentiment: str
    score: float
    keywords: list[str]
    language: str

# Creating from a dictionary
data = {"sentiment": "positive", "score": 0.9, "keywords": ["python"], "language": "ru"}
result = TextAnalysis.model_validate(data)

print(result.sentiment)   # positive
print(result.score)       # 0.9
print(result.keywords)    # ['python']
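
Validation here means more than type checking: in its default (lax) mode Pydantic also converts compatible input to the declared type. The following is a minimal sketch of that behavior, reusing the TextAnalysis model from above; the input values are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class TextAnalysis(BaseModel):
    sentiment: str
    score: float
    keywords: list[str]
    language: str


# In the default (lax) mode a numeric string is converted to float
coerced = TextAnalysis.model_validate(
    {"sentiment": "positive", "score": "0.9", "keywords": ["python"], "language": "ru"}
)
print(type(coerced.score).__name__, coerced.score)  # float 0.9

# A value that cannot be converted to float is rejected
rejected = False
try:
    TextAnalysis.model_validate(
        {"sentiment": "positive", "score": "high", "keywords": ["python"], "language": "ru"}
    )
except ValidationError:
    rejected = True
print(rejected)  # True
```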

model_validate() — parsing from a dict

import json

raw_json = '{"sentiment": "negative", "score": 0.2, "keywords": [], "language": "en"}'
data = json.loads(raw_json)
result = TextAnalysis.model_validate(data)
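
When the input is already a JSON string, the json.loads() step can be skipped. As a sketch, model_validate_json() parses and validates in one call and should produce the same object as the two-step version above:

```python
import json

from pydantic import BaseModel


class TextAnalysis(BaseModel):
    sentiment: str
    score: float
    keywords: list[str]
    language: str


raw_json = '{"sentiment": "negative", "score": 0.2, "keywords": [], "language": "en"}'

# One step: parse the JSON string and validate it
from_json = TextAnalysis.model_validate_json(raw_json)

# Two steps: parse with the standard library, then validate the dict
from_dict = TextAnalysis.model_validate(json.loads(raw_json))

print(from_json == from_dict)  # True
```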

ValidationError — invalid data

from pydantic import ValidationError

try:
    bad = TextAnalysis.model_validate({"sentiment": "ok"})  # missing score, keywords, and language
except ValidationError as e:
    print(e.error_count())   # 3
    for err in e.errors():
        print(err["loc"], err["msg"])
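
Each entry returned by errors() is a dictionary, so the failures can also be handled programmatically rather than just printed. A small sketch, assuming the flat TextAnalysis model from above, that collects the names of the missing fields:

```python
from pydantic import BaseModel, ValidationError


class TextAnalysis(BaseModel):
    sentiment: str
    score: float
    keywords: list[str]
    language: str


missing = []
try:
    TextAnalysis.model_validate({"sentiment": "ok"})
except ValidationError as e:
    # "loc" is a tuple path to the field; "type" is a machine-readable error code
    missing = [err["loc"][0] for err in e.errors() if err["type"] == "missing"]

print(sorted(missing))  # ['keywords', 'language', 'score']
```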

Nested models

class SentimentResult(BaseModel):
    label: str       # positive / negative / neutral
    confidence: float

class TextAnalysis(BaseModel):
    sentiment: SentimentResult
    keywords: list[str]
    language: str
    word_count: int

data = {
    "sentiment": {"label": "positive", "confidence": 0.87},
    "keywords": ["python", "api"],
    "language": "ru",
    "word_count": 150
}
result = TextAnalysis.model_validate(data)
print(result.sentiment.label)       # positive
print(result.sentiment.confidence)  # 0.87
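
Nested validation also affects error reporting: the loc of an error is the full path to the offending field. A minimal sketch with a deliberately invalid confidence value (the input is made up for illustration):

```python
from pydantic import BaseModel, ValidationError


class SentimentResult(BaseModel):
    label: str
    confidence: float


class TextAnalysis(BaseModel):
    sentiment: SentimentResult
    keywords: list[str]
    language: str
    word_count: int


bad = {
    "sentiment": {"label": "positive", "confidence": "high"},  # not a number
    "keywords": ["python"],
    "language": "ru",
    "word_count": 150,
}

locs = []
try:
    TextAnalysis.model_validate(bad)
except ValidationError as e:
    locs = [err["loc"] for err in e.errors()]

print(locs)  # [('sentiment', 'confidence')]
```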

Field() — constraints and descriptions

from pydantic import BaseModel, Field

class TextAnalysis(BaseModel):
    sentiment: str = Field(description="positive / negative / neutral")
    score: float = Field(ge=0.0, le=1.0, description="Confidence from 0 to 1")
    keywords: list[str] = Field(max_length=10, description="Keywords")
    language: str = Field(pattern=r"^[a-z]{2}$", description="ISO 639-1 language code")
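
To see the constraints in action, the sketch below feeds the model a score outside the 0 to 1 range and a language code that does not match the pattern, then reads a description back from the generated JSON schema. The input values are illustrative only.

```python
from pydantic import BaseModel, Field, ValidationError


class TextAnalysis(BaseModel):
    sentiment: str = Field(description="positive / negative / neutral")
    score: float = Field(ge=0.0, le=1.0, description="Confidence from 0 to 1")
    keywords: list[str] = Field(max_length=10, description="Keywords")
    language: str = Field(pattern=r"^[a-z]{2}$", description="ISO 639-1 language code")


bad = {"sentiment": "positive", "score": 1.5, "keywords": ["python"], "language": "RUS"}

failed = []
try:
    TextAnalysis.model_validate(bad)
except ValidationError as e:
    # Collect the names of the fields that violated a constraint
    failed = [err["loc"][0] for err in e.errors()]

print(sorted(failed))  # ['language', 'score']

# Descriptions and bounds are exported in the model's JSON schema
schema = TextAnalysis.model_json_schema()
print(schema["properties"]["score"]["description"])  # Confidence from 0 to 1
```

The exported schema is what makes the descriptions useful beyond documentation: it can be passed to tools that accept a JSON schema.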

model_dump() — back to dict

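# Input matching the flat TextAnalysis model defined with Field() above
data = {"sentiment": "positive", "score": 0.9, "keywords": ["python"], "language": "ru"}
result = TextAnalysis.model_validate(data)
d = result.model_dump()        # dict
j = result.model_dump_json()   # JSON string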
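
Both methods accept options, and their output can be fed back into validation. The sketch below, using a flat TextAnalysis model without constraints for brevity, shows excluding a field and a JSON round trip:

```python
from pydantic import BaseModel


class TextAnalysis(BaseModel):
    sentiment: str
    score: float
    keywords: list[str]
    language: str


data = {"sentiment": "positive", "score": 0.9, "keywords": ["python"], "language": "ru"}
result = TextAnalysis.model_validate(data)

full = result.model_dump()                         # all fields as a dict
partial = result.model_dump(exclude={"keywords"})  # leave one field out

# Round trip: serialize to JSON, then parse and validate it again
restored = TextAnalysis.model_validate_json(result.model_dump_json())

print(full == data)          # True
print("keywords" in partial) # False
print(restored == result)    # True
```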

Why Pydantic in LLM applications

By default, Claude returns plain text. To get structured data, ask it to respond in JSON, then parse the reply and validate it with Pydantic:

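import json

# `response` is assumed to be the message object returned by the Anthropic SDK
raw = response.content[0].text.strip()
data = json.loads(raw)                       # raises json.JSONDecodeError on malformed JSON
result = TextAnalysis.model_validate(data)   # raises ValidationError on missing or invalid fields
# Now result is a typed object with validated fields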
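
In practice a model reply is not guaranteed to be clean JSON, so the parsing step usually needs a failure path. The following is a sketch under stated assumptions, not a definitive implementation: parse_analysis is a hypothetical helper, the replies are simulated strings rather than real API responses, and the code-fence stripping is only one simple way to handle a reply wrapped in Markdown.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class TextAnalysis(BaseModel):
    sentiment: str
    score: float = Field(ge=0.0, le=1.0)
    keywords: list[str]
    language: str


FENCE = "`" * 3  # Markdown code-fence marker


def parse_analysis(raw: str) -> Optional[TextAnalysis]:
    """Hypothetical helper: return a validated result, or None if the reply is unusable."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the surrounding backticks and an optional "json" language tag
        text = text.strip("`").removeprefix("json").strip()
    try:
        # model_validate_json raises ValidationError for malformed JSON
        # as well as for missing or invalid fields
        return TextAnalysis.model_validate_json(text)
    except ValidationError:
        return None


# Simulated model replies (no API call is made here)
fenced_reply = (
    FENCE
    + 'json\n{"sentiment": "positive", "score": 0.9, "keywords": ["python"], "language": "ru"}\n'
    + FENCE
)
chatty_reply = "Sure! The sentiment looks positive."
out_of_range = '{"sentiment": "positive", "score": 7, "keywords": [], "language": "ru"}'

parsed = parse_analysis(fenced_reply)
print(parsed.score if parsed else None)  # 0.9
print(parse_analysis(chatty_reply))      # None
print(parse_analysis(out_of_range))      # None
```

Returning None keeps the example short; a real application might instead log the errors or send them back to the model and ask for a corrected reply.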
