RAG (Retrieval-Augmented Generation) is a pattern for answering questions about your own documents. Instead of fine-tuning the model, you retrieve the relevant fragments at query time and pass them to the model as context.
The problem: LLMs hallucinate
Claude's knowledge comes from training data that ends at a cutoff date and does not include your private documents. If you ask about them anyway, the model may produce plausible-sounding but fabricated answers.
How RAG addresses this
```text
User question
      ↓
Vector search over the document database
      ↓
Relevant fragments (chunks) retrieved
      ↓
Fragments → Claude's system prompt
      ↓
Claude answers based on the real text
```
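You can trace the flow above end to end without any external services. The sketch below is a toy stand-in, not the article's implementation. A bag-of-words vector replaces a real embedding model, and a three-item list (the hypothetical `DOCS`) replaces the vector database. The helper names `embed`, `retrieve` and `build_system_prompt` are made up for illustration. The sketch stops at the point where the system prompt would be sent to Claude.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the document database: (source, text) pairs.
DOCS = [
    ("handbook.md", "Employees get 25 vacation days per year."),
    ("security.md", "Passwords must be rotated every 90 days."),
    ("office.md", "The office is closed on public holidays."),
]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system uses an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 3, min_similarity: float = 0.3) -> list[tuple[str, str, float]]:
    """Vector search: score every fragment, keep the k best above a relevance cutoff."""
    q = embed(question)
    scored = [(src, text, cosine_similarity(q, embed(text))) for src, text in DOCS]
    scored.sort(key=lambda item: item[2], reverse=True)
    return [item for item in scored[:k] if item[2] >= min_similarity]


def build_system_prompt(question: str) -> str:
    """Paste the retrieved fragments into the system prompt that would be sent to the model."""
    fragments = retrieve(question)
    context = "\n".join(f"[{src}] {text}" for src, text, _ in fragments)
    return "Answer only based on the provided documents.\n\nDocuments:\n" + context


print(build_system_prompt("How many vacation days do employees get?"))
```

The question shares four words with the handbook fragment and only "days" with the security fragment. Only the handbook fragment clears the cutoff, so it is the only text the model would see.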
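Implementation
The code below uses ChromaDB as the vector store and the Anthropic Python SDK. It assumes the docs collection has already been filled with document chunks, each carrying a source field in its metadata.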
```python
import anthropic
import chromadb

client_ai = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./chroma_db")
# Use cosine distance so that `1 - distance` below is a similarity score.
# The distance space is fixed when the collection is first created.
collection = chroma.get_or_create_collection(
    "docs", metadata={"hnsw:space": "cosine"}
)

SYSTEM_BASE = "You are an assistant. Answer only based on the provided documents."


def search(query: str, n: int = 3, max_distance: float = 0.5) -> list[dict]:
    """Return up to n chunks whose cosine distance to the query is below max_distance."""
    results = collection.query(
        query_texts=[query],
        n_results=n,
        include=["documents", "metadatas", "distances"],
    )
    return [
        {"text": doc, "source": (meta or {}).get("source", "unknown"), "distance": dist}
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
        if dist < max_distance  # relevance filter
    ]


def build_context(chunks: list[dict]) -> str:
    if not chunks:
        return ""
    parts = []
    for i, chunk in enumerate(chunks, 1):
        relevance = 1 - chunk["distance"]  # cosine distance -> cosine similarity
        parts.append(
            f"[Fragment {i} | Source: {chunk['source']} | Relevance: {relevance:.0%}]\n"
            f"{chunk['text']}"
        )
    return "\n\n".join(parts)


def rag_chat(question: str, history: list[dict]) -> str:
    chunks = search(question)
    context = build_context(chunks)

    system = SYSTEM_BASE
    if context:
        system += f"\n\nDocuments:\n{context}"
    else:
        system += "\n\nNo documents found. Let the user know."

    # Build the request without mutating history, so a failed API call
    # does not leave a dangling user message behind.
    messages = history + [{"role": "user", "content": question}]
    response = client_ai.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=messages,
    )
    answer = response.content[0].text

    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})
    return answer
```
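Two parts of the code assume cosine distance, where 0 means the vectors point in the same direction: the relevance filter in search() (a distance below 0.5) and the percentage in build_context() (1 - distance). Chroma fixes the distance space when a collection is created, through the `hnsw:space` metadata key, and the default is squared L2. For unit-length vectors the two are related by ||a - b||² = 2(1 - cos θ). Squared L2 is therefore exactly twice the cosine distance, and the same 0.5 cutoff would admit only chunks with a cosine distance below 0.25. The standalone check below illustrates this with made-up vectors. `cosine_distance` and `squared_l2` are hypothetical helpers that mirror the two spaces; they are not Chroma functions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Mirrors Chroma's 'cosine' space: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def squared_l2(a: list[float], b: list[float]) -> float:
    """Mirrors Chroma's default 'l2' space: squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    n = math.hypot(*v)
    return [x / n for x in v]


# Made-up vectors, scaled to unit length as many embedding models do.
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 3.5])

d_cos = cosine_distance(a, b)
d_l2 = squared_l2(a, b)
# For unit vectors ||a - b||^2 = 2 - 2*cos(a, b): twice the cosine distance.
print(f"cosine: {d_cos:.3f}  squared L2: {d_l2:.3f}  relevance: {1 - d_cos:.0%}")
```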
Key RAG decisions
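| Decision | Impact |
|---|---|
| chunk_size | Search precision: smaller chunks match more precisely but carry less surrounding context |
| n_results | How much context reaches the model, and how many tokens each request uses |
| Distance threshold | Filters out irrelevant results; set it too strict and relevant chunks are dropped as well |
| System prompt | How the model behaves when no relevant data is found |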
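chunk_size, the first decision in the table, is fixed when documents are split before indexing, a step the implementation does not show. Below is a minimal sketch that assumes fixed-size character windows with a small overlap, so text near a boundary appears in both neighbouring chunks. `chunk_text` and its defaults are illustrative only. Production splitters often cut on paragraphs, sentences or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows; consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk can then be stored with `collection.add(ids=[...], documents=[...], metadatas=[{"source": ...}, ...])`. That gives every chunk the source field that search() reads.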