RAG: Chatting with Documents via Vector Search

Author: Pyland
Published: 30.06.2026

RAG (Retrieval-Augmented Generation) is a pattern for answering questions over your own documents. Instead of fine-tuning the model, you retrieve the fragments relevant to each question and pass them to the model as context.

The problem: LLMs hallucinate

Claude's training data ends at a cutoff date and does not include your private documents. When you ask about them without supplying the text, the model can generate plausible-sounding but fabricated answers.

RAG solves this

User question
      ↓
Vector search over the document database
      ↓
Relevant fragments (chunks) retrieved
      ↓
Fragments → Claude's system prompt
      ↓
Claude answers based on the real text
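
To make the "vector search" step concrete, here is a toy sketch of ranking chunks by cosine distance. The three-dimensional vectors, the sample texts, and the helper names `cosine_distance` and `top_k` are invented for illustration; in a real setup an embedding model produces the vectors and the vector database does the ranking.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2):
    """Rank (text, vector) pairs by distance to the query, closest first."""
    scored = [(cosine_distance(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored)[:k]

# Hypothetical hand-written "embeddings"; real ones have hundreds of dimensions.
CHUNKS = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.1, 0.9, 0.1]),
    ("Shipping takes 3 to 5 business days.", [0.2, 0.2, 0.9]),
]
QUERY_VEC = [0.8, 0.2, 0.1]  # pretend embedding of "How do I get my money back?"

for distance, text in top_k(QUERY_VEC, CHUNKS):
    print(f"{distance:.3f}  {text}")
```

The refund chunk ranks first because its vector points in nearly the same direction as the query vector. The question and the chunk share almost no words; the match comes from the embeddings alone.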

Implementation

import chromadb
import anthropic

client_ai = anthropic.Anthropic()
chroma = chromadb.PersistentClient(path="./chroma_db")
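# Assumption: use cosine distance so that `1 - distance` below reads as a
# similarity score. Chroma's default metric is squared L2, and the metric
# is fixed when the collection is first created.
collection = chroma.get_or_create_collection("docs", metadata={"hnsw:space": "cosine"})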

SYSTEM_BASE = "You are an assistant. Answer only based on the provided documents."

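def search(query: str, n: int = 3) -> list[dict]:
    """Return up to n chunks whose distance to the query is under the threshold."""
    results = collection.query(
        query_texts=[query],
        n_results=n,
        include=["documents", "metadatas", "distances"]
    )
    # Each result field holds one list per query text; we sent a single query.
    # Chunks stored without metadata get the source "unknown" instead of raising.
    return [
        {"text": doc, "source": (meta or {}).get("source", "unknown"), "distance": dist}
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0]
        )
        if dist < 0.5  # relevance filter; tune for your embedding model and metric
    ]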

def build_context(chunks: list[dict]) -> str:
    if not chunks:
        return ""
    parts = []
    for i, chunk in enumerate(chunks, 1):
        relevance = 1 - chunk["distance"]
        parts.append(
            f"[Fragment {i} | Source: {chunk['source']} | Relevance: {relevance:.0%}]\n"
            f"{chunk['text']}"
        )
    return "\n\n".join(parts)

def rag_chat(question: str, history: list[dict]) -> str:
    """Answer a question from retrieved chunks and append the turn to history."""
    chunks = search(question)
    context = build_context(chunks)

    system = SYSTEM_BASE
    if context:
        system += f"\n\nDocuments:\n{context}"
    else:
        system += "\n\nNo documents found. Let the user know."

    # Send a copy, so a failed API call does not leave an unanswered
    # user message in the caller's history.
    messages = history + [{"role": "user", "content": question}]
    response = client_ai.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=messages,
    )
    answer = response.content[0].text
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})
    return answer
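
The code above assumes the collection already contains documents. A minimal sketch of the indexing side might look like this. The helper `index_chunks`, the id scheme, and the file name `policy.txt` are assumptions for illustration, not part of the original article; the only requirement taken from the article is that each chunk carries the `source` metadata key that `search` reads back.

```python
def index_chunks(collection, chunks: list[str], source: str) -> list[str]:
    """Store text chunks together with the "source" metadata used by search()."""
    # Position-based ids: re-indexing the same source overwrites earlier chunks
    ids = [f"{source}:{i}" for i in range(len(chunks))]
    if not chunks:
        return ids  # nothing to store
    # upsert inserts new ids and replaces existing ones
    collection.upsert(
        ids=ids,
        documents=chunks,
        metadatas=[{"source": source} for _ in chunks],
    )
    return ids

# Usage with a Chroma collection such as the one created in the article:
# index_chunks(collection, ["Refunds are issued within 14 days."], source="policy.txt")
```

Chroma embeds the `documents` with the collection's embedding function when no embeddings are passed in. One caveat of this id scheme: if a source shrinks between runs, chunks with higher indexes from the earlier run stay in the collection until they are deleted.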

Key RAG decisions

| Decision | Impact |
|---|---|
| chunk_size | Search precision |
| n_results | Amount of context |
| distance threshold | Filtering out irrelevant results |
| System prompt | Behavior when no data is found |
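
The article does not show where `chunk_size` is applied, so here is one possible sketch: a fixed-size character splitter with overlap. The function name `chunk_text`, the `overlap` parameter, and the default values are assumptions; splitting by sentences or by tokens is an equally valid choice.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Smaller chunks tend to match a question more precisely but carry less surrounding context, which is the trade-off behind the chunk_size row above.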
