
ChromaDB: Vector Database for RAG

Author: Pyland
Published: 30.06.2026
Level: Advanced

ChromaDB is an embeddable vector database. It stores texts and their embeddings, and can search by semantic meaning. No separate server required.

Installation

uv add chromadb
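# Expect about 500 MB of downloads in total. Note that the default embedding
# model is fetched the first time embeddings are computed, not at install time.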

PersistentClient — Saving Data to Disk

import chromadb

# Data is saved to the ./chroma_db folder
client = chromadb.PersistentClient(path="./chroma_db")

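# Create a collection or retrieve an existing one.
# The default metric is squared L2; request cosine distance explicitly,
# because the distance thresholds later in this article assume it.
collection = client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"},
)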

By contrast, chromadb.Client() keeps data in RAM only, so it is lost when the process exits.

add() — Adding Documents

collection.add(
    documents=["Python is an interpreted language.", "Django is a web framework."],
    ids=["doc_0", "doc_1"],
    metadatas=[{"source": "intro.txt"}, {"source": "frameworks.txt"}]
)

Three fields are used here:
- documents: the texts (required unless you supply your own embeddings)
- ids: unique strings, one per document (required; duplicates are not allowed)
- metadatas: dictionaries with metadata (optional but useful)
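Because ids must be unique strings, one option is to derive each id from a hash of the text, so that re-ingesting the same text always yields the same id. This is an assumption about how you might manage ids, not something ChromaDB requires. A minimal sketch using only the standard library; make_id and its prefix parameter are hypothetical names:

```python
import hashlib


def make_id(text: str, prefix: str = "doc") -> str:
    """Return a deterministic id derived from the text content (hypothetical helper)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # 16 hex characters (64 bits) keeps ids short; use the full digest
    # if collisions are a concern for your corpus size.
    return f"{prefix}_{digest[:16]}"


documents = ["Python is an interpreted language.", "Django is a web framework."]
ids = [make_id(doc) for doc in documents]
```

The resulting ids list can be passed to collection.add() alongside documents. The trade-off is that two chunks with identical text collapse to one id.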

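Unless you pass embeddings yourself, ChromaDB generates them automatically with its default model (all-MiniLM-L6-v2).

Searching with query()

query() embeds the query text with the same model and returns the n_results closest documents.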

results = collection.query(
    query_texts=["How do I build a web application?"],
    n_results=3,
    include=["documents", "metadatas", "distances"]
)

for doc, meta, dist in zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0]
):
    relevance = 1 - dist
    print(f"[{relevance:.0%}] {meta['source']}: {doc[:80]}")
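The [0] indexing in the loop is needed because query() returns one inner list per query text. If the parallel lists get awkward, they can be reshaped into one record per hit. The sketch below runs on a hand-written dictionary shaped like a single-query result, so it needs no database. flatten_hits is a hypothetical helper and the sample values are made up:

```python
def flatten_hits(results: dict, query_index: int = 0) -> list[dict]:
    """Turn one query's parallel result lists into a list of hit dicts."""
    return [
        {"id": id_, "document": doc, "metadata": meta, "distance": dist}
        for id_, doc, meta, dist in zip(
            results["ids"][query_index],
            results["documents"][query_index],
            results["metadatas"][query_index],
            results["distances"][query_index],
        )
    ]


# Stand-in shaped like a query() result for one query text (illustrative values).
sample = {
    "ids": [["doc_1", "doc_0"]],
    "documents": [["Django is a web framework.", "Python is an interpreted language."]],
    "metadatas": [[{"source": "frameworks.txt"}, {"source": "intro.txt"}]],
    "distances": [[0.25, 0.5]],
}

hits = flatten_hits(sample)
```

With several query texts, pass query_index=1, 2, and so on to read the hits for each one.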

Interpreting Distance

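When the collection uses the cosine metric (hnsw:space set to "cosine"; ChromaDB's default is squared L2), distance ranges from 0 to 2, where 0 means the vectors point in the same direction. The relevance figure is simply 1 - distance, so treat it as a rough score; it turns negative for distances above 1: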

Distance    Relevance   Interpretation
0.0–0.3     70–100%     Very similar
0.3–0.6     40–70%      Moderately similar
> 0.6       < 40%       Weak match

# Filter by threshold
THRESHOLD = 0.5
relevant = [
    (doc, meta) for doc, meta, dist
    in zip(results["documents"][0], results["metadatas"][0], results["distances"][0])
    if dist < THRESHOLD
]
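To see where the 0 to 2 range comes from, the following sketch computes cosine distance by hand for three small vectors and maps a distance to the bands from the table. It is illustrative only and is not how ChromaDB computes distances internally. cosine_distance and interpret are hypothetical names, and assigning the boundary values 0.3 and 0.6 to the lower band is an assumption, since the table leaves them ambiguous:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def interpret(distance: float) -> str:
    """Map a cosine distance to the bands from the table."""
    if distance <= 0.3:
        return "Very similar"
    if distance <= 0.6:
        return "Moderately similar"
    return "Weak match"


same = cosine_distance([1.0, 0.0], [2.0, 0.0])        # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # perpendicular vectors
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite direction
```

The three cases give distances of 0, 1 and 2, which is why 1 - distance falls below zero once vectors are more dissimilar than perpendicular.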

Checking for Duplicates Before Adding

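# documents, ids and metadatas are the parallel lists you plan to add.
# include=[] fetches only the stored ids, without documents or embeddings.
existing_ids = set(collection.get(include=[])["ids"])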

new_docs, new_ids, new_metas = [], [], []
for doc, id_, meta in zip(documents, ids, metadatas):
    if id_ not in existing_ids:
        new_docs.append(doc)
        new_ids.append(id_)
        new_metas.append(meta)

if new_docs:
    collection.add(documents=new_docs, ids=new_ids, metadatas=new_metas)
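The check above compares incoming ids only against what is already stored. If the incoming batch can itself contain repeated ids, the same idea extends to track ids seen so far. A minimal sketch as a pure function, with select_new as a hypothetical name and made-up sample data:

```python
def select_new(existing_ids, documents, ids, metadatas):
    """Keep only records whose id is neither stored nor repeated in the batch."""
    seen = set(existing_ids)
    new_docs, new_ids, new_metas = [], [], []
    for doc, id_, meta in zip(documents, ids, metadatas):
        if id_ in seen:
            continue  # already stored, or used by an earlier record in this batch
        seen.add(id_)
        new_docs.append(doc)
        new_ids.append(id_)
        new_metas.append(meta)
    return new_docs, new_ids, new_metas


new_docs, new_ids, new_metas = select_new(
    existing_ids={"doc_0"},
    documents=["a", "b", "b again", "c"],
    ids=["doc_0", "doc_1", "doc_1", "doc_2"],
    metadatas=[{"n": 0}, {"n": 1}, {"n": 2}, {"n": 3}],
)
```

Here only "b" and "c" survive: doc_0 is already stored and the second doc_1 repeats an id from the same batch. If overwriting existing records is acceptable, collection.upsert() with the same arguments is an alternative to filtering.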
