Embeddings: Coordinates of Text in Semantic Space

Author: Pyland
Published: 30.06.2026
Level: Advanced

An embedding is a numerical vector that represents a piece of text. Texts that are similar in meaning get vectors that are close to each other. This is the foundation of semantic search in RAG systems.

Analogy

Imagine a space with coordinates. Each word is a point:
- “cat” and “kitten” — nearly in the same spot
- “cat” and “automobile” — far apart
- “Python” and “programming” — close together

An embedding is exactly those coordinates in such a space, just with hundreds of dimensions.
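To make the analogy concrete, here is a minimal sketch that measures closeness as cosine similarity. The three-dimensional vectors are invented for illustration; real embeddings come from a model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings" (assumption: not produced by any real model)
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "automobile": [0.1, 0.2, 0.95],
}

def rank_by_similarity(word, vectors):
    """Return the other words sorted from most to least similar to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

for other, score in rank_by_similarity("cat", toy_vectors):
    print(f"cat vs {other}: {score:.2f}")
```

With these toy values, "kitten" ranks well above "automobile", which mirrors the "same spot" versus "far apart" picture.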

How ChromaDB creates embeddings

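ChromaDB creates embeddings automatically during add() when you pass documents without your own vectors. By default it uses the built-in all-MiniLM-L6-v2 model, which produces 384-dimensional vectors: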

import chromadb

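client = chromadb.PersistentClient("./db")
# Cosine space is requested explicitly; without it ChromaDB defaults to squared L2 distance
collection = client.get_or_create_collection("docs", metadata={"hnsw:space": "cosine"})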

# ChromaDB computes embeddings for each text automatically
collection.add(
    documents=["Python is a programming language", "The cat is sitting on the sofa"],
    ids=["doc_0", "doc_1"]
)

# During search, the query is also turned into an embedding
results = collection.query(
    query_texts=["How do I start programming?"],
    n_results=1
)
# Returns "Python is a programming language" as the closest match
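The value returned by query() is a dictionary of nested lists, with one inner list per query text. The sketch below shows one way to unpack it. The helper name `top_hits` is hypothetical, and `sample_results` is a hand-written stand-in with an invented distance, so the snippet runs without a database.

```python
def top_hits(results, query_index=0):
    """Pair up ids, documents and distances for a single query."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    return [
        {"id": doc_id, "document": doc, "distance": dist}
        for doc_id, doc, dist in zip(ids, documents, distances)
    ]

# Stand-in shaped like a query() result (assumption: the distance value is made up)
sample_results = {
    "ids": [["doc_0"]],
    "documents": [["Python is a programming language"]],
    "distances": [[0.42]],
}

for hit in top_hits(sample_results):
    print(hit["id"], hit["distance"], hit["document"])
```

With a real collection, `top_hits(results)` would be called on the dictionary returned by `collection.query(...)`.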

Cosine distance

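ChromaDB's default distance is squared L2. When a collection is created with the cosine space (metadata={"hnsw:space": "cosine"}), it returns cosine distance, which ranges from 0 to 2: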

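distance = 0.0  → same direction (identical texts)
distance = 0.3  → very similar (relevance ~70%)
distance = 0.7  → weak connection
distance = 1.0  → orthogonal vectors, no measurable similarity
distance = 2.0  → vectors point in opposite directions (rare for real text)

These thresholds are rough guides, and the exact values depend on the embedding model. Cosine distance is 1 minus cosine similarity, so 1 - distance works as a simple relevance score:

results = collection.query(query_texts=["question"], n_results=5, include=["distances"])
# results["distances"] holds one list per query text; [0] selects the first query
for dist in results["distances"][0]:
    relevance = 1 - dist  # cosine similarity; negative when dist > 1
    print(f"Relevance: {relevance:.1%}")
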
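The 0 to 2 range follows directly from the definition of cosine distance. The sketch below computes it by hand for three reference cases and adds a clamped relevance score. The helper `relevance` is hypothetical and simply keeps `1 - dist` inside 0 to 1 so that it always prints as a sensible percentage.

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity: 0 = same direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def relevance(distance):
    """Clamp 1 - distance into the range 0..1 (hypothetical helper)."""
    return max(0.0, min(1.0, 1.0 - distance))

same = cosine_distance([1.0, 0.0], [2.0, 0.0])       # same direction, different length
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # unrelated directions
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite directions

for name, dist in [("same", same), ("orthogonal", orthogonal), ("opposite", opposite)]:
    print(f"{name}: distance={dist:.1f}, relevance={relevance(dist):.0%}")
```

Note that vector length does not matter: only the direction affects cosine distance.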
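Why this matters: a question and a relevant text can share no content words at all.

Question: "How do I write a function?"
Text:     "def lets you declare a subroutine in Python"

Keyword search finds no meaningful match, because the only shared word is "a". Semantic search gives high relevance, because the meanings are close.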
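The keyword side of this comparison is easy to check. This is a minimal sketch of naive keyword matching, assuming a tiny hand-picked stopword list; real search engines use larger lists plus stemming.

```python
import re

# Tiny illustrative stopword list (assumption: not taken from any library)
STOPWORDS = {"a", "an", "the", "do", "i", "how", "in", "you", "to", "is"}

def content_words(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {word for word in words if word not in STOPWORDS}

question = "How do I write a function?"
text = "def lets you declare a subroutine in Python"

shared = content_words(question) & content_words(text)
print(shared)  # prints set(): no content words in common
```

An embedding model can still place the two texts close together, because it compares meaning ("function" and "subroutine", "write" and "declare") rather than exact words.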

Limitations of ChromaDB’s built-in model

  • The all-MiniLM-L6-v2 model was trained mostly on English text, so it handles English better than other languages
  • For better quality with non-English text, use an OpenAI model: text-embedding-3-small, or the older text-embedding-ada-002

# Example with OpenAI embeddings (for reference; requires the openai package)
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = chromadb.PersistentClient("./db")
ef = OpenAIEmbeddingFunction(api_key="sk-...", model_name="text-embedding-3-small")
# A separate collection name avoids mixing vectors from two different models
collection = client.get_or_create_collection("docs_openai", embedding_function=ef)
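The example above passes the API key as a string literal. In practice it is safer to keep the key out of the source code. Here is a minimal sketch that reads it from an environment variable; the helper name `read_api_key` and the variable name `OPENAI_API_KEY` are assumptions chosen for illustration.

```python
import os

def read_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the key stored in `env_var`, or raise a clear error if it is missing."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# Possible usage with the embedding function from the example above:
# ef = OpenAIEmbeddingFunction(api_key=read_api_key(), model_name="text-embedding-3-small")
```

The optional `environ` argument exists only so the helper can be tested without touching the real environment.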
