Embeddings

Your search says “contract termination clauses.” Your contract database has documents about “agreement dissolution provisions.” Without embeddings, those don’t match. With the right embedding model, they do.

Embeddings convert text into numerical vectors — lists of hundreds or thousands of decimal numbers that represent a piece of text’s meaning in high-dimensional space. Two semantically similar texts produce vectors that are close together. Two semantically unrelated texts produce vectors that are far apart.

This is what makes RAG systems work. A traditional keyword search requires the query to share words with the document. Semantic search, powered by embeddings, finds conceptually related content even when the words don’t match.

How the Process Works

At index time: Each document chunk is passed through an embedding model, which produces a vector. The vector is stored in a vector database alongside the original text.
At query time: The user’s query is passed through the same embedding model, producing a query vector.
Matching: The vector database finds document vectors closest to the query vector. Those documents are retrieved and passed to the LLM as context.
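The three steps above can be sketched in a few lines. This is a minimal illustration, not a production retriever: `embed` is a hypothetical stand-in that looks up hand-written 3-dimensional vectors, where a real embedding model would return hundreds of dimensions, and the "vector database" is a plain Python list.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-in for an embedding model: hand-written toy vectors.
TOY_VECTORS = {
    "agreement dissolution provisions": [0.9, 0.1, 0.0],
    "office parking policy": [0.0, 0.2, 0.9],
    "contract termination clauses": [0.8, 0.2, 0.1],
}

def embed(text):
    return TOY_VECTORS[text]

# Index time: embed each chunk and store the vector alongside the text.
chunks = ["agreement dissolution provisions", "office parking policy"]
index = [(chunk, embed(chunk)) for chunk in chunks]

def search(query, top_k=1):
    # Query time: embed the query with the same model used for indexing.
    query_vector = embed(query)
    # Matching: rank stored chunks by similarity to the query vector.
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]

results = search("contract termination clauses")
print(results)
```

The query shares no words with the top result. The match comes entirely from the vectors pointing in nearly the same direction.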

The embedding model used at index time and at query time must be identical. Switch models mid-deployment and every stored vector becomes incompatible — requiring a full re-indexing of your knowledge base.
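One way to catch a model mismatch early is to record the model name with the index and check it on every query. The sketch below assumes a hypothetical in-memory `VectorIndex` class. Most vector databases let you attach similar metadata to a collection, but the exact mechanism varies by product.

```python
from dataclasses import dataclass, field

@dataclass
class VectorIndex:
    """Hypothetical in-memory index that remembers which model built it."""
    model_name: str
    dimensions: int
    vectors: list = field(default_factory=list)

    def add(self, vector):
        if len(vector) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )
        self.vectors.append(vector)

    def check_query(self, query_model, query_vector):
        # Comparing dimensions alone is not enough: two different models can
        # share a dimension count yet place texts in unrelated vector spaces.
        if query_model != self.model_name:
            raise ValueError(
                f"index built with {self.model_name!r}, "
                f"query embedded with {query_model!r}"
            )
        if len(query_vector) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(query_vector)}"
            )

index = VectorIndex(model_name="nomic-embed-text", dimensions=768)
index.add([0.0] * 768)
index.check_query("nomic-embed-text", [0.0] * 768)  # passes silently
```

Calling `check_query` with a different model name raises a `ValueError` instead of silently returning poor matches.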

Model Options

Model	Hosting	Cost	Dimensions	Notes
`text-embedding-3-small` (OpenAI)	API	$0.02/1M tokens	1536	Good default for most use cases
`text-embedding-3-large` (OpenAI)	API	$0.13/1M tokens	3072	Higher accuracy, about 6.5× the price of small
`nomic-embed-text`	Local (Ollama)	Free	768	Strong performer for local deployments
`all-MiniLM-L6-v2` (Sentence-Transformers)	Local	Free	384	Smaller, fast, lower accuracy
`mxbai-embed-large`	Local (Ollama)	Free	1024	Competitive with OpenAI small on MTEB

The MTEB (Massive Text Embedding Benchmark) leaderboard at Hugging Face is the most widely used public ranking of embedding model performance across retrieval, classification, and clustering tasks.

Dimensions and What They Mean

Every embedding model produces vectors of a fixed dimension — typically 384 to 3072 numbers per chunk. Dimensions roughly correspond to how much semantic nuance the model captures.

More dimensions generally mean:

Better accuracy on semantically subtle queries
More storage space per document chunk
Slower similarity search at large scale

For most SMB deployments (< 1 million documents), dimension count has negligible impact on search latency. The accuracy difference between models matters more than the dimension count difference.
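The storage side of this trade-off is simple arithmetic. The sketch below assumes vectors are stored as 32-bit floats (4 bytes per number) and counts raw vector data only. Real vector databases add index structures and metadata on top, and some support compressed formats.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed 32-bit floats

def raw_vector_storage_bytes(num_chunks, dimensions):
    """Raw bytes needed for the vectors alone, excluding index overhead."""
    return num_chunks * dimensions * BYTES_PER_FLOAT32

models = [
    ("all-MiniLM-L6-v2", 384),
    ("nomic-embed-text", 768),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]

for name, dims in models:
    gigabytes = raw_vector_storage_bytes(1_000_000, dims) / 1e9
    print(f"{name}: {gigabytes:.2f} GB per million chunks")
```

Even at 3072 dimensions, a million chunks come to roughly 12 GB of raw vectors, which fits on commodity hardware.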

The Model Choice Is a Long-Term Decision

Switching embedding models requires re-embedding your entire knowledge base. If you embedded 50,000 documents with text-embedding-3-small, switching to nomic-embed-text means:

Re-processing all 50,000 documents through the new model
Replacing all stored vectors in your vector database
Accepting that your retrieval behavior will change (for better or worse)

For small knowledge bases (under 10,000 chunks), this is a few hours of work. For large ones, it’s a weekend project. Choose your embedding model at the start of a project, not after you’ve indexed everything.

Local vs API

API embeddings (OpenAI, Cohere, Google): Lower setup friction, high quality, cost scales with document count and query volume. At $0.02/million tokens for text-embedding-3-small, embedding a 10,000-page knowledge base costs well under $1 one-time: at roughly 500 to 1,000 tokens per page, that is 5 to 10 million tokens, or about $0.10 to $0.20. Ongoing query costs depend on traffic volume.
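You can estimate the one-time indexing cost for your own corpus before committing. The sketch below is a back-of-the-envelope calculation. The tokens-per-page figure is an assumption you should replace with a measured value, since the total scales linearly with it.

```python
def embedding_cost_usd(num_pages, tokens_per_page, price_per_million_tokens):
    """One-time cost to embed a corpus at a flat per-token API price."""
    total_tokens = num_pages * tokens_per_page
    return total_tokens / 1_000_000 * price_per_million_tokens

# Assumption: about 500 tokens per page; measure your own documents.
pages = 10_000
tokens_per_page = 500
cost = embedding_cost_usd(pages, tokens_per_page, price_per_million_tokens=0.02)
print(f"{pages} pages at {tokens_per_page} tokens/page: ${cost:.2f}")
```

Re-indexing after a model switch incurs the same cost again, so it is worth running this for both the current and the candidate model.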

Local embeddings (via Ollama): Zero per-query cost after setup. nomic-embed-text running on a CPU server handles hundreds of thousands of embeddings per day for free. The cost is setup time and the hardware it runs on.
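A local embedding call can be a single HTTP request. The sketch below assumes an Ollama server on its default port (11434) exposing the `/api/embed` endpoint with `nomic-embed-text` already pulled. Check the endpoint and response shape against the Ollama version you run. `build_embed_request` and `embed_locally` are illustrative names, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embed"

def build_embed_request(texts, model="nomic-embed-text"):
    """Build the POST request without sending it."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed_locally(texts, model="nomic-embed-text"):
    """Send texts to the local server and return one vector per text."""
    request = build_embed_request(texts, model)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embeddings"]

# Usage (requires a running Ollama server):
# vectors = embed_locally(["agreement dissolution provisions"])
```

Because the request goes to `localhost`, the text being embedded never crosses your network boundary.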

For GDPR or DPDP deployments where documents contain sensitive data, local embeddings keep the whole indexing pipeline on infrastructure you control. No document content leaves your network during indexing.

The Quality Problem with Mismatch

A common failure pattern: choosing a fast, cheap embedding model at the start and discovering months later that certain queries fail to retrieve relevant documents. The mismatch between how users phrase questions and how documents are written is too large for a low-dimension model to bridge.

Before committing to an embedding model, test it with the actual queries your users will ask against the actual documents you’ll retrieve from. A model that scores well on general benchmarks may perform poorly on your specific domain vocabulary.
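A small labeled test set makes this comparison concrete. The sketch below computes recall@k: the fraction of test queries for which the expected document appears in the top k results. `fake_search`, the queries, and the document IDs are all hypothetical placeholders. In practice you would pass in a function that queries your real index, once per candidate model.

```python
def recall_at_k(search_fn, labeled_queries, k=5):
    """Fraction of queries whose expected document is in the top k results."""
    hits = 0
    for query, expected_doc_id in labeled_queries:
        if expected_doc_id in search_fn(query, k):
            hits += 1
    return hits / len(labeled_queries)

# Hypothetical retriever returning canned results for illustration.
def fake_search(query, k):
    canned = {
        "contract termination clauses": ["doc-17", "doc-03"],
        "notice period for ending a lease": ["doc-08", "doc-22"],
    }
    return canned.get(query, [])[:k]

# Each pair is (real user query, ID of the document that should come back).
labeled = [
    ("contract termination clauses", "doc-17"),
    ("notice period for ending a lease", "doc-41"),
]

score = recall_at_k(fake_search, labeled, k=2)
print(f"recall@2 = {score:.2f}")
```

A few dozen real queries with known correct documents are usually enough to tell two candidate models apart on your own domain vocabulary.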

Related

RAG: The retrieval pattern that embeddings enable
Vector Databases: Where embeddings are stored and searched
Ollama: Runs embedding models locally (nomic-embed-text, mxbai-embed-large)
Data & Knowledge: The knowledge infrastructure layer
Knowledge Base Decay: When the embedded documents go stale

WyrdWerk Deployment Wiki
