Fine-Tuning

RAG gives the model access to new information. Fine-tuning changes how the model thinks. Most SMBs need the first, not the second — and confusing them is an expensive mistake.

Fine-tuning modifies a model’s weights by training it on a curated dataset of examples. The model learns new behaviors, styles, response formats, or domain-specific patterns that weren’t in its original training. Unlike RAG, which retrieves knowledge at inference time, fine-tuning bakes knowledge into the model permanently.

Fine-Tuning vs RAG: The Decision

This is the most common strategic question in SMB AI deployment.

Scenario	Better approach	Why
Model needs to know your product’s features	RAG	Knowledge changes; easier to update
Model needs to respond in your company’s specific tone	Fine-tuning	Style doesn’t change; baking it in is more reliable
Model needs current pricing or policy information	RAG	Information changes; fine-tuning becomes stale
Model needs to output a very specific JSON schema	Prompt engineering first, fine-tuning if prompting fails	System prompts often sufficient
Model keeps misunderstanding your domain terminology	Fine-tuning	Vocabulary changes are behavioral, not factual
Model needs to classify documents into 50 proprietary categories	Fine-tuning	Classification accuracy improves significantly with examples
Model needs to follow a specific regulatory format	Fine-tuning	Format compliance is more reliable when trained

The core distinction: RAG solves knowledge problems. Fine-tuning solves behavior problems.

If your AI keeps giving wrong answers because it doesn’t know your product’s features, that’s a RAG problem. If your AI gives correct-but-unusable answers because the response format is wrong, or the tone is wrong, or it can’t classify your proprietary categories — that’s a fine-tuning problem.

LoRA: The Efficient Path

Full fine-tuning (updating all model weights) is expensive and compute-intensive. For most SMB use cases, LoRA (Low-Rank Adaptation) is the practical approach.

LoRA works by training a small set of adapter weights rather than modifying the full model. The base model stays frozen. The LoRA adapter (typically 10–100MB) layers on top and modifies outputs. The result: training costs drop by 60–90% and the adapter can be swapped or removed without touching the base model.

A LoRA adapter for a 7B model requires roughly:

1 A100 40GB or 2 A100 24GB for training
1–5 hours of training depending on dataset size
~2,000–10,000 example pairs for meaningful behavior change

Training the adapter yourself costs $50–$200 in cloud GPU time for a well-scoped task. Managed fine-tuning services (OpenAI, Together AI) run $10–$50 per million tokens of training data, depending on provider and model size.

When the Numbers Don’t Work

Most SMB fine-tuning projects are premature. The common scenario:

Team spends 3 weeks collecting and labeling 500 training examples
Fine-tunes on OpenAI or trains a LoRA adapter
The fine-tuned model is better on the labeled task
They discover that 3 weeks of prompt engineering would have gotten them 80% of the quality improvement at zero training cost

Prompt engineering should always precede fine-tuning. Few-shot examples in the system prompt, output constraints, chain-of-thought instructions — exhaust these first. Fine-tuning is for the gap that remains after serious prompt optimization.

The break-even point: if fine-tuning saves you 15 tokens per inference and you run 10 million inferences per month, you save $150/month on a model priced at $1/million tokens. A fine-tuning project that costs $500 in data labeling + $200 in training pays back in ~5 months at those numbers. Lower volume or smaller per-inference savings and the economics don’t work.

The Maintenance Trap

Fine-tuned models create a new maintenance burden. When a better base model releases (roughly every 3–4 months for popular open-source models), your options are:

Stay on the old base model (cheaper but you miss quality improvements)
Re-train the LoRA adapter on the new base model (1–5 hours + compute cost, but behavior may shift)
Fully re-evaluate the fine-tuned model against your task (required either way)

Contrast with RAG: when a better model releases, you update the inference endpoint and keep your knowledge base. No retraining.

Fine-tuning locks you into a specific model version in a way RAG doesn’t. For fast-moving domains, this is a real cost.

When Fine-Tuning Genuinely Wins

Domain vocabulary. Models trained on medical or legal text at a small scale often perform better on specialty terminology than general models + RAG.
Proprietary classification schemes. Classifying documents into categories that don’t exist anywhere on the internet — your internal ticket taxonomy, your product’s defect codes, your organization’s regulatory categories.
Consistent response formatting. When the output must always follow a very specific structure (insurance claim forms, regulatory filings, API responses for legacy systems) and prompt engineering has failed to maintain consistency.
Privacy. When the use case can’t send data to external RAG infrastructure — fine-tuning bakes knowledge in so inference is fully local.

RAG — The alternative for knowledge problems; usually try this first
Model Quantization — Fine-tuned LoRA adapters also require quantization for deployment
TCO — Training cost + maintenance cost belongs in total cost of ownership
Ollama — Can serve models with LoRA adapters locally
Cost Overrun — Fine-tuning projects frequently overrun if scoped without evaluation gates
Data & Knowledge — Where fine-tuning fits in the knowledge infrastructure layer

WyrdWerk Deployment Wiki

Explorer

Fine-Tuning

Fine-Tuning vs RAG: The Decision

LoRA: The Efficient Path

When the Numbers Don’t Work

The Maintenance Trap

When Fine-Tuning Genuinely Wins

Graph View

Table of Contents

Backlinks

WyrdWerk Deployment Wiki

Explorer

Fine-Tuning

Fine-Tuning vs RAG: The Decision

LoRA: The Efficient Path

When the Numbers Don’t Work

The Maintenance Trap

When Fine-Tuning Genuinely Wins

Related

Graph View

Table of Contents

Backlinks