Infrastructure Layer

The GPU rental ad shows $1.49/hour. The invoice shows DevOps salary, model updates, and the 2 AM outage nobody planned for.

Infrastructure decisions define your cost structure and compliance posture for years. The wrong call is expensive to reverse. For SMB solo implementers, the default is almost always managed APIs — until the math or the regulator says otherwise.

Cloud APIs vs Self-Hosting

Managed APIs (OpenAI, Anthropic, Google) win on cost for roughly 87% of use cases. Self-hosting only becomes viable at very high volume, around 11 billion tokens/month (roughly 370M tokens/day), or when compliance mandates on-premise inference.

Volume or spend	Winner
Under $10K/month API spend	Managed APIs
50M tokens/day on GPT-4o-mini	Managed APIs (~$2,250/mo vs ~$5,175/mo self-hosted)
500M+ tokens/day, steady	Self-hosting (can be about 5× cheaper)
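
The 50M tokens/day row can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a flat 30-day month and a single blended price per million tokens. The $1.50 rate is not a quoted list price; it is only what ~$2,250/mo over 50M tokens/day implies. Function names are illustrative.

```python
DAYS_PER_MONTH = 30  # assumption: flat 30-day billing month


def monthly_tokens_millions(tokens_per_day: int) -> float:
    """Monthly volume in millions of tokens."""
    return tokens_per_day * DAYS_PER_MONTH / 1_000_000


def implied_rate_usd_per_million(monthly_usd: float, tokens_per_day: int) -> float:
    """Blended price per million tokens implied by a monthly bill."""
    return monthly_usd / monthly_tokens_millions(tokens_per_day)


def monthly_api_cost(tokens_per_day: int, usd_per_million: float) -> float:
    """Monthly API bill at a blended per-million-token price."""
    return monthly_tokens_millions(tokens_per_day) * usd_per_million


# Rate implied by the table row: ~$2,250/mo at 50M tokens/day.
rate = implied_rate_usd_per_million(2250, 50_000_000)  # 1.5

# Same blended rate applied to the 500M tokens/day row (an extrapolation,
# not a figure from the table).
print(monthly_api_cost(500_000_000, rate))
```

Real bills split input and output tokens at different prices, so treat the blended rate as a rough planning number and substitute your provider's current pricing.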

Hidden self-hosting costs: DevOps engineering, model refresh cycles every 6–8 weeks, networking, load balancing, and downtime that falls on you, not the provider.
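
To see how far the advertised GPU rate sits from the real bill, the cost items above can be summed in one place. A rough sketch, assuming the GPU runs around the clock for a 30-day month. The class, its field names, and the example dollar amount for DevOps are hypothetical placeholders, not measured figures; only the $1.49/hour rate comes from this page.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 24 * 30  # assumption: always-on GPU, 30-day month


@dataclass
class SelfHostedMonth:
    """One month of self-hosting costs (all field names are illustrative)."""
    gpu_hourly_usd: float
    gpu_count: int = 1
    devops_usd: float = 0.0      # engineering time, including model refreshes
    networking_usd: float = 0.0  # networking and load balancing
    downtime_usd: float = 0.0    # estimated cost of outages you absorb

    def gpu_rental_usd(self) -> float:
        return self.gpu_hourly_usd * HOURS_PER_MONTH * self.gpu_count

    def total_usd(self) -> float:
        return (self.gpu_rental_usd() + self.devops_usd
                + self.networking_usd + self.downtime_usd)

    def rental_share(self) -> float:
        """Fraction of the monthly total that the advertised GPU rate covers."""
        total = self.total_usd()
        return self.gpu_rental_usd() / total if total else 0.0


# Placeholder inputs: $1.49/hour is the advertised rate; the DevOps
# amount is a made-up value for illustration only.
month = SelfHostedMonth(gpu_hourly_usd=1.49, devops_usd=2000.0)
print(f"GPU rental: ${month.gpu_rental_usd():,.2f}")
print(f"Total:      ${month.total_usd():,.2f}")
print(f"Rental share of total: {month.rental_share():.0%}")
```

Replace the placeholders with your own estimates before comparing the total against a managed API bill.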

The Decision Framework

Concurrent Users	Start With
Under 5	Ollama (local) or managed API
5–50	vLLM on single GPU or managed API
50–500	vLLM with tensor parallelism
500+	Multiple vLLM instances behind load balancer

Most SMBs sit comfortably in the top two rows. If you’re in the bottom half, you already know it.
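
The framework table can be expressed as a small lookup, which is handy in capacity-planning scripts. A minimal sketch: the function name is hypothetical, and because the table's ranges share their endpoints (50 and 500 each appear in two rows), this version assumes a boundary value belongs to the larger tier.

```python
def starting_point(concurrent_users: int) -> str:
    """Map a concurrent-user count to the table's suggested starting point.

    Assumption: boundaries are half-open, so 50 and 500 fall into the
    larger tier. The table itself does not say which row owns them.
    """
    if concurrent_users < 0:
        raise ValueError("concurrent_users must be non-negative")
    if concurrent_users < 5:
        return "Ollama (local) or managed API"
    if concurrent_users < 50:
        return "vLLM on single GPU or managed API"
    if concurrent_users < 500:
        return "vLLM with tensor parallelism"
    return "Multiple vLLM instances behind load balancer"


for users in (3, 20, 120, 800):
    print(users, "->", starting_point(users))
```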

Key Content

Self-Hosted AI — When to run your own
Ollama — Local development and small teams
vLLM — Production-scale inference
Deployment Patterns — Architecture by team size
TCO — Real cost numbers

WyrdWerk Deployment Wiki
