1. LLM API Costs by Provider
Current pricing (as of early 2026) for major LLM providers:
OpenAI
GPT-4 Turbo: $10/1M input, $30/1M output tokens
GPT-4o: $5/1M input, $15/1M output tokens
GPT-3.5 Turbo: $0.50/1M input, $1.50/1M output tokens
Embeddings (ada-002): $0.10/1M tokens
Anthropic
Claude 3 Opus: $15/1M input, $75/1M output tokens
Claude 3 Sonnet: $3/1M input, $15/1M output tokens
Claude 3 Haiku: $0.25/1M input, $1.25/1M output tokens
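To turn the per-token prices above into a per-request or monthly figure, a small calculator helps. The following is a minimal sketch, not an official pricing tool: the prices are copied from the lists above, while the dictionary keys, the `Price` class, and the function names are hypothetical labels chosen for illustration (they are not guaranteed to match any provider's API model IDs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices copied from the lists above. Keys are illustrative labels only.
PRICES = {
    "gpt-4-turbo": Price(10.00, 30.00),
    "gpt-4o": Price(5.00, 15.00),
    "gpt-3.5-turbo": Price(0.50, 1.50),
    "claude-3-opus": Price(15.00, 75.00),
    "claude-3-sonnet": Price(3.00, 15.00),
    "claude-3-haiku": Price(0.25, 1.25),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request for the given token counts."""
    price = PRICES[model]
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


def monthly_cost(model: str, requests_per_month: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a month of identical requests."""
    return requests_per_month * request_cost(model, input_tokens, output_tokens)


if __name__ == "__main__":
    # Assumed workload: 100,000 requests, each 1,000 input and 500 output tokens.
    for name in PRICES:
        print(f"{name}: ${monthly_cost(name, 100_000, 1_000, 500):,.2f}/month")
```

The workload in the `__main__` block is an assumption. Replace it with token counts measured from your own traffic.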
Vector Databases
Pinecone Starter: Free (up to 100K vectors)
Pinecone Standard: $70+/month
pgvector: Cost of PostgreSQL instance
Weaviate Cloud: $25+/month
2. Expected Costs by Stage
MVP / Beta (0-100 users)
$100-500/month
- Light usage, few API calls
- Use the best models (e.g., GPT-4 Turbo) to get a clear signal on whether the feature works
- Free tier for vector storage
- Don't optimize yet
Early Growth (100-1K users)
$500-2,000/month
- Start caching common queries
- Consider smaller models for simple tasks
- Paid vector storage tier
- Monitor per-user costs
Scale (1K-10K users)
$2,000-15,000/month
- Multi-model routing essential
- Aggressive caching (40-60% hit rate)
- Consider fine-tuned smaller models
- Rate limiting per user tier
Enterprise (10K+ users)
$15,000-100,000+/month
- Enterprise agreements for volume discounts
- Self-hosted models for some workloads
- Dedicated infrastructure
- Cost per user becomes key metric
4. Cost Optimization Strategies
Caching (40-70% savings)
Semantic caching returns a stored response when a new query is similar enough to one already answered, so the model is not called again. Implementation effort: 1-2 days. Payback: usually within the first month.
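As a rough illustration of the idea, the sketch below stores an embedding with each response and returns the best match when its cosine similarity clears a threshold. Everything here is an assumption made for illustration: the `SemanticCache` class, the `0.9` default threshold, and especially `toy_embed`, which just counts letters so the example runs without any external service. A real deployment would pass in an actual embedding model and would likely use a vector index instead of a linear scan.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by embedding similarity."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> Optional[str]:
        """Return the closest cached response, or None if nothing is similar enough."""
        vector = self.embed(query)
        best_score, best_response = 0.0, None
        for cached_vector, response in self.entries:  # linear scan, fine for a sketch
            score = cosine_similarity(vector, cached_vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1
            return best_response
        self.misses += 1
        return None

    def put(self, query: str, response: str) -> None:
        """Store a response under the embedding of its query."""
        self.entries.append((self.embed(query), response))


def toy_embed(text: str) -> list:
    """Toy stand-in for a real embedding model: a 26-dimensional letter count."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts
```

In use, call `get` first and only call the LLM (then `put` the result) on a miss. The ratio `hits / (hits + misses)` gives the hit rate to compare against the targets mentioned earlier. The threshold is a trade-off: too low returns wrong answers, too high rarely hits.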
Model Routing (30-50% savings)
Use cheaper models for simple tasks. At the prices listed above, GPT-3.5 Turbo is 20x cheaper than GPT-4 Turbo and is sufficient for classification, extraction, and formatting.
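A routing layer can be as simple as a lookup on task type. The sketch below assumes each request arrives already labeled with a task type, which is itself a design decision; the labels, the model keys, and the function names are hypothetical. The input prices come from the OpenAI list earlier in this guide.

```python
# Task types the text above calls out as suitable for a cheaper model.
SIMPLE_TASKS = {"classification", "extraction", "formatting"}

# Illustrative labels, not guaranteed to match real API model IDs.
CHEAP_MODEL = "gpt-3.5-turbo"
STRONG_MODEL = "gpt-4-turbo"

# USD per 1M input tokens, copied from the pricing list earlier in this guide.
INPUT_PRICE_PER_MILLION = {CHEAP_MODEL: 0.50, STRONG_MODEL: 10.00}


def route(task_type: str) -> str:
    """Pick the cheap model for simple tasks and the strong model otherwise."""
    return CHEAP_MODEL if task_type in SIMPLE_TASKS else STRONG_MODEL


def blended_input_cost(task_counts: dict, tokens_per_request: int) -> float:
    """Return the USD input-token cost of a workload after routing.

    task_counts maps a task type to its number of requests.
    """
    total = 0.0
    for task_type, count in task_counts.items():
        price = INPUT_PRICE_PER_MILLION[route(task_type)]
        total += count * tokens_per_request * price / 1_000_000
    return total


if __name__ == "__main__":
    # Assumed mix: 600 classification and 400 open-ended requests, 1,000 tokens each.
    workload = {"classification": 600, "open_ended": 400}
    routed = blended_input_cost(workload, 1_000)
    all_strong = 1_000 * 1_000 * INPUT_PRICE_PER_MILLION[STRONG_MODEL] / 1_000_000
    print(f"routed: ${routed:.2f}, all on strong model: ${all_strong:.2f}")
```

The savings depend entirely on what share of traffic is actually simple, so measure the task mix before assuming a figure.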
Prompt Optimization (10-30% savings)
Shorter prompts = fewer tokens = lower cost. Remove redundant instructions. Use concise system prompts. Compress context.
Rate Limiting (prevents runaway costs)
Set per-user and global limits. Implement usage tiers. Alert on anomalies before they become expensive.
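The three pieces above (per-user limits by tier, a global limit, and an early alert) can be sketched as a single in-memory guard. This is an illustrative assumption, not a production design: the `UsageLimiter` class, the tier names, and the 80% alert ratio are hypothetical. A real system would also reset counters per billing period and keep them in shared storage.

```python
from collections import defaultdict


class UsageLimiter:
    """Hypothetical token-budget guard with per-tier user limits and a global cap."""

    def __init__(self, tier_limits: dict, global_limit: int, alert_ratio: float = 0.8):
        self.tier_limits = tier_limits    # tier name -> max tokens per user
        self.global_limit = global_limit  # max tokens across all users
        self.alert_ratio = alert_ratio    # fraction of the global cap that triggers an alert
        self.used_by_user = defaultdict(int)
        self.global_used = 0

    def allow(self, user_id: str, tier: str, tokens: int) -> bool:
        """Record the usage and return True only if both limits still hold."""
        if self.used_by_user[user_id] + tokens > self.tier_limits[tier]:
            return False  # this user's tier budget is exhausted
        if self.global_used + tokens > self.global_limit:
            return False  # the global budget is exhausted
        self.used_by_user[user_id] += tokens
        self.global_used += tokens
        return True

    def should_alert(self) -> bool:
        """True once global usage reaches the alert ratio of the global cap."""
        return self.global_used >= self.alert_ratio * self.global_limit


if __name__ == "__main__":
    limiter = UsageLimiter({"free": 1_000, "pro": 10_000}, global_limit=5_000)
    print(limiter.allow("u1", "free", 800))   # within the free-tier budget
    print(limiter.allow("u1", "free", 300))   # would exceed the free-tier budget
    print(limiter.should_alert())
```

Checking `should_alert` after each accepted request gives a warning before the hard cap starts rejecting traffic.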
5. Budgeting for AI Features
How to budget AI costs relative to the rest of your infrastructure spend and to revenue:
- Pre-product-market-fit: Don't optimize. Spend on the best models to learn whether AI adds value.
- Post-PMF: AI costs should be 10-30% of total infrastructure. Optimize aggressively if higher.
- Pricing consideration: If AI is core to value, build cost into pricing. Many AI-first products charge $20-50/user/month.
- Per-user math: Calculate AI cost per monthly active user (MAU). It should stay below 10% of revenue per user.
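The per-user check in the last bullet is simple arithmetic. The sketch below is one way to write it, with hypothetical function names and made-up example figures. Only the 10% threshold comes from the text above.

```python
def cost_per_mau(monthly_ai_cost: float, monthly_active_users: int) -> float:
    """Return the AI cost in USD per monthly active user."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return monthly_ai_cost / monthly_active_users


def ai_cost_share(monthly_ai_cost: float, monthly_active_users: int,
                  revenue_per_user: float) -> float:
    """Return AI cost per user as a fraction of revenue per user."""
    if revenue_per_user <= 0:
        raise ValueError("revenue_per_user must be positive")
    return cost_per_mau(monthly_ai_cost, monthly_active_users) / revenue_per_user


def within_budget(monthly_ai_cost: float, monthly_active_users: int,
                  revenue_per_user: float, max_share: float = 0.10) -> bool:
    """True if AI cost per user stays below max_share of revenue per user."""
    share = ai_cost_share(monthly_ai_cost, monthly_active_users, revenue_per_user)
    return share < max_share


if __name__ == "__main__":
    # Made-up example: $2,000/month AI spend, 1,000 MAU, $30/user/month revenue.
    print(f"cost per MAU: ${cost_per_mau(2_000, 1_000):.2f}")
    print(f"share of revenue: {ai_cost_share(2_000, 1_000, 30):.1%}")
    print(f"within 10% budget: {within_budget(2_000, 1_000, 30)}")
```

With these example figures the cost is $2 per user, about 6.7% of a $30 price, so it passes. The same $2,000 spread over 500 users at $20 each would be 20% and fail the check.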