Cost Optimization
Reduce AI costs by up to 90% while maintaining quality.
Quick Wins
1. Use Haiku for Validation
Simple change, massive savings:
```yaml
claude_config:
  builder_model: claude-sonnet-4-5-20250929 # Quality ($3/$15)
  validator_model: claude-3-haiku-20240307 # 90% cheaper ($0.25/$1.25)
  test_runner_model: claude-3-haiku-20240307 # 90% cheaper
```
Why it works: Validators don't need Sonnet's power. They check builds, run tests, verify claims. Haiku handles this fine.
2. Enable Budget Limits
Prevent runaway costs:
```yaml
budget:
  enabled: true
  max_cost_per_ticket: 10.00 # Hard stop at $10
  alert_threshold: 5.00 # Warning at $5
```
Result: No surprises. System stops before burning money.
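The two settings can be read as a simple three-state check. The sketch below is only an illustration of that idea: `BudgetStatus` and `check_budget` are hypothetical names, and treating a cost exactly equal to a threshold as "reached" is an assumption, not documented behavior.

```python
from enum import Enum


class BudgetStatus(Enum):
    OK = "ok"
    ALERT = "alert"      # alert_threshold reached, ticket keeps running
    STOPPED = "stopped"  # max_cost_per_ticket reached, ticket is halted


def check_budget(current_cost: float,
                 max_cost_per_ticket: float = 10.00,
                 alert_threshold: float = 5.00) -> BudgetStatus:
    """Classify a ticket's running cost against the two budget settings."""
    # Assumption: ">=" is used for both comparisons.
    if current_cost >= max_cost_per_ticket:
        return BudgetStatus.STOPPED
    if current_cost >= alert_threshold:
        return BudgetStatus.ALERT
    return BudgetStatus.OK


for cost in (2.50, 5.50, 10.50):
    print(f"${cost:.2f} -> {check_budget(cost).value}")
```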
3. Prompt Caching Automatically Saves 90%
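You don't need to configure anything. Repeated context is served from Anthropic's prompt cache, and cached input tokens are billed at about 10% of the normal input rate.
Example: Ticket with 10K token context on Sonnet 4.5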
- First call: $0.03 (10K tokens × $3/1M)
- Cached calls: $0.003 (10K tokens × $0.30/1M)
- Savings: 90% on all subsequent calls
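To see what this means for a whole ticket rather than a single call, the arithmetic can be extended over several calls. This is a rough sketch using the Sonnet 4.5 rates from this page; `context_cost` is a hypothetical helper, the 10-call count is an assumed example, and any extra charge for writing the cache entry is ignored because the example above does not include one.

```python
INPUT_PER_MTOK = 3.00   # Sonnet 4.5 input, USD per 1M tokens
CACHED_PER_MTOK = 0.30  # Sonnet 4.5 cached input, USD per 1M tokens


def context_cost(context_tokens: int, calls: int, cached: bool) -> float:
    """Input cost of sending the same context on every call of a ticket."""
    # The first call always pays the full input rate.
    first = context_tokens * INPUT_PER_MTOK / 1_000_000
    # Later calls pay the cached rate only when caching applies.
    rate = CACHED_PER_MTOK if cached else INPUT_PER_MTOK
    repeat = context_tokens * rate / 1_000_000
    return first + (calls - 1) * repeat


uncached = context_cost(10_000, calls=10, cached=False)
cached = context_cost(10_000, calls=10, cached=True)
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}, "
      f"saved {1 - cached / uncached:.0%}")
```

Because the first call is billed at the full rate, the saving across the ticket is a little below 90% and approaches it as the number of calls grows.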
Model Pricing
Sonnet 4.5 (claude-sonnet-4-5-20250929)
- Input: $3/1M tokens
- Output: $15/1M tokens
- Cached: $0.30/1M tokens (90% off)
Use for: Builder agents (code generation needs quality)
Haiku (claude-3-haiku-20240307)
- Input: $0.25/1M tokens
- Output: $1.25/1M tokens
- Cached: $0.03/1M tokens (88% off)
Use for: Validators, test runners (quality sufficient, 90% cheaper)
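As a rough sketch of how these rates turn into a per-call cost, the helper below multiplies token counts by the prices listed above. `Price`, `PRICES`, and `call_cost` are illustrative names, not part of the tool, and the 20K input / 1K output token counts are made-up inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens."""
    input: float
    output: float
    cached: float


# Rates copied from the pricing list above.
PRICES = {
    "claude-sonnet-4-5-20250929": Price(input=3.00, output=15.00, cached=0.30),
    "claude-3-haiku-20240307": Price(input=0.25, output=1.25, cached=0.03),
}


def call_cost(model: str, input_tokens: int, output_tokens: int,
              cached_tokens: int = 0) -> float:
    """Cost of one call; input_tokens counts only uncached input."""
    p = PRICES[model]
    return (input_tokens * p.input
            + output_tokens * p.output
            + cached_tokens * p.cached) / 1_000_000


sonnet = call_cost("claude-sonnet-4-5-20250929", 20_000, 1_000)
haiku = call_cost("claude-3-haiku-20240307", 20_000, 1_000)
print(f"Sonnet ${sonnet:.5f}, Haiku ${haiku:.5f}, ratio {haiku / sonnet:.1%}")
```

For this example call the Haiku cost comes out at roughly 8% of the Sonnet cost, which is where the "90% cheaper" figure for validators comes from.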
Typical Ticket Costs
With Haiku for validation:
- Simple ticket: $0.50 - $2.00
- Complex ticket: $2.00 - $5.00
- Multi-retry edge case: $5.00 - $10.00
Without optimization (Sonnet everywhere):
- Simple ticket: $5.00 - $20.00
- Roughly 10x more expensive than the optimized setup
Configuration Examples
Cost-Optimized (Recommended)
```yaml
workflow:
  claude_config:
    builder_model: claude-sonnet-4-5-20250929
    validator_model: claude-3-haiku-20240307
    test_runner_model: claude-3-haiku-20240307
  budget:
    enabled: true
    max_cost_per_ticket: 10.00
    alert_threshold: 5.00
```
Quality-Focused (Development)
```yaml
workflow:
  claude_config:
    builder_model: claude-sonnet-4-5-20250929
    validator_model: claude-sonnet-4-5-20250929
    test_runner_model: claude-sonnet-4-5-20250929
  budget:
    enabled: true
    max_cost_per_ticket: 50.00 # Higher limit
```
Maximum Cost Savings (Testing)
```yaml
workflow:
  claude_config:
    builder_model: claude-3-haiku-20240307
    validator_model: claude-3-haiku-20240307
    test_runner_model: claude-3-haiku-20240307
  budget:
    enabled: true
    max_cost_per_ticket: 5.00
```
Monitoring Costs
Dashboard
View real-time costs at: http://localhost:3000
Shows:
- Total cost per project
- Cost per ticket
- Budget alerts
- Prompt caching savings
API
```bash
# Get metrics for project (URL quoted so the shell does not interpret "?")
curl "http://localhost:3001/api/metrics?projectId=your-project"
```
Response includes:
```json
{
  "costPerTicket": {
    "mean": 2.50,
    "median": 1.80,
    "total": 125.00
  },
  "budgetAlerts": 2,
  "cachingSavings": "$45.00"
}
```
Budget Events
When tickets exceed thresholds:
```json
{
  "event": "budget_alert_threshold_reached",
  "ticket_id": "PROJ-123",
  "current_cost": 5.50,
  "threshold": 5.00
}

{
  "event": "budget_limit_exceeded",
  "ticket_id": "PROJ-124",
  "current_cost": 10.50,
  "limit": 10.00,
  "action": "stopped"
}
```
What Gets Tracked
Per ticket:
- Builder token usage (input/output/cached)
- Validator token usage
- Test runner token usage
- Total cost in USD
- Prompt caching savings
Aggregated:
- Total project costs
- Average cost per ticket
- Budget alert count
- Caching effectiveness
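The aggregated figures can also be pulled from the metrics API shown in the Monitoring Costs section and summarized in a script. The sketch below is a minimal example under several assumptions: `fetch_metrics` and `summarize_metrics` are hypothetical helpers, the response has exactly the fields shown in the sample response, `cachingSavings` is a dollar-prefixed string, and `total` is the sum of all ticket costs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Sample payload copied from the API section of this page.
SAMPLE = """
{
  "costPerTicket": {"mean": 2.50, "median": 1.80, "total": 125.00},
  "budgetAlerts": 2,
  "cachingSavings": "$45.00"
}
"""


def fetch_metrics(project_id: str,
                  base_url: str = "http://localhost:3001") -> dict:
    """GET /api/metrics for one project and decode the JSON body."""
    url = f"{base_url}/api/metrics?{urlencode({'projectId': project_id})}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_metrics(payload: dict) -> dict:
    """Flatten the response into plain numbers."""
    cost = payload["costPerTicket"]
    return {
        "mean": cost["mean"],
        "median": cost["median"],
        "total": cost["total"],
        # Assumption: total / mean approximates the number of tickets.
        "estimated_tickets": round(cost["total"] / cost["mean"]),
        "budget_alerts": payload["budgetAlerts"],
        "caching_savings": float(payload["cachingSavings"].lstrip("$")),
    }


summary = summarize_metrics(json.loads(SAMPLE))
print(summary)
```

Against a running instance, `summarize_metrics(fetch_metrics("your-project"))` would replace the sample payload.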
Best Practices
1. Start Conservative
Begin with a $10 limit and increase it if needed:
```yaml
max_cost_per_ticket: 10.00
```
Most tickets cost $1-3, so $10 leaves room for edge cases.
2. Use Haiku for Validation
Validators don't need Sonnet's power:
- Checking build: Haiku fine
- Running tests: Haiku fine
- Verifying files exist: Haiku fine
Save 90%, same results.
3. Monitor Caching Effectiveness
Check the dashboard for caching savings. You should see 80-90% savings on input tokens.
If not:
- Check context size
- Verify models support caching
- Ensure repeated context across calls
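One way to sanity-check the 80-90% figure is to compute it from the tracked token counts. The sketch below assumes you can read uncached and cached input token counts for a ticket; `input_savings` and `caching_looks_healthy` are hypothetical helpers, and the 0.90 discount is the Sonnet 4.5 cached rate from the pricing section.

```python
def input_savings(uncached_tokens: int, cached_tokens: int,
                  cached_discount: float = 0.90) -> float:
    """Fraction of input spend saved versus paying full price for all tokens."""
    total = uncached_tokens + cached_tokens
    if total == 0:
        return 0.0
    cached_share = cached_tokens / total
    # Only the cached share of tokens receives the discount.
    return cached_share * cached_discount


def caching_looks_healthy(savings: float, minimum: float = 0.80) -> bool:
    """True when savings reach the lower end of the 80-90% range."""
    return savings >= minimum


good = input_savings(uncached_tokens=10_000, cached_tokens=90_000)
poor = input_savings(uncached_tokens=60_000, cached_tokens=40_000)
print(f"{good:.0%} healthy={caching_looks_healthy(good)}")
print(f"{poor:.0%} healthy={caching_looks_healthy(poor)}")
```

Under these assumptions, reaching 80% savings requires about nine out of ten input tokens to come from the cache, which is why changing context between calls shows up so quickly in the numbers.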
4. Alert on Outliers
Set alert at 50% of max:
```yaml
max_cost_per_ticket: 10.00
alert_threshold: 5.00 # 50%
```
Catches unusual tickets before hitting limit.
Troubleshooting
"Budget limit exceeded" Errors
Cause: Ticket hit max_cost_per_ticket
Solutions:
- Review ticket complexity - is it unusually large?
- Check retry count - validator failures multiply costs
- Increase limit if appropriate
- Investigate why ticket needed so many retries
High Costs Despite Haiku
Possible causes:
- Large context windows (10K+ tokens)
- Multiple retry loops
- Complex validation requiring many calls
Solutions:
- Review validator checklist - too strict?
- Check self-healing retry limit (default: 2)
- Reduce context size if possible
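Retries matter because each one repeats the build-and-validate cycle. A back-of-the-envelope sketch, assuming every attempt costs about the same (the function names and the $1.50 per-attempt figure are hypothetical):

```python
def worst_case_cost(cost_per_attempt: float, retry_limit: int = 2) -> float:
    """Cost if the first attempt and every allowed retry all run."""
    return cost_per_attempt * (1 + retry_limit)


def max_attempt_cost(max_cost_per_ticket: float, retry_limit: int = 2) -> float:
    """Largest per-attempt cost that still fits the budget in the worst case."""
    return max_cost_per_ticket / (1 + retry_limit)


print(f"worst case: ${worst_case_cost(1.50):.2f}")
print(f"per-attempt ceiling under a $10 limit: ${max_attempt_cost(10.00):.2f}")
```

With the default retry limit of 2, a ticket can cost up to three times its single-attempt price, so a $10 limit leaves roughly $3.33 per attempt.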
Caching Not Working
Check:
- Using supported models? (Sonnet/Haiku both support caching)
- Context stable across calls? (Changing context defeats caching)
- Anthropic account has caching enabled?
Verify: Dashboard should show "Cached Input Tokens" > 0
Cost Optimization Checklist
- [ ] Haiku for validators and test runners
- [ ] Budget limit enabled ($10 recommended)
- [ ] Alert threshold at 50% of limit
- [ ] Dashboard monitoring configured
- [ ] Team trained on budget alerts
Result: up to 90% cost reduction with the same quality.