
Cost Optimization

Reduce AI costs by up to 90% while maintaining quality.

Quick Wins

1. Use Haiku for Validation

Simple change, massive savings:

yaml
claude_config:
  builder_model: claude-sonnet-4-5-20250929    # Quality ($3/$15 per 1M tokens, input/output)
  validator_model: claude-3-haiku-20240307     # ~90% cheaper ($0.25/$1.25 per 1M tokens)
  test_runner_model: claude-3-haiku-20240307   # ~90% cheaper

Why it works: Validators don't need Sonnet's power. They check builds, run tests, verify claims. Haiku handles this fine.
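
To make the saving concrete, here is a minimal sketch that prices a single validator call at the rates quoted in the config comments. The token counts are illustrative assumptions, not measurements, and `call_cost` is a hypothetical helper.

```python
# Prices in USD per 1M tokens, as quoted in the config comments above.
PRICES = {
    "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
    "claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one uncached call (hypothetical helper)."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Assumed validator call: 8K tokens in, 500 tokens out.
sonnet = call_cost("claude-sonnet-4-5-20250929", 8_000, 500)
haiku = call_cost("claude-3-haiku-20240307", 8_000, 500)
saving = 1 - haiku / sonnet

print(f"Sonnet ${sonnet:.4f}  Haiku ${haiku:.4f}  saving {saving:.0%}")
```

Both Haiku rates are one twelfth of the Sonnet rates, so the saving comes out near 92% whatever the input/output mix.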

2. Enable Budget Limits

Prevent runaway costs:

yaml
budget:
  enabled: true
  max_cost_per_ticket: 10.00    # Hard stop at $10
  alert_threshold: 5.00          # Warning at $5

Result: No surprises. The system stops a ticket once it hits the limit.
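
The behaviour described above can be sketched as a small per-ticket guard. This is an illustration only: `BudgetGuard` and its return values are hypothetical, and whether the real checks use `>=` or `>` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Hypothetical per-ticket guard mirroring the `budget` settings."""

    max_cost_per_ticket: float = 10.00
    alert_threshold: float = 5.00
    spent: float = 0.0
    alerted: bool = False

    def record(self, cost: float) -> str:
        """Add one call's cost and return 'ok', 'alert', or 'stopped'."""
        self.spent += cost
        if self.spent >= self.max_cost_per_ticket:
            return "stopped"  # hard stop: no further calls for this ticket
        if self.spent >= self.alert_threshold and not self.alerted:
            self.alerted = True  # warn only once per ticket
            return "alert"
        return "ok"


guard = BudgetGuard()
statuses = [guard.record(cost) for cost in (3.00, 2.50, 1.00, 4.00)]
print(statuses)
```

Because the cost is only known after a call returns, a guard like this can overshoot the limit by up to one call's cost.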

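3. Prompt Caching Saves Up to 90% on Repeated Input

No configuration needed. Repeated context is served from Anthropic's prompt cache and billed at the cached-input rate.

Example: Ticket with 10K token context on Sonnet

  • First call: $0.03 at the base rate (10K tokens × $3/1M); Anthropic adds a small premium for writing to the cache
  • Cached calls: $0.003 (10K tokens × $0.30/1M)
  • Savings: 90% on the cached input of each subsequent call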
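
The arithmetic above extends to a whole ticket. A minimal sketch, assuming the same 10K-token context is sent on every call and ignoring any cache-write premium and cache expiry; the call count of 10 is an illustrative assumption.

```python
CONTEXT_TOKENS = 10_000
BASE_PRICE = 3.00    # Sonnet input, USD per 1M tokens
CACHED_PRICE = 0.30  # Sonnet cached input, USD per 1M tokens


def input_cost(tokens: int, price_per_million: float) -> float:
    """USD cost of `tokens` input tokens at the given rate."""
    return tokens * price_per_million / 1_000_000


def ticket_input_cost(calls: int, cached: bool) -> float:
    """Input cost of sending the same context `calls` times."""
    first = input_cost(CONTEXT_TOKENS, BASE_PRICE)  # first call is never a cache hit
    rate = CACHED_PRICE if cached else BASE_PRICE
    return first + (calls - 1) * input_cost(CONTEXT_TOKENS, rate)


without_cache = ticket_input_cost(10, cached=False)  # 10 × $0.03
with_cache = ticket_input_cost(10, cached=True)      # $0.03 + 9 × $0.003
overall_saving = 1 - with_cache / without_cache

print(f"${without_cache:.3f} -> ${with_cache:.3f} ({overall_saving:.0%} saved)")
```

The first call is always billed in full, so the ticket-level saving sits below 90% and approaches it as the number of calls grows.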

Model Pricing

Sonnet 4.5 (claude-sonnet-4-5-20250929)

  • Input: $3/1M tokens
  • Output: $15/1M tokens
  • Cached: $0.30/1M tokens (90% off)

Use for: Builder agents (code generation needs quality)

Haiku (claude-3-haiku-20240307)

  • Input: $0.25/1M tokens
  • Output: $1.25/1M tokens
  • Cached: $0.03/1M tokens (88% off)

Use for: Validators, test runners (quality sufficient, 90% cheaper)


Typical Ticket Costs

With Haiku for validation:

  • Simple ticket: $0.50 - $2.00
  • Complex ticket: $2.00 - $5.00
  • Multi-retry edge case: $5.00 - $10.00

Without optimization (Sonnet everywhere):

  • Simple ticket: $5.00 - $20.00 (roughly 10x more expensive)

Configuration Examples

yaml
workflow:
  claude_config:
    builder_model: claude-sonnet-4-5-20250929
    validator_model: claude-3-haiku-20240307
    test_runner_model: claude-3-haiku-20240307
    budget:
      enabled: true
      max_cost_per_ticket: 10.00
      alert_threshold: 5.00

Quality-Focused (Development)

yaml
workflow:
  claude_config:
    builder_model: claude-sonnet-4-5-20250929
    validator_model: claude-sonnet-4-5-20250929
    test_runner_model: claude-sonnet-4-5-20250929
    budget:
      enabled: true
      max_cost_per_ticket: 50.00   # Higher limit

Maximum Cost Savings (Testing)

yaml
workflow:
  claude_config:
    builder_model: claude-3-haiku-20240307
    validator_model: claude-3-haiku-20240307
    test_runner_model: claude-3-haiku-20240307
    budget:
      enabled: true
      max_cost_per_ticket: 5.00

Monitoring Costs

Dashboard

View real-time costs at: http://localhost:3000

Shows:

  • Total cost per project
  • Cost per ticket
  • Budget alerts
  • Prompt caching savings

API

bash
# Get metrics for a project (quote the URL so the shell does not expand "?")
curl "http://localhost:3001/api/metrics?projectId=your-project"

Response includes:

json
{
  "costPerTicket": {
    "mean": 2.50,
    "median": 1.80,
    "total": 125.00
  },
  "budgetAlerts": 2,
  "cachingSavings": "$45.00"
}
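
For scripted checks, the same endpoint can be read from Python. A minimal sketch using only the standard library; it assumes the response has exactly the shape shown above, and `metrics_url`, `summarize`, and `fetch_metrics` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:3001/api/metrics"


def metrics_url(project_id: str) -> str:
    """Build the metrics URL with a safely encoded projectId."""
    return f"{BASE_URL}?{urlencode({'projectId': project_id})}"


def summarize(metrics: dict) -> str:
    """Format the fields shown in the sample response as one line."""
    cost = metrics["costPerTicket"]
    return (
        f"mean ${cost['mean']:.2f}, median ${cost['median']:.2f}, "
        f"total ${cost['total']:.2f}, alerts {metrics['budgetAlerts']}, "
        f"caching saved {metrics['cachingSavings']}"
    )


def fetch_metrics(project_id: str) -> dict:
    """Fetch and decode the metrics JSON (needs the API to be running)."""
    with urlopen(metrics_url(project_id), timeout=10) as resp:
        return json.load(resp)


# Offline demonstration using the sample payload from the docs.
sample = {
    "costPerTicket": {"mean": 2.50, "median": 1.80, "total": 125.00},
    "budgetAlerts": 2,
    "cachingSavings": "$45.00",
}
print(summarize(sample))
```

With the API running, `summarize(fetch_metrics("your-project"))` would produce the same one-line summary from live data.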

Budget Events

When tickets exceed thresholds:

json
{
  "event": "budget_alert_threshold_reached",
  "ticket_id": "PROJ-123",
  "current_cost": 5.50,
  "threshold": 5.00
}

{
  "event": "budget_limit_exceeded",
  "ticket_id": "PROJ-124",
  "current_cost": 10.50,
  "limit": 10.00,
  "action": "stopped"
}
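
A consumer of these events might look like the sketch below. How events are delivered (webhook, log stream, queue) is not specified here, so the code assumes each event arrives as a JSON string; `handle_budget_event` is a hypothetical name.

```python
import json


def handle_budget_event(raw: str) -> str:
    """Turn one budget event (JSON string) into a log line."""
    event = json.loads(raw)
    name = event.get("event", "unknown")
    ticket = event.get("ticket_id", "unknown")
    if name == "budget_alert_threshold_reached":
        return (
            f"WARN {ticket}: ${event['current_cost']:.2f} "
            f"passed alert threshold ${event['threshold']:.2f}"
        )
    if name == "budget_limit_exceeded":
        return (
            f"STOP {ticket}: ${event['current_cost']:.2f} "
            f"exceeded limit ${event['limit']:.2f} ({event['action']})"
        )
    # Unknown event types are reported rather than raising.
    return f"IGNORED {ticket}: unhandled event {name}"


alert = handle_budget_event(
    '{"event": "budget_alert_threshold_reached", "ticket_id": "PROJ-123",'
    ' "current_cost": 5.50, "threshold": 5.00}'
)
stop = handle_budget_event(
    '{"event": "budget_limit_exceeded", "ticket_id": "PROJ-124",'
    ' "current_cost": 10.50, "limit": 10.00, "action": "stopped"}'
)
print(alert)
print(stop)
```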

What Gets Tracked

Per ticket:

  • Builder token usage (input/output/cached)
  • Validator token usage
  • Test runner token usage
  • Total cost in USD
  • Prompt caching savings

Aggregated:

  • Total project costs
  • Average cost per ticket
  • Budget alert count
  • Caching effectiveness
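
How per-ticket records might roll up into the aggregated figures can be sketched as follows. The `TicketUsage` record and its field names are hypothetical; only the output keys follow the metrics response shown earlier.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class TicketUsage:
    """Hypothetical per-ticket cost record, in USD."""

    ticket_id: str
    builder_cost: float
    validator_cost: float
    test_runner_cost: float
    alert_raised: bool = False

    @property
    def total(self) -> float:
        return self.builder_cost + self.validator_cost + self.test_runner_cost


def aggregate(tickets: list[TicketUsage]) -> dict:
    """Roll per-ticket totals up into project-level figures."""
    totals = [t.total for t in tickets]
    return {
        "total": round(sum(totals), 2),
        "mean": round(mean(totals), 2),
        "median": round(median(totals), 2),
        "budgetAlerts": sum(t.alert_raised for t in tickets),
    }


tickets = [
    TicketUsage("PROJ-1", 0.50, 0.25, 0.25),
    TicketUsage("PROJ-2", 1.50, 0.25, 0.25),
    TicketUsage("PROJ-3", 5.00, 0.50, 0.50, alert_raised=True),
]
summary = aggregate(tickets)
print(summary)
```

Reporting the median alongside the mean keeps one expensive outlier (here PROJ-3) from hiding what a typical ticket costs.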

Best Practices

1. Start Conservative

Begin with a $10 limit. Increase it if needed:

yaml
max_cost_per_ticket: 10.00

Most tickets cost $1-3. A $10 limit handles edge cases.

2. Use Haiku for Validation

Validators don't need Sonnet's power:

  • Checking build: Haiku fine
  • Running tests: Haiku fine
  • Verifying files exist: Haiku fine

Save 90%, same results.

3. Monitor Caching Effectiveness

Check the dashboard for caching savings. Expect 80-90% savings on input tokens.

If not:

  • Check context size (Anthropic only caches prompts above a minimum length, which varies by model)
  • Verify models support caching
  • Ensure repeated context across calls

4. Alert on Outliers

Set the alert at 50% of the limit:

yaml
max_cost_per_ticket: 10.00
alert_threshold: 5.00    # 50%

This catches unusual tickets before they hit the limit.

Troubleshooting

"Budget limit exceeded" Errors

Cause: Ticket hit max_cost_per_ticket

Solutions:

  1. Review ticket complexity - is it unusually large?
  2. Check retry count - validator failures multiply costs
  3. Increase limit if appropriate
  4. Investigate why ticket needed so many retries

High Costs Despite Haiku

Possible causes:

  • Large contexts (10K+ tokens)
  • Multiple retry loops
  • Complex validation requiring many calls

Solutions:

  • Review validator checklist - too strict?
  • Check self-healing retry limit (default: 2)
  • Reduce context size if possible
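
To see how retries multiply cost, here is a rough worst-case estimate. It assumes every attempt costs about the same and that a retry limit of 2 means one initial attempt plus up to two retries; both helpers are hypothetical.

```python
def worst_case_cost(cost_per_attempt: float, retry_limit: int = 2) -> float:
    """Upper bound if every retry is used: initial attempt plus retries."""
    return cost_per_attempt * (1 + retry_limit)


def max_attempt_cost(max_cost_per_ticket: float, retry_limit: int = 2) -> float:
    """Largest per-attempt cost that still fits the budget with all retries."""
    return max_cost_per_ticket / (1 + retry_limit)


# A $2.00 attempt can triple to $6.00 when both retries are used.
print(f"worst case: ${worst_case_cost(2.00):.2f}")
# With a $10 limit, each attempt has roughly $3.33 of headroom.
print(f"per-attempt headroom: ${max_attempt_cost(10.00):.2f}")
```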

Caching Not Working

Check:

  1. Using supported models? (Sonnet/Haiku both support caching)
  2. Context stable across calls? (Changing context defeats caching)
  3. Anthropic account has caching enabled?

Verify: Dashboard should show "Cached Input Tokens" > 0
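
The same check can be made on raw API responses. Anthropic's Messages API reports `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` in each response's `usage` object. The sketch below computes a hit ratio from such an object; how the dashboard derives "Cached Input Tokens" is an assumption, and the usage numbers are illustrative.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Share of a call's input tokens that were read from the cache."""
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    fresh = usage.get("input_tokens") or 0  # uncached input tokens
    total = read + written + fresh
    return read / total if total else 0.0


# Illustrative usage objects for two calls sharing a ~10K-token context.
first_call = {
    "input_tokens": 50,
    "cache_creation_input_tokens": 9_950,
    "cache_read_input_tokens": 0,
}
second_call = {
    "input_tokens": 50,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 9_950,
}

print(f"first call:  {cache_hit_ratio(first_call):.1%}")
print(f"second call: {cache_hit_ratio(second_call):.1%}")
```

A ratio that stays at zero on later calls suggests the context is changing between calls or is too short to be cached.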

Cost Optimization Checklist

  • [ ] Haiku for validators and test runners
  • [ ] Budget limit enabled ($10 recommended)
  • [ ] Alert threshold at 50% of limit
  • [ ] Dashboard monitoring configured
  • [ ] Team trained on budget alerts

Result: Up to 90% cost reduction with the same quality.


Part of the Zeron Platform | Built with VitePress