Cost Optimization

Reduce AI costs by up to 90% while maintaining quality.

Quick Wins

1. Use Haiku for Validation

Simple change, massive savings:

yaml

claude_config:
  builder_model: claude-sonnet-4-5-20250929    # Quality ($3/$15 per 1M input/output tokens)
  validator_model: claude-3-haiku-20240307     # Over 90% cheaper ($0.25/$1.25)
  test_runner_model: claude-3-haiku-20240307   # Over 90% cheaper

Why it works: validators check builds, run tests, and verify claims. These tasks don't need Sonnet's capability, and Haiku handles them well.

2. Enable Budget Limits

Prevent runaway costs:

yaml

budget:
  enabled: true
  max_cost_per_ticket: 10.00    # Hard stop at $10
  alert_threshold: 5.00          # Warning at $5

Result: no surprises. The system stops work on a ticket once its cost reaches the limit.
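The two thresholds can be pictured as a small guard that accumulates cost per ticket. This is a minimal sketch, not the system's actual implementation: the class name `BudgetGuard`, the `record` method, and the choice to compare with `>=` after each call are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Tracks spend for one ticket against the two configured thresholds."""
    max_cost_per_ticket: float = 10.00   # hard stop (from the config above)
    alert_threshold: float = 5.00        # warning level (from the config above)
    spent: float = 0.0
    alerted: bool = False

    def record(self, cost: float) -> str:
        """Add the cost of one model call and return 'ok', 'alert' or 'stop'."""
        self.spent += cost
        if self.spent >= self.max_cost_per_ticket:
            return "stop"
        if self.spent >= self.alert_threshold and not self.alerted:
            # Warn only once per ticket (an assumption of this sketch).
            self.alerted = True
            return "alert"
        return "ok"
```

In this sketch a ticket that spends $4.00, then $1.50, then $1.00, then $4.00 yields `ok`, `alert`, `ok`, `stop`.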

3. Prompt Caching Saves Up to 90% on Input Tokens

You don't need to configure anything. Repeated context is served from Anthropic's prompt cache, and cached input tokens are billed at about 10% of the normal input rate.

Example: Ticket with 10K token context

First call: $0.03 (10K tokens × $3/1M)
Cached calls: $0.003 each (10K tokens × $0.30/1M)
Savings: 90% on input tokens for every subsequent call
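The arithmetic above can be checked with a short script. This is a minimal sketch: the prices are the Sonnet 4.5 figures quoted in this section, the function name `input_cost` is hypothetical, and the model ignores output tokens and any cache-write surcharge.

```python
# USD per 1M input tokens, taken from this section (Sonnet 4.5).
INPUT_PRICE = 3.00
CACHED_PRICE = 0.30


def input_cost(tokens: int, calls: int, cached: bool) -> float:
    """Input-token cost of `calls` requests that share the same context.

    With caching, the first call pays the full rate and every later
    call pays the cached rate. Output tokens are not included.
    """
    if calls <= 0:
        return 0.0
    first = tokens * INPUT_PRICE / 1_000_000
    rate = CACHED_PRICE if cached else INPUT_PRICE
    rest = (calls - 1) * tokens * rate / 1_000_000
    return first + rest
```

For 10 calls over a 10K-token context this gives about $0.30 without caching and $0.057 with it. That is an overall saving of 81%, because the first call is still billed at the full rate.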

Model Pricing

Sonnet 4.5 (claude-sonnet-4-5-20250929)

Input: $3/1M tokens
Output: $15/1M tokens
Cached input: $0.30/1M tokens (90% off)

Use for: Builder agents (code generation needs quality)

Haiku (claude-3-haiku-20240307)

Input: $0.25/1M tokens
Output: $1.25/1M tokens
Cached input: $0.03/1M tokens (88% off)

Use for: Validators, test runners (sufficient quality at over 90% lower cost)
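To compare the two models on the same workload, the price list can be turned into a small calculator. This is a sketch under stated assumptions: the `PRICES` table copies the figures above, `call_cost` is a hypothetical helper, and `input_tokens` is assumed to exclude tokens that were read from the cache.

```python
# USD per 1M tokens, as listed in this section.
PRICES = {
    "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00, "cached": 0.30},
    "claude-3-haiku-20240307": {"input": 0.25, "output": 1.25, "cached": 0.03},
}


def call_cost(model: str, input_tokens: int, output_tokens: int,
              cached_tokens: int = 0) -> float:
    """Cost in USD of one call; cached_tokens are billed at the cached rate."""
    p = PRICES[model]
    total = (input_tokens * p["input"]
             + output_tokens * p["output"]
             + cached_tokens * p["cached"])
    return total / 1_000_000
```

For a call with 20,000 input tokens and 2,000 output tokens, this gives $0.09 on Sonnet 4.5 and $0.0075 on Haiku, a 12x difference.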


Typical Ticket Costs

With Haiku for validation:

Simple ticket: $0.50 - $2.00
Complex ticket: $2.00 - $5.00
Multi-retry edge case: $5.00 - $10.00

Without optimization (Sonnet everywhere):

Simple ticket: $5.00 - $20.00 (roughly 10x more expensive)

Configuration Examples

Cost-Optimized (Recommended)

yaml

workflow:
  claude_config:
    builder_model: claude-sonnet-4-5-20250929
    validator_model: claude-3-haiku-20240307
    test_runner_model: claude-3-haiku-20240307
    budget:
      enabled: true
      max_cost_per_ticket: 10.00
      alert_threshold: 5.00

Quality-Focused (Development)

yaml

workflow:
  claude_config:
    builder_model: claude-sonnet-4-5-20250929
    validator_model: claude-sonnet-4-5-20250929
    test_runner_model: claude-sonnet-4-5-20250929
    budget:
      enabled: true
      max_cost_per_ticket: 50.00   # Higher limit

Maximum Cost Savings (Testing)

yaml

workflow:
  claude_config:
    builder_model: claude-3-haiku-20240307
    validator_model: claude-3-haiku-20240307
    test_runner_model: claude-3-haiku-20240307
    budget:
      enabled: true
      max_cost_per_ticket: 5.00

Monitoring Costs

Dashboard

View real-time costs at: http://localhost:3000

Shows:

Total cost per project
Cost per ticket
Budget alerts
Prompt caching savings

API

bash

# Get metrics for project
curl "http://localhost:3001/api/metrics?projectId=your-project"

# Response includes:
{
  "costPerTicket": {
    "mean": 2.50,
    "median": 1.80,
    "total": 125.00
  },
  "budgetAlerts": 2,
  "cachingSavings": "$45.00"
}
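The same endpoint can be polled from a script. The sketch below assumes the response has exactly the shape shown above; `fetch_metrics` and `summarize` are hypothetical helper names, and only the Python standard library is used.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:3001/api/metrics"  # endpoint from this section


def fetch_metrics(project_id: str) -> dict:
    """GET the metrics JSON for one project."""
    query = urllib.parse.urlencode({"projectId": project_id})
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=10) as resp:
        return json.load(resp)


def summarize(metrics: dict) -> str:
    """One-line summary; assumes the response fields shown above."""
    cost = metrics["costPerTicket"]
    return (f"mean ${cost['mean']:.2f}, median ${cost['median']:.2f}, "
            f"total ${cost['total']:.2f}, alerts {metrics['budgetAlerts']}, "
            f"caching saved {metrics['cachingSavings']}")
```

With the server running, `print(summarize(fetch_metrics("your-project")))` prints a single line suitable for a cron job or CI log.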

Budget Events

When tickets exceed thresholds:

json

{
  "event": "budget_alert_threshold_reached",
  "ticket_id": "PROJ-123",
  "current_cost": 5.50,
  "threshold": 5.00
}

{
  "event": "budget_limit_exceeded",
  "ticket_id": "PROJ-124",
  "current_cost": 10.50,
  "limit": 10.00,
  "action": "stopped"
}
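How these events are delivered (log file, webhook, message queue) is not specified here, so the sketch below only covers the handling side: it takes one event as a JSON string and turns it into a log line. The function name `describe_event` and the log format are assumptions.

```python
import json


def describe_event(raw: str) -> str:
    """Turn one budget event (a JSON object) into a human-readable log line."""
    event = json.loads(raw)
    kind = event.get("event", "unknown")
    ticket = event.get("ticket_id", "?")
    cost = event.get("current_cost", 0.0)
    if kind == "budget_alert_threshold_reached":
        return (f"WARN {ticket}: ${cost:.2f} passed alert threshold "
                f"${event['threshold']:.2f}")
    if kind == "budget_limit_exceeded":
        return (f"STOP {ticket}: ${cost:.2f} exceeded limit "
                f"${event['limit']:.2f} ({event['action']})")
    # Any event type not documented above falls through here.
    return f"INFO {ticket}: unhandled event {kind}"
```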

What Gets Tracked

Per ticket:

Builder token usage (input/output/cached)
Validator token usage
Test runner token usage
Total cost in USD
Prompt caching savings

Aggregated:

Total project costs
Average cost per ticket
Budget alert count
Caching effectiveness

Best Practices

1. Start Conservative

Begin with a $10 limit and increase it if needed:

yaml

max_cost_per_ticket: 10.00

Most tickets cost $1-3, so a $10 limit leaves room for edge cases.

2. Use Haiku for Validation

Validators don't need Sonnet's power:

Checking the build: Haiku is fine
Running tests: Haiku is fine
Verifying that files exist: Haiku is fine

Save about 90% on validation with comparable results.

3. Monitor Caching Effectiveness

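Check the dashboard for caching savings. You should see 80-90% savings on input tokens.

If not:

Check context size: prompts below the model's minimum cacheable length are not cached
Verify that the configured models support caching
Ensure the same context is repeated across calls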
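To judge whether a ticket falls in the 80-90% range, the savings figure can be recomputed from raw token counts. This is a sketch only: `caching_savings` is a hypothetical helper, the default prices are the Sonnet 4.5 rates from this page, and `input_tokens` is assumed to count only tokens billed at the full rate.

```python
def caching_savings(input_tokens: int, cached_tokens: int,
                    input_price: float = 3.00,
                    cached_price: float = 0.30) -> float:
    """Fraction of input spend saved by caching, between 0.0 and 1.0.

    input_tokens:  tokens billed at the full input rate
    cached_tokens: tokens served from the cache
    Prices are USD per 1M tokens.
    """
    total_tokens = input_tokens + cached_tokens
    if total_tokens == 0:
        return 0.0
    without_cache = total_tokens * input_price
    with_cache = input_tokens * input_price + cached_tokens * cached_price
    return 1 - with_cache / without_cache
```

A ticket with 10,000 uncached and 90,000 cached input tokens comes out at 0.81 (81% saved). The figure only approaches 90% when almost all input tokens are cache hits.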

4. Alert on Outliers

Set the alert at 50% of the maximum:

yaml

max_cost_per_ticket: 10.00
alert_threshold: 5.00    # 50%

This catches unusual tickets before they hit the limit.

Troubleshooting

"Budget limit exceeded" Errors

Cause: the ticket's accumulated cost reached max_cost_per_ticket.

Solutions:

Review ticket complexity: is it unusually large?
Check the retry count: validator failures multiply costs
Investigate why the ticket needed so many retries
Increase the limit if appropriate
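The effect of retries on cost can be estimated with a one-line model. This is a rough sketch: it assumes each retry repeats one full build and one full validation pass, which may not match how the system actually retries. The function name `ticket_cost` is hypothetical, and the dollar figures in the note below are illustrative, not measurements.

```python
def ticket_cost(build_cost: float, validate_cost: float, retries: int) -> float:
    """Estimated ticket cost when every retry repeats a build and a validation."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    attempts = 1 + retries
    return attempts * (build_cost + validate_cost)
```

With an illustrative $1.50 build and $0.10 validation, two retries bring the ticket to about $4.80 instead of $1.60.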

High Costs Despite Haiku

Possible causes:

Large contexts (10K+ tokens)
Multiple retry loops
Complex validation requiring many calls

Solutions:

Review the validator checklist: is it too strict?
Check the self-healing retry limit (default: 2)
Reduce context size if possible

Caching Not Working

Check:

Using supported models? (Sonnet/Haiku both support caching)
Context stable across calls? (Caching matches on an identical prefix, so a change early in the context defeats caching for everything after it)
Anthropic account has caching enabled?

Verify: the dashboard should show "Cached Input Tokens" greater than 0.