Development Patterns
Key patterns and principles from building Nexus (Conductor + AI Learning Service).
Core Philosophy
1. Deterministic Orchestration
The Conductor is deterministic TypeScript - no AI in the orchestration logic.
Why: Conductor cannot be "reasoned with". Enforcement is mechanical and predictable.
typescript
// ✅ GOOD - Deterministic
if (!checklist.allItemsComplete()) {
  throw new Error('Checklist incomplete');
}
// ❌ BAD - AI-influenced
const response = await ai.ask('Can we skip this?');
2. Evidence-Based Validation
Never trust claims. Always check actual artifacts.
Lesson from the path confusion incident: agents claimed to be working on production code but were actually working in a deleted directory.
typescript
// ✅ GOOD - Verify
const pathExists = await fs.pathExists(workingDir);
const files = await fs.readdir(workingDir);
const hasPackageJson = files.includes('package.json');
// ❌ BAD - Trust
const config = { workingDir: userInput }; // Assume valid
3. Fresh Validation Context
Validators should have ZERO knowledge of builder work.
Adversarial approach prevents confirmation bias.
typescript
// Builder saves manifest to Redis
await redis.set(`ticket:${id}:manifest`, JSON.stringify(manifest));
// Validator gets ONLY the manifest (no shared memory)
const validator = await spawnValidator({ manifestPath: '/tmp/manifest.json' });
Multi-Layered Prevention
When fixing bugs, build 4 layers of protection:
- Configuration Time - Validate at input (catch early)
- Dashboard Time - Visual alerts (impossible to miss)
- Runtime Time - Explicit checks (warn during work)
- Documentation Time - Process guides (prevent repeat)
Example: Path confusion prevention
- Layer 1: Real-time path validation API in config UI
- Layer 2: Red dashboard alert for invalid paths
- Layer 3: Agent context warnings about common mistakes
- Layer 4: Incident report + prevention guide
See Path Confusion Prevention for full implementation.
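As a rough illustration of Layer 1, the sketch below shows a configuration-time check that returns a structured result, which a dashboard alert (Layer 2) could also consume. The names `PathCheck`, `ListDir`, and `checkWorkingDir` are hypothetical and not taken from Nexus. The directory listing is injected so the same check can run against the real filesystem or a fake one.

```typescript
// Hypothetical result shape; not the actual Nexus API.
interface PathCheck {
  valid: boolean;
  problems: string[];
}

// Returns the entries of a directory, or null if the directory does not exist.
type ListDir = (dir: string) => string[] | null;

function checkWorkingDir(dir: string, listDir: ListDir): PathCheck {
  const problems: string[] = [];
  const entries = listDir(dir);
  if (entries === null) {
    problems.push(`Directory does not exist: ${dir}`);
  } else if (!entries.includes("package.json")) {
    problems.push(`No package.json found in: ${dir}`);
  }
  return { valid: problems.length === 0, problems };
}

// In-memory stand-in for the filesystem, using the paths from the incident.
const fakeDirs: Record<string, string[]> = {
  "/apps/zeron-feedback-service": ["package.json", "src"],
};
const listFakeDir: ListDir = (dir) => fakeDirs[dir] ?? null;

const productionCheck = checkWorkingDir("/apps/zeron-feedback-service", listFakeDir);
const deletedCheck = checkWorkingDir("/apps/zeron/feedback-service", listFakeDir);
console.log(productionCheck.valid, deletedCheck.problems);
```

Returning a list of problems rather than a bare boolean lets each layer decide how loudly to report the same finding.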
Incident Response
When things break:
- Investigate - Don't make hasty decisions
- Implement Multi-Layer - Prevention at all 4 layers
- Document - Incident report + prevention guide
- Verify - Test that prevention actually works
Example: October 10, 2025 Path Confusion
State Management
Redis is a Cache
Critical lesson from the data loss incident: budget data stored ONLY in Redis was lost when Redis was flushed.
Rules:
- Redis for operational state (fast, volatile)
- External storage for financial/audit data (durable)
- Backup critical Redis data regularly
- Metrics reconstructable from GitHub PRs
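One way to follow these rules is to write financial data to durable storage first and treat Redis purely as a read cache. The sketch below is a minimal illustration under that assumption. `KeyValueStore`, `MemoryStore`, and `BudgetLedger` are hypothetical names, and the stores are synchronous in-memory stand-ins; real Redis and database clients would be asynchronous.

```typescript
// Hypothetical storage interface; real code would wrap a Redis client
// and a durable database behind it.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  get(key: string): string | undefined { return this.data.get(key); }
  set(key: string, value: string): void { this.data.set(key, value); }
  clear(): void { this.data.clear(); }
}

class BudgetLedger {
  private durable: KeyValueStore;
  private cache: KeyValueStore;

  constructor(durable: KeyValueStore, cache: KeyValueStore) {
    this.durable = durable;
    this.cache = cache;
  }

  // Durable write happens first, so a cache flush never loses the only copy.
  record(ticketId: string, spentUsd: number): void {
    const key = `ticket:${ticketId}:budget`;
    const value = JSON.stringify({ spentUsd });
    this.durable.set(key, value);
    this.cache.set(key, value);
  }

  // Read from the cache; on a miss, fall back to durable storage and repopulate.
  read(ticketId: string): number | undefined {
    const key = `ticket:${ticketId}:budget`;
    let raw = this.cache.get(key);
    if (raw === undefined) {
      raw = this.durable.get(key);
      if (raw !== undefined) this.cache.set(key, raw);
    }
    return raw === undefined ? undefined : JSON.parse(raw).spentUsd;
  }
}

const durableStore = new MemoryStore();
const redisLikeCache = new MemoryStore();
const ledger = new BudgetLedger(durableStore, redisLikeCache);

ledger.record("T-1", 4.2);
redisLikeCache.clear(); // simulate a Redis flush
console.log(ledger.read("T-1")); // 4.2, recovered from durable storage
```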
Redis Key Patterns
project:{projectId}:active_ticket
project:{projectId}:config
project:{projectId}:completed_tickets
project:{projectId}:budget_total
ticket:{ticketId}:state
ticket:{ticketId}:manifest
ticket:{ticketId}:validation_report
ticket:{ticketId}:metrics
ticket:{ticketId}:budget
conductor:health
conductor:last_poll:{projectId}
State Transitions
pending → building → validating → completed
↓
failed
All transitions are deterministic. No AI decision-making in state changes.
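A deterministic transition check can be as small as a lookup table. The sketch below is illustrative only. The diagram does not show which states can move to `failed`, so allowing it from both `building` and `validating` is an assumption, and `transition` is a hypothetical helper name.

```typescript
type TicketState = "pending" | "building" | "validating" | "completed" | "failed";

// Allowed next states for each state.
// ASSUMPTION: "failed" is reachable from "building" and "validating".
const allowedTransitions: Record<TicketState, readonly TicketState[]> = {
  pending: ["building"],
  building: ["validating", "failed"],
  validating: ["completed", "failed"],
  completed: [],
  failed: [],
};

// Returns the new state, or throws if the move is not in the table.
function transition(from: TicketState, to: TicketState): TicketState {
  if (!allowedTransitions[from].includes(to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}

let ticketState: TicketState = "pending";
ticketState = transition(ticketState, "building");
ticketState = transition(ticketState, "validating");
ticketState = transition(ticketState, "completed");
console.log(ticketState); // "completed"
```

Because the table is plain data, every legal move can be reviewed at a glance and nothing outside it can change a ticket's state.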
Work First, Bureaucracy Second
Most important lesson from production incidents: Never let external system failures block work completion.
typescript
// ✅ GOOD - Complete work regardless
try {
  await jiraClient.updateTicketStatus(id, 'Done');
} catch (error) {
  logger.warn('Jira failed, but work complete - continuing');
  // Work completion continues
}
// ❌ BAD - Block on external system
await jiraClient.updateTicketStatus(id, 'Done'); // If this fails, ticket fails
Apply to: Jira, GitHub, Slack, email - NOT to core validation (builder/validator must pass).
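When several external systems are involved, the same try/catch can be factored into a small wrapper. The sketch below is one possible shape, not the Nexus implementation. `bestEffort` and `completeTicket` are hypothetical names, and the Jira and Slack calls are stand-ins that simulate one failure and one success.

```typescript
// Hypothetical helper: runs a non-critical side effect and never throws.
// Returns true on success, false if the external call failed.
async function bestEffort(label: string, action: () => Promise<void>): Promise<boolean> {
  try {
    await action();
    return true;
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    console.warn(`${label} failed, but work is complete - continuing (${reason})`);
    return false;
  }
}

// Stand-ins for real clients: Jira is down, Slack works.
const updateJira = async (): Promise<void> => { throw new Error("Jira unavailable"); };
const postToSlack = async (): Promise<void> => {};

// Completes the ticket and reports which notifications need a later retry.
async function completeTicket(id: string): Promise<string[]> {
  const failedNotifications: string[] = [];
  if (!(await bestEffort(`Jira update for ${id}`, updateJira))) failedNotifications.push("Jira");
  if (!(await bestEffort(`Slack post for ${id}`, postToSlack))) failedNotifications.push("Slack");
  return failedNotifications;
}

completeTicket("T-1").then((failed) => console.log("Needs retry:", failed));
```

Core validation should not go through a wrapper like this, since a builder or validator failure must still fail the ticket.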
Cost Optimization
Use appropriate models for each task:
- Builder: Claude Sonnet 4.5 ($3/$15 per 1M tokens) - Needs quality
- Validator: Claude Haiku ($0.25/$1.25 per 1M tokens) - 90% cheaper, sufficient
- Test Runner: Claude Haiku - Cost-optimized
Typical costs:
- Simple ticket: $0.50 - $2.00
- Complex ticket: $2.00 - $5.00
- Multi-retry: $5.00 - $10.00
Budget limits prevent runaway costs.
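To make the arithmetic concrete, the sketch below estimates the cost of one ticket from token counts using the prices listed above. The token counts and the $5 budget limit are made-up illustration values, and `costUsd` and `withinBudget` are hypothetical helper names.

```typescript
interface ModelPrice {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

// Prices copied from the list above.
const prices: Record<"builder" | "validator", ModelPrice> = {
  builder: { inputPerMTok: 3, outputPerMTok: 15 },
  validator: { inputPerMTok: 0.25, outputPerMTok: 1.25 },
};

function costUsd(price: ModelPrice, inputTokens: number, outputTokens: number): number {
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1e6;
}

function withinBudget(spentUsd: number, limitUsd: number): boolean {
  return spentUsd <= limitUsd;
}

// Illustrative token counts, not measured values.
const builderCost = costUsd(prices.builder, 200_000, 30_000);     // 0.60 + 0.45 = 1.05
const validatorCost = costUsd(prices.validator, 100_000, 10_000); // 0.025 + 0.0125 = 0.0375
const ticketCost = builderCost + validatorCost;

const budgetLimitUsd = 5; // hypothetical per-ticket limit
console.log(ticketCost.toFixed(4), withinBudget(ticketCost, budgetLimitUsd));
```

With these numbers the validator run costs under 4% of the builder run, which is the reason for using the cheaper model there.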
Self-Healing
Systems should recover automatically:
Validator Retries: If Reality Validator finds issues, automatically retry with corrections (up to 2 attempts).
typescript
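let validatorRetryCount = 0;
while (true) {
  const report = await validator.run();
  if (report.recommendation !== 'needs_rework') {
    break; // Success
  }
  if (validatorRetryCount >= 2) {
    // Both correction attempts are used up: fail instead of exiting silently
    throw new Error('Validation still failing after 2 correction attempts');
  }
  // Feed the validator's findings back to the builder, then re-validate
  const correctionPrompt = generateCorrections(report);
  await builder.resume(correctionPrompt);
  validatorRetryCount++;
}
Jira Fallback: Continue even if Jira fails (work first, bureaucracy second).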
See Self-Healing Documentation for details.
Agent Context Enhancement
When agents fail repeatedly on the same issue, add warnings to the agent context:
Example: Path confusion warnings
typescript
const agentContext = `
CRITICAL: Verify working directory before changes!
Common mistake: Working in deleted monorepo instead of production.
✓ Check: /apps/zeron-feedback-service/ (production)
✗ Avoid: /apps/zeron/feedback-service/ (deleted)
STOP if you're in the wrong directory.
`;
Prevents repeat failures by making agents aware of common pitfalls.
Testing Strategy
Evidence-Based Tests
Test actual behavior, not mocked responses:
typescript
// ✅ GOOD - Real filesystem
test('validates path', async () => {
  const tmpDir = await fs.mkdtemp('/tmp/test-');
  await fs.writeFile(`${tmpDir}/package.json`, '{}');
  const result = await validatePath(tmpDir);
  expect(result.valid).toBe(true);
  await fs.remove(tmpDir);
});
// ❌ BAD - Mocked
test('validates path', async () => {
  jest.spyOn(fs, 'pathExists').mockResolvedValue(true);
  // Not testing real behavior
});
Integration Over Unit
Test workflows end-to-end when possible. Unit tests can miss integration issues.
Key Learnings
- Redis is volatile - Back up financial data
- External systems fail - Don't block work on them
- Agents make mistakes - Build 4 layers of protection
- Evidence over claims - Always verify filesystem
- Deterministic wins - No AI in orchestration logic
- Fresh validators - No shared context with builders
- Self-healing works - Retry with corrections before failing
