Incident Report: Path Confusion - Multiple Failed PRs

Date: October 10, 2025
Severity: High
Status: Resolved
Report Author: Claude (Conductor AI Assistant)


Executive Summary

Three pull requests failed because agents targeted the wrong codebase directory. All three attempted to modify code in a deleted monorepo directory (~/apps/zeron/feedback-service/) instead of the correct production codebase (~/apps/zeron-feedback-service/). Each PR ballooned as a result: deletions of 100+ monorepo files were mixed in with legitimate conductor improvements.

Impact:

  • 3 failed pull requests requiring manual cleanup
  • ~4-6 hours of lost development time
  • Need to cherry-pick clean conductor commits from contaminated branches
  • Risk of merging wrong code into main branch

Resolution:

  • Closed all contaminated PRs with explanation
  • Created clean PR with cherry-picked conductor commits
  • Implemented comprehensive path confusion prevention measures
  • Updated project configuration to correct paths

Timeline

Initial Incident (Unknown Date - Prior Session)

  • Multiple PRs created with mixed content
  • User reported issues but PRs remained open
  • Session ended before resolution

Discovery Phase (October 10, 2025 - Morning)

09:00 - User requested PR validation: "can you now validate the PRs to make sure they are still relevant?"

09:05 - Initial investigation revealed 3 open PRs:

  • PR containing: "feat(conductor): Add real-time orchestration with SSE dashboard"
  • PR containing: "feat: Add Conductor orchestration system with dashboard"
  • PR containing: "feat(conductor): Add multi-project orchestration system"

09:10 - MISTAKE: Hastily closed the PRs without proper investigation, on the assumption that they targeted the wrong codebase

09:12 - USER CHALLENGE: User questioned closures:

"why wouldn't we add this to the feedback service?" "these seem to include completed, working changes/"

Investigation Phase (09:15 - 09:45)

09:15 - Reopened PRs for proper investigation

09:20 - Discovered root cause via git diff analysis:

bash
# All PRs showed massive feedback-service deletions
 delete mode 100644 feedback-service/.env.development.example
 delete mode 100644 feedback-service/.eslintrc.js
 delete mode 100644 feedback-service/packages/api/...
 # ... 100+ deleted files

09:25 - Root cause identified:

  1. Monorepo migration: feedback-service moved from ~/apps/zeron/feedback-service/ to separate repo ~/apps/zeron-feedback-service/
  2. Old monorepo code deleted from main branch
  3. Project config still pointed to old path
  4. Agents worked in old directory, detected "missing" files as deletions
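
The contamination pattern described above can be checked mechanically before a PR is opened. The following is a minimal sketch, not the conductor's actual code: it parses the tab-separated output of `git diff --name-status` and flags every changed path outside an allowed set of prefixes. The names `parseNameStatus` and `findContaminatedPaths`, the `ChangedFile` shape, and the choice of `conductor/` as the only allowed prefix are assumptions made for illustration.

```typescript
// Hypothetical helper: names and the allowed-prefix policy are assumptions,
// not part of the conductor codebase.
interface ChangedFile {
  status: string; // status column from `git diff --name-status`, e.g. "A", "M", "D"
  path: string;   // repository-relative path (for renames, the new path)
}

// Parse the tab-separated output of `git diff --name-status`.
function parseNameStatus(output: string): ChangedFile[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const parts = line.split("\t");
      // Renames print "R100<TAB>old<TAB>new"; the last column is the current path.
      return { status: parts[0] ?? "", path: parts[parts.length - 1] ?? "" };
    });
}

// Return every changed file that falls outside the allowed prefixes.
function findContaminatedPaths(files: ChangedFile[], allowedPrefixes: string[]): ChangedFile[] {
  return files.filter((f) => !allowedPrefixes.some((prefix) => f.path.startsWith(prefix)));
}

// Example input modelled on the diff excerpt in this report.
const sampleDiff = [
  "D\tfeedback-service/.env.development.example",
  "D\tfeedback-service/.eslintrc.js",
  "M\tconductor/src/web-server.ts",
].join("\n");

const flagged = findContaminatedPaths(parseNameStatus(sampleDiff), ["conductor/"]);
console.log(flagged.map((f) => `${f.status} ${f.path}`));
```

With the sample input, only the two `feedback-service/` deletions are flagged; the `conductor/` change passes. A check like this would not explain why the deletions appeared, but it would have stopped the three PRs before review.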

09:30 - Verified production codebase:

  • Correct: ~/apps/zeron-feedback-service/ (JavaScript Express, active development)
  • Wrong: ~/apps/zeron/feedback-service/ (deleted TypeScript monorepo)

Resolution Phase (09:45 - 11:00)

09:45 - User approved extraction strategy (Option A):

Extract clean conductor work into new PR, separate from feedback-service changes

09:50 - Created extraction branch:

bash
git checkout -b feat/conductor-orchestration-system origin/main

10:00 - Cherry-picked clean conductor commits:

  • Skipped commits related to feedback-service work
  • Skipped the self-healing metrics work (merge conflicts)
  • Resolved conflicts in web-server.ts

10:30 - Created new clean PR:

  • Title: "feat(conductor): Add orchestration system with multi-project support"
  • Clean conductor work only
  • No contaminated feedback-service changes

10:35 - Closed all original PRs with explanations:

  • All closed because they were contaminated by changes made under the wrong feedback-service path

10:40 - Updated project configuration:

yaml
# conductor/config/feedback-service.yaml
project:
  working_directory: ~/apps/zeron-feedback-service  # CORRECTED

11:00 - Clean PR merged successfully

Prevention Phase (11:00 - 13:00)

11:00 - User requested implementation of learnings:

"first, lets take these learnings and work them in to the Conductor and our UI"

11:15 - Created comprehensive prevention documentation: PATH_CONFUSION_PREVENTION.md

11:30 - Enhanced agent context warnings in agent-manager.ts:

  • Critical path verification instructions
  • Warning about common confusion scenarios
  • Explicit "STOP if wrong directory" guidance

12:00 - Implemented real-time path validation:

  • Backend API endpoint: /api/config/validate-path
  • Frontend validation button in config UI
  • Codebase type detection (TypeScript, JavaScript, Python, Go, Rust)
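
One plausible way to implement the codebase type detection listed above is to look for well-known marker files at the top level of the directory. The sketch below assumes that approach; the report does not say how `handleValidatePath` actually detects the type, and the marker list, the `CodebaseType` union, and the function name `detectCodebaseType` are illustrative assumptions.

```typescript
// Hypothetical sketch: the marker-file approach and all names are assumptions.
type CodebaseType = "typescript" | "javascript" | "python" | "go" | "rust" | "unknown";

// Order matters: a TypeScript project usually has both tsconfig.json and
// package.json, so the more specific marker is checked first.
const MARKER_FILES: ReadonlyArray<[string, CodebaseType]> = [
  ["tsconfig.json", "typescript"],
  ["package.json", "javascript"],
  ["pyproject.toml", "python"],
  ["requirements.txt", "python"],
  ["go.mod", "go"],
  ["Cargo.toml", "rust"],
];

// Takes the names of the top-level directory entries and returns the first match.
function detectCodebaseType(topLevelEntries: readonly string[]): CodebaseType {
  const present = new Set(topLevelEntries);
  const match = MARKER_FILES.find(([marker]) => present.has(marker));
  return match ? match[1] : "unknown";
}

console.log(detectCodebaseType(["package.json", "tsconfig.json", "src"]));
console.log(detectCodebaseType(["package.json", "index.js"]));
console.log(detectCodebaseType([]));
```

Keeping the function pure (it takes entry names rather than reading the disk) makes it easy to test; the caller would supply the result of a directory listing. In this incident the two locations differed in type (JavaScript Express versus a TypeScript monorepo), so a mismatch between detected and expected type is a useful signal.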

12:30 - Added dashboard path issues alert:

  • Prominent red warning banner
  • Lists projects with invalid paths
  • Direct link to fix configuration
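
The data behind such a banner can be quite small. The sketch below is an assumption about how it might be assembled, not a description of the shipped dashboard code: it filters the configured projects down to those whose working directory fails an existence check and attaches a link to the config page. The `ProjectEntry` and `PathIssue` shapes, the function names, and the `?project=` query parameter are hypothetical; only `config.html` is named in this report.

```typescript
// Hypothetical sketch: shapes, names, and the URL format are assumptions.
interface ProjectEntry {
  id: string;
  workingDirectory: string;
}

interface PathIssue {
  projectId: string;
  workingDirectory: string;
  fixUrl: string; // link to the config page for this project
}

// The existence check is injected so the function can be tested without touching the disk.
function collectPathIssues(
  projects: ProjectEntry[],
  pathExists: (dir: string) => boolean,
): PathIssue[] {
  return projects
    .filter((p) => !pathExists(p.workingDirectory))
    .map((p) => ({
      projectId: p.id,
      workingDirectory: p.workingDirectory,
      fixUrl: `/config.html?project=${encodeURIComponent(p.id)}`,
    }));
}

// Returns null when there is nothing to show, so the banner stays hidden.
function bannerText(issues: PathIssue[]): string | null {
  if (issues.length === 0) return null;
  const ids = issues.map((issue) => issue.projectId).join(", ");
  return `${issues.length} project(s) have invalid paths: ${ids}`;
}

// Example: pretend only the new standalone repo exists on disk.
const existingDirs = new Set(["~/apps/zeron-feedback-service"]);
const issues = collectPathIssues(
  [{ id: "feedback-service", workingDirectory: "~/apps/zeron/feedback-service" }],
  (dir) => existingDirs.has(dir),
);
console.log(bannerText(issues));
```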

13:00 - Incident report completed


Root Cause Analysis

Primary Cause

Configuration Lag After Repo Migration

The feedback-service was migrated from the monorepo to a standalone repository, but the conductor configuration was not updated at the same time:

Before Migration:

yaml
project:
  id: feedback-service
  working_directory: ~/apps/zeron/feedback-service  # Monorepo location

After Migration (SHOULD HAVE BEEN):

yaml
project:
  id: feedback-service
  working_directory: ~/apps/zeron-feedback-service  # New location

Contributing Factors

  1. No Path Validation on Startup

    • Conductor did not verify working directories existed
    • Agents spawned even when paths were invalid
    • No warnings in logs or dashboard
  2. Lack of Agent Context Awareness

    • Agents had no warnings about path confusion risks
    • No pre-flight checks before making changes
    • No codebase verification step
  3. Similar Directory Names

    • feedback-service vs zeron-feedback-service easily confused
    • Both could plausibly exist in same parent directory
    • No obvious visual distinction
  4. Absent Real-Time Validation

    • Configuration UI had no path validation
    • No feedback when entering invalid paths
    • Changes saved without verification
  5. Human Process Gap

    • No migration checklist requiring config updates
    • No documentation about path dependencies
    • No post-migration verification

Impact Assessment

Development Impact

  • Time Lost: ~6 hours total (3 failed PR attempts + investigation + cleanup)
  • Code Churn: 3 contaminated branches requiring cleanup
  • Cherry-Pick Effort: Manual extraction of 14 commits from contaminated history

Risk Impact

  • High Risk: Could have merged wrong code if not caught
  • Medium Risk: Confusion about which codebase is authoritative
  • Low Risk: No production deployments affected (caught before merge)

Process Impact

  • Documentation Debt: Need to document repo migration process
  • Tooling Debt: Need automated path validation
  • Training Debt: Need to educate agents about path risks

Resolution Summary

Immediate Actions Taken

  1. ✅ Closed all contaminated PRs with clear explanations
  2. ✅ Created clean PR with extracted conductor work
  3. ✅ Updated feedback-service configuration to correct path
  4. ✅ Merged clean conductor work to main

Prevention Measures Implemented

  1. Documentation: Created PATH_CONFUSION_PREVENTION.md
  2. Agent Warnings: Enhanced context with critical path verification instructions
  3. Real-Time Validation: Added path validation API and UI
  4. Codebase Detection: Implemented fingerprinting (TypeScript, JavaScript, Python, Go, Rust)
  5. Dashboard Alerts: Added prominent warning banner for path issues
  6. Incident Report: Documented learnings and prevention measures

Lessons Learned

What Went Well

  1. User Challenge: User questioned hasty PR closures, forcing proper investigation
  2. Clean Extraction: Successfully isolated good conductor work from bad feedback-service changes
  3. Comprehensive Prevention: Implemented multi-layered protection against recurrence

What Went Wrong

  1. Hasty Decision: Closed PRs without thorough investigation
  2. Lack of Validation: No path checking before spawning agents
  3. Configuration Lag: Migration completed but configs not updated

Process Improvements Needed

1. Repo Migration Checklist

When moving code between repositories:

  • [ ] Update all project configs with new paths
  • [ ] Verify paths in dashboard show ✅
  • [ ] Update CLAUDE.md with correct paths
  • [ ] Test agent spawn in new location
  • [ ] Document the migration in project docs
  • [ ] Add redirect/warning in old location
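
The first checklist item can be partly automated by scanning project configs for paths that still point into a retired location. The sketch below is illustrative only; `ConfigEntry`, `isInsideDirectory`, and `findStaleConfigs` are hypothetical names. It compares paths on directory boundaries rather than with a plain string prefix, because `~/apps/zeron-feedback-service` starts with the string `~/apps/zeron` without being inside that directory.

```typescript
// Hypothetical sketch: names and shapes are assumptions.
interface ConfigEntry {
  id: string;
  workingDirectory: string;
}

// True when `candidate` is `root` itself or lies beneath it.
// Matching on "root + /" avoids treating sibling directories with a
// shared name prefix as children.
function isInsideDirectory(candidate: string, root: string): boolean {
  const stripTrailingSlashes = (p: string) => p.replace(/\/+$/, "");
  const c = stripTrailingSlashes(candidate);
  const r = stripTrailingSlashes(root);
  return c === r || c.startsWith(r + "/");
}

// Return every config whose working directory still points into the retired location.
function findStaleConfigs(configs: ConfigEntry[], retiredRoot: string): ConfigEntry[] {
  return configs.filter((cfg) => isInsideDirectory(cfg.workingDirectory, retiredRoot));
}

const stale = findStaleConfigs(
  [
    { id: "feedback-service", workingDirectory: "~/apps/zeron/feedback-service" },
    { id: "feedback-service-migrated", workingDirectory: "~/apps/zeron-feedback-service" },
  ],
  "~/apps/zeron/feedback-service/",
);
console.log(stale.map((cfg) => cfg.id)); // logs only the "feedback-service" entry
```

The comparison is purely textual: it assumes both paths are written in the same form (both with `~`, or both absolute) and does not resolve symlinks.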

2. Pre-Spawn Validation

Before spawning any agent:

  • [ ] Verify working directory exists
  • [ ] Verify directory is not empty
  • [ ] Verify expected codebase type matches config
  • [ ] Log fingerprint for audit trail
  • [ ] Abort spawn if checks fail
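
The checklist above could be sketched as a single function that runs before an agent is spawned. This is a minimal illustration under stated assumptions, not the conductor's implementation: `preSpawnCheck`, `expandHome`, and the `PreSpawnResult` type are hypothetical, and the codebase-type check is simplified to "a required marker file is present". One detail is certain: Node's `fs` functions do not expand `~`, so a configured path such as `~/apps/zeron-feedback-service` has to be expanded explicitly before it is checked.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch: names, result shape, and marker-file policy are assumptions.
type PreSpawnResult =
  | { ok: true; resolvedPath: string; entryCount: number }
  | { ok: false; reason: string };

// Node does not expand "~" the way a shell does, so handle it here.
function expandHome(p: string): string {
  if (p === "~") return os.homedir();
  if (p.startsWith("~/")) return path.join(os.homedir(), p.slice(2));
  return p;
}

function preSpawnCheck(workingDirectory: string, requiredMarker: string): PreSpawnResult {
  const resolved = path.resolve(expandHome(workingDirectory));

  // 1. The directory must exist and actually be a directory.
  if (!fs.existsSync(resolved) || !fs.statSync(resolved).isDirectory()) {
    return { ok: false, reason: `working directory does not exist: ${resolved}` };
  }

  // 2. It must not be empty.
  const entries = fs.readdirSync(resolved);
  if (entries.length === 0) {
    return { ok: false, reason: `working directory is empty: ${resolved}` };
  }

  // 3. Simplified codebase check: the expected marker file must be present.
  if (entries.indexOf(requiredMarker) === -1) {
    return { ok: false, reason: `expected ${requiredMarker} in ${resolved}` };
  }

  // 4. Log a small fingerprint for the audit trail.
  console.log(`[pre-spawn] ${resolved}: ${entries.length} top-level entries, ${requiredMarker} present`);
  return { ok: true, resolvedPath: resolved, entryCount: entries.length };
}

// 5. The caller aborts the spawn when any check fails.
const result = preSpawnCheck("~/apps/zeron-feedback-service", "package.json");
if (!result.ok) {
  console.error(`Refusing to spawn agent: ${result.reason}`);
}
```

Returning a result object rather than throwing lets the caller decide how to surface the failure, for example in the logs and on the dashboard at the same time. The use of `package.json` as the marker for the feedback-service is an assumption based on the report describing it as a JavaScript Express codebase.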

3. Configuration Validation

When saving project config:

  • [ ] Validate path exists immediately
  • [ ] Show real-time feedback (✅ / ⚠️ / ❌)
  • [ ] Detect codebase type and show to user
  • [ ] Warn if directory is empty
  • [ ] Require confirmation if path seems suspicious
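
The three feedback states in this checklist map naturally onto a small decision function. The sketch below shows one way that mapping might look; it is an assumption, not the behaviour of the shipped config UI. The `PathObservation` and `ValidationFeedback` shapes and the function name `feedbackFor` are hypothetical, and "suspicious" is narrowed here to two cases: an empty directory, or a detected codebase type that differs from the expected one.

```typescript
// Hypothetical sketch: shapes, names, and the definition of "suspicious" are assumptions.
type ValidationStatus = "ok" | "warning" | "error";

interface PathObservation {
  exists: boolean;
  entryCount: number;    // number of top-level entries in the directory
  detectedType: string;  // e.g. "javascript", "typescript", "unknown"
  expectedType?: string; // from the project config, if declared
}

interface ValidationFeedback {
  status: ValidationStatus;
  icon: string;
  message: string;
  requiresConfirmation: boolean; // true when the user must confirm before saving
}

function feedbackFor(obs: PathObservation): ValidationFeedback {
  // A missing path is a hard error: there is nothing to confirm.
  if (!obs.exists) {
    return { status: "error", icon: "❌", message: "Path does not exist", requiresConfirmation: false };
  }
  // An empty directory is allowed, but only after explicit confirmation.
  if (obs.entryCount === 0) {
    return { status: "warning", icon: "⚠️", message: "Directory is empty", requiresConfirmation: true };
  }
  // A type mismatch suggests the path points at a different codebase.
  if (obs.expectedType !== undefined && obs.expectedType !== obs.detectedType) {
    return {
      status: "warning",
      icon: "⚠️",
      message: `Detected ${obs.detectedType}, expected ${obs.expectedType}`,
      requiresConfirmation: true,
    };
  }
  return { status: "ok", icon: "✅", message: `Detected ${obs.detectedType} codebase`, requiresConfirmation: false };
}

const feedback = feedbackFor({
  exists: true,
  entryCount: 12,
  detectedType: "typescript",
  expectedType: "javascript",
});
console.log(`${feedback.icon} ${feedback.message}`);
```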

Prevention Verification

Testing Performed

  • ✅ Config UI path validation tested with valid path
  • ✅ Config UI path validation tested with invalid path
  • ✅ Config UI path validation tested with empty directory
  • ✅ Dashboard alert tested with invalid project path
  • ✅ Agent context verified to include path warnings

Future Testing Needed

  • [ ] End-to-end test of agent spawning with invalid path (should fail gracefully)
  • [ ] Test path validation with various codebase types
  • [ ] Test migration checklist with actual repo move
  • [ ] Verify dashboard alert appears on conductor startup with invalid paths

Related Documentation

  • Prevention Guide: conductor/docs/PATH_CONFUSION_PREVENTION.md
  • Agent Context: conductor/src/agent-manager.ts (lines 467-493, 575-580)
  • Path Validation: conductor/src/web-server.ts (handleValidatePath method)
  • Dashboard Alert: conductor/static/index.html and app.js
  • Config UI: conductor/static/config.html and config.js

Affected Pull Requests

  • 3 original PRs: Closed - Contaminated with wrong feedback-service path
  • Clean PR: ✅ Merged - Clean extraction of conductor work

Sign-Off

Incident Commander: Claude (AI Assistant)
Incident Reviewer: System Administrator
Date Closed: October 10, 2025
Status: Resolved with comprehensive prevention measures

Post-Incident Actions:

  • [x] Root cause identified
  • [x] Immediate fix applied
  • [x] Prevention measures implemented
  • [x] Documentation updated
  • [x] Incident report completed
  • [ ] Team communication (if applicable)
  • [ ] Post-mortem review scheduled (if applicable)
