CLEO Native Orchestration System Specification
Version: 2.2.0
Status: Specification
Created: 2025-12-30
Updated: 2025-12-31
Author: Architecture Team
Target: CLEO v0.42.0+
Part 1: Preamble
1.1 Purpose
This specification defines CLEO’s native tmux-based multi-agent orchestration system. The system enables parallel execution of tasks across multiple Claude Code agents while maintaining state consistency, scope isolation, and deterministic coordination.
1.2 Authority
This specification is AUTHORITATIVE for:
- tmux session lifecycle management
- Agent spawning and environment injection
- Wave-based dependency execution
- Completion detection via Stop hooks
- Heartbeat monitoring and stale agent detection
This specification DEFERS TO:
- PRIME-ARCHITECTURE-SPEC.md for PRIME/Session Agent/Subagent hierarchy
- SOLID-PROMPTING-SYSTEM-SPEC.md for agent prompt templates
- MULTI-SESSION-SPEC.md for session scope binding
- IMPLEMENTATION-ORCHESTRATION-SPEC.md for 7-agent pipeline
1.3 RFC 2119 Conformance
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” are interpreted as described in RFC 2119.
1.4 Decision Summary
Decision: Build CLEO-native orchestration inspired by claude_code_agent_farm patterns.
Alternatives Evaluated:
| Option | License | Verdict |
|---|---|---|
| Claude Squad | AGPL-3.0 | ❌ Rejected - copyleft incompatible with CLEO’s MIT license |
| claude_code_agent_farm | MIT | ✅ Inspiration source - compatible license, good patterns |
| Build from scratch | N/A | ❌ Rejected - reinventing proven patterns |
Rationale:
- Licensing: Agent Farm’s MIT license is compatible with CLEO; Claude Squad’s AGPL-3.0 would require CLEO to adopt copyleft
- Integration Depth: CLEO requires deep integration with sessions.json and the existing session system
- API Optimization: Event-driven completion (Stop hooks) is more efficient than Agent Farm’s polling approach
- Scope Management: Pre-assigned disjoint scopes eliminate runtime lock contention
Part 2: Design Philosophy
2.1 Core Principles
Build CLEO-native tmux orchestration optimized for CLEO’s existing architecture and LLM-agent-first principles:
| Principle | Implementation |
|---|---|
| Automation-First | CLI/scriptable by design, not TUI with automation bolted on |
| Event-Driven | Stop hooks for completion vs polling (reduces latency) |
| Scope Pre-Assignment | No runtime lock contention (disjoint scopes assigned before spawn) |
| Wave-Based Spawning | Dependency-aware vs spawn-all-immediately |
| Single State File | Use existing sessions.json vs multiple coordination files |
| Context Minimization | Scope-filtered injection reduces token usage |
2.2 Patterns Adopted from Agent Farm
Inspired by claude_code_agent_farm (MIT License):
- tmux pane-based agent isolation - Each agent runs in an isolated tmux pane
- JSON state file for monitoring - External tools can observe orchestration state
- Heartbeat-based health detection - Detect stalled agents via timestamp comparison
- Prompt-based agent coordination - Agents self-coordinate via injected instructions
2.3 CLEO Modernizations
Where CLEO improves on existing patterns:
| Aspect | Agent Farm Approach | CLEO Modernization |
|---|---|---|
| Completion Detection | Polling-based file checks | Event-driven Stop hooks |
| Task Claiming | Runtime lock file creation | Pre-assigned disjoint scopes |
| Agent Spawning | All agents immediately + stagger | Dependency-aware wave spawning |
| State Management | 4 coordination files | Single sessions.json |
| Context Injection | Full task list per agent | Scope-filtered task subset |
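The pre-assigned disjoint scopes in the table above imply a check before any agent is spawned. A minimal sketch follows, assuming scopes use the `subtree:<task-id>` form shown for CLEO_SCOPE and that a dot-separated ID such as T998.1.2 lies inside the subtree of T998.1. Both function names and the overlap rule are hypothetical, not CLEO's documented scope semantics.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the overlap rule is an assumption, not CLEO's actual
# scope semantics.

# Return 0 when two "subtree:<task-id>" scopes overlap, i.e. the task IDs are
# equal or one is a dot-separated ancestor of the other.
scopes_overlap() {
  local a="${1#subtree:}" b="${2#subtree:}"
  [[ "$a" == "$b" || "$a" == "$b".* || "$b" == "$a".* ]]
}

# Return 0 when every pair of scopes is disjoint. Otherwise report the first
# conflicting pair and return 52 (EXIT_SCOPE_CONFLICT in the exit code table).
check_disjoint_scopes() {
  local scopes=("$@") i j
  for ((i = 0; i < ${#scopes[@]}; i++)); do
    for ((j = i + 1; j < ${#scopes[@]}; j++)); do
      if scopes_overlap "${scopes[i]}" "${scopes[j]}"; then
        echo "scope conflict: ${scopes[i]} overlaps ${scopes[j]}" >&2
        return 52
      fi
    done
  done
  return 0
}
```

Running the check once at start-up is what lets agents skip lock files at runtime: if no two scopes overlap, no two agents can claim the same task.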
Part 3: Architecture
3.1 System Overview
3.2 Layer Placement
3.3 File Locations
| File | Purpose |
|---|---|
| lib/orchestrator.sh | Core orchestration functions (Layer 3) |
| scripts/orchestrate.sh | CLI entry point |
| .claude/hooks/orchestrator-stop.yaml | Stop hook for completion notification |
| templates/session-agent-prompt.md | Base session agent prompt |
| templates/agents/*.md | 7 pipeline agent prompts |
| .cleo/orchestration/events/ | Completion event files |
Part 4: Core Functions
4.1 lib/orchestrator.sh
4.2 scripts/orchestrate.sh
Part 5: Wave-Based Execution
5.1 Dependency Analysis
Tasks are grouped into waves based on their dependencies:
5.2 Example Wave Assignment
Given T1114 with dependencies:
5.3 Wave Progression
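The grouping of tasks into waves can be sketched as a repeated pass over the tasks not yet placed. The function name compute_dependency_waves() comes from the implementation checklist; its body, the `task:dep1,dep2` input format, and the placeholder task names A to D are assumptions made for illustration. It requires bash 4 or later.

```shell
#!/usr/bin/env bash
# Sketch only: the "task:dep1,dep2" input format and this algorithm are
# assumptions, not CLEO's implementation. Requires bash 4+ (associative arrays).

# Reads "task:dep1,dep2" lines on stdin and prints "wave<TAB>task" lines.
# A task joins wave N once all of its dependencies sit in waves before N.
compute_dependency_waves() {
  local -A deps=() wave=()
  local -a order=() this_wave=() dep_arr=()
  local line task d ready current=1 placed=0

  while IFS= read -r line; do
    [[ -z "$line" ]] && continue
    task="${line%%:*}"
    if [[ "$line" == *:* ]]; then deps["$task"]="${line#*:}"; else deps["$task"]=""; fi
    order+=("$task")
  done

  while (( placed < ${#order[@]} )); do
    this_wave=()
    for task in "${order[@]}"; do
      [[ -n "${wave[$task]:-}" ]] && continue   # already placed in a wave
      ready=1
      IFS=',' read -r -a dep_arr <<< "${deps[$task]}"
      for d in "${dep_arr[@]}"; do
        # A dependency counts only if it was placed in an earlier pass.
        if [[ -n "$d" && -z "${wave[$d]:-}" ]]; then ready=0; break; fi
      done
      if (( ready )); then this_wave+=("$task"); fi
    done

    if (( ${#this_wave[@]} == 0 )); then
      echo "unresolvable dependencies (cycle or unknown task)" >&2
      return 1
    fi

    # Assign the wave only after the full pass, so tasks found ready in the
    # same pass never satisfy each other's dependencies.
    for task in "${this_wave[@]}"; do
      wave["$task"]=$current
      printf '%s\t%s\n' "$current" "$task"
      placed=$((placed + 1))
    done
    current=$((current + 1))
  done
}

# Usage: printf '%s\n' 'A:' 'B:A' 'C:A' 'D:B,C' | compute_dependency_waves
```

With that input, A lands in wave 1, B and C in wave 2, and D in wave 3. Progression then follows the rule in Part 12: wave N+1 spawns only after every agent in wave N has completed.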
Part 6: Environment Injection
6.1 CLEO_SESSION Binding
Each spawned agent receives environment variables:
6.2 Session Resolution Priority
From lib/session.sh:
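As a rough sketch of the priority rule, the only ordering this specification states is that an injected CLEO_SESSION takes precedence (see the Phase 2 checklist). The function name resolve_session_id and the fallback file argument are hypothetical stand-ins for whatever lower-priority sources lib/session.sh consults.

```shell
#!/usr/bin/env bash
# Sketch only: the "CLEO_SESSION first" rule comes from this spec; the
# fallback file is a hypothetical stand-in for lower-priority sources.

resolve_session_id() {
  local fallback_file="${1:-}"

  # 1. An injected CLEO_SESSION always wins, so each spawned agent stays
  #    bound to the session the orchestrator assigned to it.
  if [[ -n "${CLEO_SESSION:-}" ]]; then
    printf '%s\n' "$CLEO_SESSION"
    return 0
  fi

  # 2. Hypothetical fallback: first line of a file holding a session ID.
  if [[ -n "$fallback_file" && -r "$fallback_file" ]]; then
    head -n 1 "$fallback_file"
    return 0
  fi

  # 3. Nothing could be resolved.
  return 1
}
```

Putting the environment variable first matters for orchestration: several agents share one project directory, so any project-level "current session" record would be ambiguous between them.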
Part 7: Completion Detection
7.1 Stop Hook
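A Stop hook needs to leave a completion record where the orchestrator can find it. The following is a minimal sketch of a script such a hook could run, assuming one JSON file per agent in .cleo/orchestration/events/. The function name, the file naming, and the JSON fields are assumptions; only the directory and the CLEO_* variables come from this specification.

```shell
#!/usr/bin/env bash
# Sketch only: the event file name and JSON fields are assumptions.
# The CLEO_* variables are the ones listed in section 10.2.2.

write_completion_event() {
  local root="${CLEO_PROJECT_ROOT:-}" agent_id="${CLEO_AGENT_ID:-}"
  local events_dir tmp_file

  if [[ -z "$root" || -z "$agent_id" ]]; then
    echo "CLEO_PROJECT_ROOT and CLEO_AGENT_ID must be set" >&2
    return 57   # EXIT_HOOK_FAILED
  fi

  events_dir="$root/.cleo/orchestration/events"
  mkdir -p "$events_dir" || return 57

  # Write to a hidden temp file first, then rename. A rename within one
  # filesystem is atomic, so a reader never sees a half-written event.
  tmp_file="$(mktemp "$events_dir/.${agent_id}.XXXXXX")" || return 57
  printf '{"agentId":"%s","session":"%s","wave":"%s","completedAt":"%s"}\n' \
    "$agent_id" "${CLEO_SESSION:-}" "${CLEO_WAVE:-}" \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$tmp_file" || return 57
  mv "$tmp_file" "$events_dir/${agent_id}.json" || return 57
}
```

Returning 57 on any failure lines up with EXIT_HOOK_FAILED in the exit code table, so a broken notification is distinguishable from an agent failure.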
7.2 Event Processing
The orchestrator monitors .cleo/orchestration/events/ for completion files:
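One way to consume those files is to handle each pending event once and delete it afterwards. In this sketch the one-file-per-agent layout and the process_completion_events name are assumptions, and the handle_agent_completion() body is a placeholder for the function named in Part 11.

```shell
#!/usr/bin/env bash
# Sketch only: one *.json file per finished agent is an assumed layout, and
# this handle_agent_completion body is a placeholder, not the real function.

handle_agent_completion() {
  local agent_id="$1"
  echo "completed: $agent_id"
}

# Handle every pending event once, deleting each file only after its
# handler succeeded so that a failed event is retried on the next pass.
process_completion_events() {
  local events_dir="$1" event_file agent_id
  for event_file in "$events_dir"/*.json; do
    [[ -e "$event_file" ]] || continue   # the glob matched nothing
    agent_id="$(basename "$event_file" .json)"
    if handle_agent_completion "$agent_id"; then
      rm -f -- "$event_file"
    fi
  done
  return 0
}
```

The function is idempotent on an empty directory, so it can be called on a timer or from a file watcher without extra state.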
Part 8: Heartbeat Monitoring
8.1 Heartbeat Updates
Agents update their heartbeat via sessions.json:
8.2 Stale Detection
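Stale detection reduces to a timestamp comparison: an agent is stale when its last heartbeat is older than the configured timeout. The sketch below assumes heartbeats are available as epoch seconds in `agent_id last_heartbeat` lines; extracting them from sessions.json is left out, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: epoch-second heartbeats and the argument order are assumptions.
# Reading the values out of sessions.json is not shown.

# Return 0 (stale) when the last heartbeat is older than timeout_seconds.
is_agent_stale() {
  local last_heartbeat="$1" timeout_seconds="$2" now="${3:-$(date +%s)}"
  (( now - last_heartbeat > timeout_seconds ))
}

# Print the IDs of stale agents from "agent_id last_heartbeat" lines on stdin.
list_stale_agents() {
  local timeout_seconds="$1" now="${2:-$(date +%s)}" agent_id last_heartbeat
  while read -r agent_id last_heartbeat; do
    [[ -z "$agent_id" ]] && continue
    if is_agent_stale "$last_heartbeat" "$timeout_seconds" "$now"; then
      printf '%s\n' "$agent_id"
    fi
  done
  return 0
}

# Usage: printf '%s\n' 'agent-0 1000' 'agent-1 1250' | list_stale_agents 300 1400
```

If the check runs at least once per timeout interval, a stalled agent is flagged no later than 2 * timeout after its last heartbeat, which matches the bound in the success criteria.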
Part 9: API Optimization
9.1 Context Minimization Strategies
| Strategy | Savings | Implementation |
|---|---|---|
| Scope-filtered task list | 50-80% | Only inject tasks in agent’s scope |
| Task-level injection | 30-50% | Pass task.description, not full todo.json |
| Shared CLAUDE.md | 40-60% | Project context same for all, don’t repeat |
| Lazy file loading | 20-30% | Agent reads files on-demand, not upfront |
9.2 Subagent vs Session Decision
Part 10: Configuration
10.1 config.json Schema Extension
10.2 Agent Environment Variables
Each spawned agent receives two categories of environment variables:
10.2.1 Claude Code Optimization (from agentProgram.env)
| Variable | Value | Purpose |
|---|---|---|
| CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC | true | Disables telemetry, error reporting, auto-updates |
| ENABLE_BACKGROUND_TASKS | true | Enables background task functionality (experimental) |
| FORCE_AUTO_BACKGROUND_TASKS | true | Auto-backgrounds long-running commands (experimental) |
| CLAUDE_CODE_ENABLE_UNIFIED_READ_TOOL | true | Unified file reading including Jupyter (experimental) |
10.2.2 CLEO Orchestration Context (injected per-agent)
| Variable | Example | Purpose |
|---|---|---|
| CLEO_SESSION | session_20251231_abc123 | Session ID for CLEO operations |
| CLEO_SCOPE | subtree:T998.1 | Agent’s assigned task scope |
| CLEO_AGENT_ID | agent-0 | Unique agent identifier |
| CLEO_ORCHESTRATION_ID | orch_xyz | Parent orchestration run ID |
| CLEO_WAVE | 1 | Current wave number |
| CLEO_PROJECT_ROOT | /path/to/project | Project root for hook scripts |
10.2.3 Spawn Command Construction
The orchestrator constructs the full spawn command as:
10.3 Exit Codes (50-59)
| Code | Constant | Meaning |
|---|---|---|
| 50 | EXIT_ORCH_FAILED | Orchestration failed to start |
| 51 | EXIT_EPIC_NOT_FOUND | Epic task not found |
| 52 | EXIT_SCOPE_CONFLICT | Scope overlap detected |
| 53 | EXIT_TMUX_FAILED | Tmux session creation failed |
| 54 | EXIT_SPAWN_FAILED | Agent spawn failed |
| 55 | EXIT_WAVE_FAILED | Wave progression failed |
| 56 | EXIT_TIMEOUT | Agent timeout exceeded |
| 57 | EXIT_HOOK_FAILED | Stop hook notification failed |
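For illustration, the table above might be declared in exit-codes.sh along these lines. The constant names and values are taken from the table; the readonly declarations and the guard variable name are assumptions, following the source-guard convention mentioned in the implementation checklist.

```shell
#!/usr/bin/env bash
# Sketch only: names and values come from the exit code table; the guard
# variable and readonly layout are assumptions.

# Source guard: readonly variables cannot be redeclared, so skip the block
# if this file has already been sourced.
if [[ -z "${_CLEO_ORCH_EXIT_CODES_LOADED:-}" ]]; then
  readonly _CLEO_ORCH_EXIT_CODES_LOADED=1
  readonly EXIT_ORCH_FAILED=50      # Orchestration failed to start
  readonly EXIT_EPIC_NOT_FOUND=51   # Epic task not found
  readonly EXIT_SCOPE_CONFLICT=52   # Scope overlap detected
  readonly EXIT_TMUX_FAILED=53      # Tmux session creation failed
  readonly EXIT_SPAWN_FAILED=54     # Agent spawn failed
  readonly EXIT_WAVE_FAILED=55      # Wave progression failed
  readonly EXIT_TIMEOUT=56          # Agent timeout exceeded
  readonly EXIT_HOOK_FAILED=57      # Stop hook notification failed
fi
```

Codes 58 and 59 remain unassigned within the reserved 50-59 range.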
Part 11: Implementation Checklist
Phase 1: Foundation (T1116)
- Create lib/orchestrator.sh skeleton with source guards
- Implement orchestrate_start() (single wave, no deps)
- Implement orchestrate_status() for JSON output
- Implement orchestrate_stop() for cleanup
- Add tmux lifecycle functions
- Write BATS unit tests
Phase 2: CLI and Integration (T1117, T1118)
- Create scripts/orchestrate.sh CLI entry point
- Update lib/session.sh for CLEO_SESSION priority
- Wire environment variable injection
- Add integration tests
Phase 3: Event System (T1119, T1120)
- Create base session agent prompt template
- Implement Stop hook for completion notification
- Wire handle_agent_completion() to hook output
- Test event-driven completion flow
Phase 4: Wave Dependencies
- Implement compute_dependency_waves()
- Implement spawn_wave() for specific wave
- Implement get_next_wave() for progression
- Add wave-based spawning tests
Phase 5: Pipeline Agents (T1124-T1130)
- Create 7 specialized agent prompt templates
- Validate templates against SOLID Prompting spec
- Test each template with mock task execution
Phase 6: Monitoring and Docs (T1121)
- Implement heartbeat monitoring
- Implement stale agent detection
- Write comprehensive documentation
- Update related specs
Part 12: Success Criteria
| Criterion | Validation |
|---|---|
| 3+ agents spawn on disjoint scopes | orchestrate start T998 works |
| Wave progression | Wave N+1 spawns after Wave N completes |
| Stop hook triggers | Completion notification < 5s |
| No scope conflicts | Agents cannot focus same task |
| Heartbeat detection | Stale agents detected within 2 * timeout |
| All prompts SOLID-compliant | Template validation passes |
| Exit codes documented | All 50-59 codes in exit-codes.sh |
Appendix A: Related Specifications
- PRIME-ARCHITECTURE-SPEC.md - Three-tier agent hierarchy
- SOLID-PROMPTING-SYSTEM-SPEC.md - Prompt template design
- MULTI-SESSION-SPEC.md - Session scope binding
- IMPLEMENTATION-ORCHESTRATION-SPEC.md - 7-agent pipeline
- CLAUDE-CLI-IMPROVED.md - Claude Code CLI optimizations and shell aliases
Appendix B: Revision History
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2025-12-30 | Initial Claude Squad analysis |
| 2.0.0 | 2025-12-31 | Complete rewrite: CLEO Native with Agent Farm patterns |
| 2.1.0 | 2025-12-31 | Added Part 1.4 Decision Summary with licensing rationale |
| 2.2.0 | 2025-12-31 | Expanded agentProgram config with env vars, added 10.2 Agent Environment Variables |
Specification v2.2.0 - CLEO Native Orchestration System
Applicable to: CLEO v0.42.0+
Last updated: 2025-12-31
