CLEO Metrics Value Proof Specification
Version: 0.1.0 Status: DRAFT Created: 2026-02-01 Epic: T2833
Problem Statement
CLEO claims to save context tokens and prevent hallucinations, but there is no mechanism to prove these claims:
- Token consumption: All metrics show 0 because there’s no data source
- Manifest savings: Theory says manifest reads save tokens, but no measurement
- Hallucination prevention: Validators exist but no before/after comparison
- Skill composition: Single skill only, no progressive loading measurement
Goals
- Measure actual token usage - Before and after CLEO
- Prove manifest efficiency - Full file vs manifest-only reads
- Track validation impact - Violations caught, fixes applied
- Enable skill composition - Multiple skills with progressive disclosure
Part 1: Token Consumption Tracking
The Solution: OpenTelemetry Integration
Claude Code DOES track actual tokens via OpenTelemetry telemetry:
- input_tokens - Actual input tokens consumed
- output_tokens - Actual output tokens generated
- cache_read_tokens - Tokens read from cache
- cache_creation_tokens - Tokens used to create cache
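As a rough illustration of how these four counters could be rolled up once they are exported, here is a minimal sketch. The record shape (one dict per request, keyed by the field names above) and the names `TokenUsage` and `summarize` are assumptions made for this example; they are not Claude Code's actual export format.

```python
from dataclasses import dataclass

# Field names come from the list above; the record layout is hypothetical.
TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_tokens",
    "cache_creation_tokens",
)


@dataclass
class TokenUsage:
    """Running totals for the four token counters."""

    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

    @property
    def total(self) -> int:
        # Sum of all four counters.
        return sum(getattr(self, name) for name in TOKEN_FIELDS)


def summarize(records):
    """Add up token counters across records; missing fields count as 0."""
    usage = TokenUsage()
    for record in records:
        for name in TOKEN_FIELDS:
            current = getattr(usage, name)
            setattr(usage, name, current + int(record.get(name, 0)))
    return usage
```

Treating missing fields as zero keeps the roll-up usable when a record reports only some of the counters, at the cost of hiding gaps in the data source.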
How to Enable Telemetry
Option 1: Console Export (development)
Part 2: Manifest Token Savings
Hypothesis
Reading manifest summaries instead of full agent output files saves significant tokens.
Measurement Approach
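One way to run this measurement is to estimate the tokens needed to read each entry's manifest summary and its full output file, then compare the totals. The sketch below is an assumption-laden illustration: it uses a rough heuristic of about 4 characters per token rather than a real tokenizer, and `estimate_tokens` and `compare_reads` are hypothetical names. Where telemetry is available, actual counts should replace the estimate.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def compare_reads(entries):
    """Compare manifest-only reads against full-file reads.

    `entries` is an iterable of (manifest_summary, full_text) string pairs.
    """
    pairs = list(entries)  # materialize so the input can be iterated twice
    manifest_tokens = sum(estimate_tokens(summary) for summary, _ in pairs)
    full_tokens = sum(estimate_tokens(full) for _, full in pairs)
    saved = full_tokens - manifest_tokens
    savings_pct = 100.0 * saved / full_tokens if full_tokens else 0.0
    return {
        "manifest_tokens": manifest_tokens,
        "full_tokens": full_tokens,
        "saved_tokens": saved,
        "savings_pct": savings_pct,
    }
```

With this shape, the hypothesis holds for a given task when `savings_pct` is clearly above zero on real manifest and output pairs.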
Expected Results
| Approach | Tokens per Entry | 10 Entries |
|---|---|---|
| Manifest only | ~200 | ~2,000 |
| Full file | ~2,000 | ~20,000 |
| Savings | ~1,800 (90%) | ~18,000 (90%) |
Part 3: Validation Impact Measurement
Tracking Violations Caught
Value Demonstration
- Violations caught: Count per period
- Prevention rate: Violations caught / total completions
- By protocol: Which protocols catch most issues
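The three figures above could be derived from a violation log along these lines. This is a sketch under stated assumptions: each logged violation is taken to be a dict with a `protocol` key, and the function name `validation_summary` is hypothetical; the actual log schema is not defined in this document.

```python
from collections import Counter


def validation_summary(violations, total_completions):
    """Summarize caught violations.

    `violations` is a list of dicts with a (hypothetical) "protocol" key.
    `total_completions` is the number of task completions in the same period.
    """
    caught = len(violations)
    # Prevention rate as defined above: violations caught / total completions.
    rate = caught / total_completions if total_completions else 0.0
    by_protocol = Counter(v["protocol"] for v in violations)
    return {
        "violations_caught": caught,
        "prevention_rate": rate,
        # Most frequent protocol first.
        "by_protocol": by_protocol.most_common(),
    }
```

Returning the per-protocol counts sorted by frequency makes it easy to see which protocols catch the most issues without a separate reporting step.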
Part 4: A/B Testing Framework
Test Scenarios
| Scenario | Description |
|---|---|
| Baseline | Direct implementation without CLEO |
| With CLEO | Orchestrator + subagents + manifest |
Metrics to Compare
| Metric | Baseline | With CLEO | Expected |
|---|---|---|---|
| Total tokens | Higher | Lower | -50%+ |
| Files read | Many | Few | -80%+ |
| Validation failures | N/A | Caught | >0 |
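The comparison in this table amounts to a percent change per metric, checked against the expected reduction. A minimal sketch follows; the metric keys, the `percent_change` and `compare_runs` names, and the input format are assumptions for illustration, with the -50% and -80% targets taken from the "Expected" column.

```python
def percent_change(baseline: float, with_cleo: float) -> float:
    """Percent change from baseline; negative means a reduction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percent change")
    return 100.0 * (with_cleo - baseline) / baseline


# Expected reductions from the table above (hypothetical metric keys).
TARGETS = {"total_tokens": -50.0, "files_read": -80.0}


def compare_runs(baseline: dict, with_cleo: dict, targets: dict = TARGETS) -> dict:
    """Report percent change per metric and whether the target was met."""
    report = {}
    for metric, target in targets.items():
        change = percent_change(baseline[metric], with_cleo[metric])
        report[metric] = {
            "change_pct": change,
            "target_pct": target,
            # A reduction at least as large as the target counts as met.
            "met": change <= target,
        }
    return report
```

Both runs need to execute the same task for the comparison to be meaningful; otherwise differences in task size would dominate the percent change.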
Implementation Status
- OpenTelemetry integration design
- Token estimation fallback
- Manifest validation logging
- A/B testing framework
- Metrics dashboard
References
- METRICS-VALIDATION: Complete metrics system
- TOKEN-TRACKING-ARCHITECTURE: Tracking tiers
