CLEO Metrics and Validation System Specification
Version: 1.0.0
Status: ACTIVE
Created: 2026-02-01
CRITICAL: This specification documents how CLEO proves its value through measurable metrics.
Any change to the metrics system MUST be accompanied by an update to this specification.
Executive Summary
CLEO’s value proposition rests on three claims:
- Saves context tokens - The subagent+manifest architecture uses less context than direct implementation
- Prevents hallucinations - Protocol validation catches violations before completion
- Enables skill composition - Multiple skills load together, with progressive disclosure limiting their token cost
This specification documents how these claims are measured and proven.
Part 1: Architecture Overview
1.1 Metrics Components
┌─────────────────────────────────────────────────────────────────────┐
│                      CLEO METRICS ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌────────────────┐   │
│  │ Token Tracking  │    │ Validation      │    │ Skill Metrics  │   │
│  │                 │    │                 │    │                │   │
│  │ • OTel capture  │    │ • Protocol      │    │ • Composition  │   │
│  │ • Session delta │    │   validators    │    │ • Token budget │   │
│  │ • Estimation    │    │ • Manifest      │    │ • Progressive  │   │
│  │   fallback      │    │   validation    │    │   disclosure   │   │
│  └────────┬────────┘    └────────┬────────┘    └───────┬────────┘   │
│           │                      │                     │            │
│           ▼                      ▼                     ▼            │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                       METRICS STORAGE                        │   │
│  │  .cleo/metrics/                                              │   │
│  │  ├── TOKEN_USAGE.jsonl     # Token events                    │   │
│  │  ├── COMPLIANCE.jsonl      # Validation results              │   │
│  │  ├── SESSIONS.jsonl        # Session metrics                 │   │
│  │  └── otel/                 # OpenTelemetry data              │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
1.2 Library Files
| Library | Purpose | Key Functions |
|---|---|---|
| lib/otel-integration.sh | Capture actual Claude Code tokens | get_session_tokens, compare_sessions |
| lib/token-estimation.sh | Fallback estimation when OTel unavailable | estimate_tokens, track_file_read |
| lib/manifest-validation.sh | Real manifest entry validation | validate_and_log, find_manifest_entry |
| lib/protocol-validation.sh | Protocol-specific validators | validate_*_protocol (9 validators) |
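When OTel data is unavailable, token counts have to be estimated. The sketch below illustrates one common heuristic, roughly four characters per token. The ratio, the function name estimate_tokens_sketch, and its interface are assumptions made for illustration; this is not the actual estimate_tokens from lib/token-estimation.sh.

```shell
# Hypothetical sketch: approximate a token count from character length.
# ASSUMPTION: ~4 characters per token; the real estimate_tokens may differ.
estimate_tokens_sketch() {
    local text="$1"
    local chars=${#text}
    # Round up so any non-empty string counts as at least one token.
    echo $(( (chars + 3) / 4 ))
}

estimate_tokens_sketch "Read the manifest summary first."
```

A character-based estimate is cheap and needs no external tooling, which suits a fallback path, but it should be expected to drift from the counts OTel reports.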
Part 2: Token Tracking
2.1 OpenTelemetry Integration (Primary Method)
Claude Code exposes actual token usage through its OpenTelemetry (OTel) metrics export.
Enabling OTel
# Enable Claude Code telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1
# Option 1: Console output (debugging)
export OTEL_METRICS_EXPORTER=console
export OTEL_METRIC_EXPORT_INTERVAL=5000
# Option 2: File output (CLEO integration)
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
export OTEL_EXPORTER_OTLP_ENDPOINT=file://.cleo/metrics/otel/
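Because OTel is the primary method and estimation is the fallback, a caller needs some way to decide which source applies. The sketch below shows one possible check based only on the environment variables listed above. The function name token_source_sketch and the decision rule are assumptions; this specification does not state how CLEO actually selects the source.

```shell
# Hypothetical sketch: pick the token source from the environment.
# ASSUMPTION: telemetry counts as enabled when CLAUDE_CODE_ENABLE_TELEMETRY=1
# and a metrics exporter is configured; CLEO's real rule may differ.
token_source_sketch() {
    if [ "${CLAUDE_CODE_ENABLE_TELEMETRY:-0}" = "1" ] && [ -n "${OTEL_METRICS_EXPORTER:-}" ]; then
        echo "otel"
    else
        echo "estimation"
    fi
}

token_source_sketch
```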
Available Metrics
| Metric | Attributes | Description |
|---|---|---|
| claude_code.token.usage | type, model | Aggregated token counts |
| claude_code.api_request | input_tokens, output_tokens | Per-request details |
2.2 Proving Token Savings
The manifest system saves tokens by reading summaries instead of full files:
# Compare manifest vs full file approach
compare_manifest_vs_full 10 # 10 manifest entries read
# Output:
# {
#   "manifest_entries_read": 10,
#   "manifest_tokens": 2000,
#   "full_file_equivalent": 20000,
#   "tokens_saved": 18000,
#   "savings_percent": 90
# }
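The arithmetic behind that output can be sketched as follows. The per-entry costs (200 tokens per manifest entry, 2000 tokens per full file) are inferred from the example figures above and are assumptions, as is the function name compare_sketch. The real compare_manifest_vs_full may measure these values rather than hard-code them.

```shell
# Hypothetical sketch of the savings arithmetic; not the real implementation.
# ASSUMPTIONS: 200 tokens per manifest entry, 2000 tokens per full file.
compare_sketch() {
    local entries="$1"
    local per_entry=200 per_file=2000
    local manifest=$(( entries * per_entry ))
    local full=$(( entries * per_file ))
    local saved=$(( full - manifest ))
    local percent=0
    # Guard against division by zero when no entries were read.
    if [ "$full" -gt 0 ]; then
        percent=$(( saved * 100 / full ))
    fi
    # Emit compact single-line JSON with the same keys as the output above.
    printf '{"manifest_entries_read": %d, "manifest_tokens": %d, "full_file_equivalent": %d, "tokens_saved": %d, "savings_percent": %d}\n' \
        "$entries" "$manifest" "$full" "$saved" "$percent"
}

compare_sketch 10
```

With fixed per-entry costs the savings percentage is constant, so the interesting quantity in practice is the absolute number of tokens saved as the entry count grows.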
Part 3: Protocol Validation
3.1 Validators
CLEO has 9 protocol validators:
| Validator | Exit Code | Validates |
|---|---|---|
| validate_research_protocol | 60 | Research output (no code mods, key_findings) |
| validate_consensus_protocol | 61 | Voting matrix, confidence scores |
| validate_specification_protocol | 62 | RFC 2119 keywords, version |
| validate_decomposition_protocol | 63 | Sibling limits, clear descriptions |
| validate_implementation_protocol | 64 | @task tags, code modifications |
| validate_contribution_protocol | 65 | Attribution tags |
| validate_release_protocol | 66 | Semver, changelog |
| validate_validation_protocol | 68 | Test results, validation_result field |
| validate_testing_protocol | 69/70 | BATS tests, pass rates |
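To illustrate how a validator maps a violation to its exit code, here is a minimal sketch of a research-protocol check. It only tests for the presence of a key_findings field in a manifest entry. The function name validate_research_sketch, the JSON-string input, and the grep-based check are assumptions; the real validate_research_protocol checks more, for example that no code was modified.

```shell
# Hypothetical sketch; the real validator lives in lib/protocol-validation.sh.
EXIT_RESEARCH_VIOLATION=60   # exit code taken from the table above

validate_research_sketch() {
    local entry_json="$1"
    # ASSUMPTION: the manifest entry is JSON text carrying a "key_findings" key.
    if ! printf '%s' "$entry_json" | grep -q '"key_findings"[[:space:]]*:'; then
        echo "research protocol violation: missing key_findings" >&2
        return "$EXIT_RESEARCH_VIOLATION"
    fi
    return 0
}
```

Giving each protocol a distinct exit code lets a caller tell which protocol failed from the status alone, without parsing the error text.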
3.2 Compliance Logging
All validations are logged to .cleo/metrics/COMPLIANCE.jsonl:
{
  "timestamp": "2026-02-01T01:23:45Z",
  "source_id": "T1234",
  "source_type": "subagent",
  "compliance": {
    "compliance_pass_rate": 0.95,
    "rule_adherence_score": 0.95,
    "violation_count": 1,
    "violation_severity": "warning",
    "manifest_integrity": "valid"
  }
}
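Appending one such record per validation can be sketched as below. The function name log_compliance_sketch and its argument order are assumptions, and the sketch fixes source_type to "subagent" and assumes the values need no JSON escaping. The real logging path (validate_and_log) may behave differently.

```shell
# Hypothetical sketch: append one compliance record as a single JSON line.
# ASSUMPTION: arguments are pre-validated and contain no characters that
# would need JSON escaping.
log_compliance_sketch() {
    local log_file="$1" source_id="$2" pass_rate="$3" adherence="$4"
    local violations="$5" severity="$6" integrity="$7"
    local timestamp
    timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    mkdir -p "$(dirname "$log_file")"
    # One object per line keeps the file appendable and easy to grep.
    printf '{"timestamp":"%s","source_id":"%s","source_type":"subagent","compliance":{"compliance_pass_rate":%s,"rule_adherence_score":%s,"violation_count":%s,"violation_severity":"%s","manifest_integrity":"%s"}}\n' \
        "$timestamp" "$source_id" "$pass_rate" "$adherence" \
        "$violations" "$severity" "$integrity" >> "$log_file"
}
```

A call such as `log_compliance_sketch .cleo/metrics/COMPLIANCE.jsonl T1234 0.95 0.95 1 warning valid` would append a record shaped like the example above.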
Part 4: Multi-Skill Composition
4.1 Progressive Disclosure
Skills are loaded at different levels of detail depending on their priority:
| Mode | Description | Token Usage |
|---|---|---|
| Full | Complete SKILL.md content | 100% |
| Progressive | Frontmatter + first section only | ~5-10% |
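Progressive mode can be pictured as a simple filter over SKILL.md. The sketch below assumes the file opens with frontmatter delimited by `---` lines and that sections start with Markdown `#` headings. The function name load_skill_progressive_sketch and both format assumptions are illustrative, not CLEO's actual loader.

```shell
# Hypothetical sketch: print the frontmatter plus the first section only.
# ASSUMPTIONS: frontmatter is delimited by '---' lines starting at line 1,
# and every section begins with a '#'-style heading followed by a space.
# Simplification: heading-like lines inside code blocks are not special-cased.
load_skill_progressive_sketch() {
    awk '
        NR == 1 && $0 == "---" { in_front = 1; print; next }
        in_front { print; if ($0 == "---") in_front = 0; next }
        /^#+ / { headings++; if (headings == 2) exit }
        { print }
    ' "$1"
}
```

Running `load_skill_progressive_sketch path/to/SKILL.md` prints everything up to, but not including, the second heading, which is what keeps the loaded portion small.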
4.2 Token Savings from Progressive Loading
Primary skill: 3179 tokens (full)
Secondary skills: 200 + 180 = 380 tokens (progressive)
Total loaded: 3179 + 380 = 3559 tokens
If all were full: ~9500 tokens
Savings: ~63% (about 5941 of ~9500 tokens) from progressive disclosure alone.
Part 5: Metrics Dashboard
5.1 Command: cleo metrics value
=== CLEO Value Metrics ===
TOKEN EFFICIENCY (last 7 days):
┌────────────────────────────────────────────────────┐
│ Manifest reads: 12,450 tokens │
│ If full files: 145,000 tokens (estimated) │
│ SAVINGS: 132,550 tokens (91%) │
└────────────────────────────────────────────────────┘
VALIDATION IMPACT:
┌────────────────────────────────────────────────────┐
│ Total validations: 47 │
│ Violations caught: 8 (17%) │
│ By type: │
│ - Research modified code: 3 │
│ - Missing key_findings: 2 │
│ - Invalid status: 3 │
└────────────────────────────────────────────────────┘
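The validation figures in this dashboard could be derived from COMPLIANCE.jsonl along the following lines. The sketch assumes one JSON object per line and treats a validation as having caught something when its violation_count is non-zero. Both assumptions, and the function name summarize_violations_sketch, are illustrative; this is not the actual `cleo metrics value` implementation.

```shell
# Hypothetical sketch: derive the "Violations caught" figures from the log.
# ASSUMPTION: one JSON object per line, each with a "violation_count" field.
summarize_violations_sketch() {
    local log_file="$1"
    local total caught percent=0
    [ -f "$log_file" ] || { echo "no compliance log: $log_file" >&2; return 1; }
    # Every non-empty line is one validation.
    total=$(grep -c . "$log_file" || true)
    # Count validations whose violation_count starts with a non-zero digit.
    caught=$(grep -c -E '"violation_count"[[:space:]]*:[[:space:]]*[1-9]' "$log_file" || true)
    if [ "$total" -gt 0 ]; then
        percent=$(( (caught * 100 + total / 2) / total ))   # rounded to nearest
    fi
    printf 'Total validations: %d\nViolations caught: %d (%d%%)\n' "$total" "$caught" "$percent"
}
```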