CLEO Metrics and Validation System Specification

Version: 1.0.0
Status: ACTIVE
Created: 2026-02-01
CRITICAL: This specification documents how CLEO proves its value through measurable metrics. Any changes to the metrics system MUST update this specification.

Executive Summary

CLEO’s value proposition:
  1. Saves context tokens - Subagent+manifest architecture uses less context than direct implementation
  2. Prevents hallucinations - Protocol validation catches violations before completion
  3. Enables skill composition - Multiple skills with progressive disclosure
This specification documents how these claims are measured and proven.

Part 1: Architecture Overview

1.1 Metrics Components

┌─────────────────────────────────────────────────────────────────────┐
│                    CLEO METRICS ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌────────────────┐  │
│  │ Token Tracking  │    │   Validation    │    │ Skill Metrics  │  │
│  │                 │    │                 │    │                │  │
│  │ • OTel capture  │    │ • Protocol      │    │ • Composition  │  │
│  │ • Session delta │    │   validators    │    │ • Token budget │  │
│  │ • Estimation    │    │ • Manifest      │    │ • Progressive  │  │
│  │   fallback      │    │   validation    │    │   disclosure   │  │
│  └────────┬────────┘    └────────┬────────┘    └───────┬────────┘  │
│           │                      │                     │           │
│           ▼                      ▼                     ▼           │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                    METRICS STORAGE                            │  │
│  │  .cleo/metrics/                                               │  │
│  │  ├── TOKEN_USAGE.jsonl    # Token events                     │  │
│  │  ├── COMPLIANCE.jsonl     # Validation results               │  │
│  │  ├── SESSIONS.jsonl       # Session metrics                  │  │
│  │  └── otel/                # OpenTelemetry data               │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

1.2 Library Files

| Library | Purpose | Key Functions |
|---|---|---|
| lib/otel-integration.sh | Capture actual Claude Code tokens | get_session_tokens, compare_sessions |
| lib/token-estimation.sh | Fallback estimation when OTel unavailable | estimate_tokens, track_file_read |
| lib/manifest-validation.sh | Real manifest entry validation | validate_and_log, find_manifest_entry |
| lib/protocol-validation.sh | Protocol-specific validators | validate_*_protocol (9 validators) |
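
The fallback path in lib/token-estimation.sh is not shown in this specification. As a rough illustration of what a character-based estimate could look like, the sketch below assumes about four characters per token (a common rule of thumb, not a documented CLEO constant). The name `estimate_tokens_sketch` is hypothetical and is not the library's estimate_tokens.

```shell
# Hypothetical sketch; the real estimate_tokens may use a different heuristic.
# Assumption: roughly 4 characters per token.
estimate_tokens_sketch() {
    local text="$1"
    local chars=${#text}
    # Round up so any non-empty input counts as at least one token.
    echo $(( (chars + 3) / 4 ))
}

# 32 characters of input, so this prints 8.
estimate_tokens_sketch "Read the manifest summary first."
```

An estimate like this is only useful for relative comparisons; actual counts depend on the model's tokenizer.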

Part 2: Token Tracking

2.1 OpenTelemetry Integration (Primary Method)

Claude Code exposes actual token usage through its OpenTelemetry (OTel) metrics.

Enabling OTel

# Enable Claude Code telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1

# Option 1: Console output (debugging)
export OTEL_METRICS_EXPORTER=console
export OTEL_METRIC_EXPORT_INTERVAL=5000

# Option 2: File output (CLEO integration)
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
export OTEL_EXPORTER_OTLP_ENDPOINT=file://.cleo/metrics/otel/
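
Because OTel is the primary method and estimation is the fallback, a caller needs some way to decide which source applies. The sketch below shows one possible check based only on the environment variables above; `token_source_sketch` is a hypothetical helper, and how CLEO actually makes this decision is an assumption here.

```shell
# Hypothetical helper: reports which token source would be used.
# Assumption: OTel counts as available when telemetry is enabled
# and a metrics exporter has been selected.
token_source_sketch() {
    if [ "${CLAUDE_CODE_ENABLE_TELEMETRY:-0}" = "1" ] \
        && [ -n "${OTEL_METRICS_EXPORTER:-}" ]; then
        echo "otel"
    else
        echo "estimation"
    fi
}
```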

Available Metrics

| Metric | Attributes | Description |
|---|---|---|
| claude_code.token.usage | type, model | Aggregated token counts |
| claude_code.api_request | input_tokens, output_tokens | Per-request details |

2.2 Proving Token Savings

The manifest system saves tokens by reading summaries instead of full files:
# Compare manifest vs full file approach
compare_manifest_vs_full 10  # 10 manifest entries read

# Output:
# {
#   "manifest_entries_read": 10,
#   "manifest_tokens": 2000,
#   "full_file_equivalent": 20000,
#   "tokens_saved": 18000,
#   "savings_percent": 90
# }
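
The arithmetic behind that output can be sketched as follows. The per-entry and per-file costs (200 and 2000 tokens) are assumptions inferred from the sample output, not documented constants, and `compare_manifest_vs_full_sketch` is a hypothetical stand-in for the real function.

```shell
# Hypothetical sketch of the savings calculation.
# Assumptions: 200 tokens per manifest entry, 2000 tokens per full file.
compare_manifest_vs_full_sketch() {
    local entries="$1"
    local per_entry="${2:-200}"
    local per_file="${3:-2000}"
    local manifest=$(( entries * per_entry ))
    local full=$(( entries * per_file ))
    local saved=$(( full - manifest ))
    local pct=0
    # Guard against division by zero when nothing was read.
    if [ "$full" -gt 0 ]; then
        pct=$(( saved * 100 / full ))
    fi
    printf '{"manifest_entries_read": %d, "manifest_tokens": %d, ' "$entries" "$manifest"
    printf '"full_file_equivalent": %d, "tokens_saved": %d, ' "$full" "$saved"
    printf '"savings_percent": %d}\n' "$pct"
}
```

With 10 entries and the default costs, this reproduces the figures in the sample output (2000 vs. 20000 tokens, 90% saved).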

Part 3: Protocol Validation

3.1 Validators

CLEO has 9 protocol validators:
| Validator | Exit Code | Validates |
|---|---|---|
| validate_research_protocol | 60 | Research output (no code mods, key_findings) |
| validate_consensus_protocol | 61 | Voting matrix, confidence scores |
| validate_specification_protocol | 62 | RFC 2119 keywords, version |
| validate_decomposition_protocol | 63 | Sibling limits, clear descriptions |
| validate_implementation_protocol | 64 | @task tags, code modifications |
| validate_contribution_protocol | 65 | Attribution tags |
| validate_release_protocol | 66 | Semver, changelog |
| validate_validation_protocol | 68 | Test results, validation_result field |
| validate_testing_protocol | 69/70 | BATS tests, pass rates |
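
To illustrate the exit-code convention, here is a minimal sketch of a release check. It assumes that a validator returns its table code on violation and 0 on success, and it uses a simplified MAJOR.MINOR.PATCH pattern rather than full Semantic Versioning. The function name and arguments are hypothetical, not the signature of validate_release_protocol.

```shell
# Hypothetical sketch of a validator that follows the exit-code table.
# Assumption: 66 is returned on any release-protocol violation.
EXIT_RELEASE_VIOLATION=66

validate_release_sketch() {
    local version="$1"
    local changelog_entry="$2"
    # Simplified check: digits.digits.digits, no pre-release or build suffix.
    if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "release: '$version' is not MAJOR.MINOR.PATCH" >&2
        return "$EXIT_RELEASE_VIOLATION"
    fi
    if [ -z "$changelog_entry" ]; then
        echo "release: missing changelog entry" >&2
        return "$EXIT_RELEASE_VIOLATION"
    fi
    return 0
}
```

Giving each protocol a distinct code lets a caller tell which protocol failed from the exit status alone.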

3.2 Compliance Logging

All validations are logged to .cleo/metrics/COMPLIANCE.jsonl:
{
  "timestamp": "2026-02-01T01:23:45Z",
  "source_id": "T1234",
  "source_type": "subagent",
  "compliance": {
    "compliance_pass_rate": 0.95,
    "rule_adherence_score": 0.95,
    "violation_count": 1,
    "violation_severity": "warning",
    "manifest_integrity": "valid"
  }
}
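
A record like this could be appended as one line of JSONL along the following lines. This is a sketch under stated assumptions: `log_compliance_sketch` is a hypothetical name, only a subset of the fields is written, and the values are assumed not to need JSON escaping.

```shell
# Hypothetical sketch: append one compliance record as a single JSONL line.
# Assumption: arguments contain no characters that need JSON escaping.
# Example: log_compliance_sketch .cleo/metrics/COMPLIANCE.jsonl T1234 0.95 1 warning
log_compliance_sketch() {
    local log_file="$1"
    local source_id="$2"
    local pass_rate="$3"
    local violations="$4"
    local severity="$5"
    local ts
    ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    # Create the metrics directory on first use.
    mkdir -p "$(dirname "$log_file")"
    printf '{"timestamp":"%s","source_id":"%s","source_type":"subagent",' \
        "$ts" "$source_id" >> "$log_file"
    printf '"compliance":{"compliance_pass_rate":%s,"violation_count":%d,"violation_severity":"%s"}}\n' \
        "$pass_rate" "$violations" "$severity" >> "$log_file"
}
```

Writing each record on a single line keeps the file appendable and easy to scan with line-oriented tools.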

Part 4: Multi-Skill Composition

4.1 Progressive Disclosure

Skills are loaded with different detail levels based on priority:
| Mode | Description | Token Usage |
|---|---|---|
| Full | Complete SKILL.md content | 100% |
| Progressive | Frontmatter + first section only | ~5-10% |
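
Progressive mode can be approximated with a short awk filter. The sketch below assumes that SKILL.md starts with frontmatter delimited by `---` lines and that a section ends where the next line beginning with `#` appears; `load_skill_progressive_sketch` is a hypothetical name, and CLEO's actual loader may differ.

```shell
# Hypothetical sketch: print the frontmatter plus the first section of a skill file.
# Assumptions: frontmatter is fenced by '---' lines; headings start with '#'.
load_skill_progressive_sketch() {
    awk '
        NR == 1 && $0 == "---" { in_fm = 1; print; next }  # opening frontmatter fence
        in_fm { print; if ($0 == "---") in_fm = 0; next }   # frontmatter body and closing fence
        /^#/ { headings++; if (headings == 2) exit }        # stop at the second heading
        { print }                                           # first heading and its body
    ' "$1"
}
```

One limitation of this approach: a line starting with `#` inside a code sample in the first section would be mistaken for a heading and cut the output short.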

4.2 Token Savings from Progressive Loading

Primary skill: 3179 tokens (full)
Secondary skills: 200 + 180 = 380 tokens (progressive)
Total loaded: 3179 + 380 = 3559 tokens
If all were full: ~9500 tokens
Savings: (9500 - 3559) / 9500 ≈ 63% from progressive disclosure alone.

Part 5: Metrics Dashboard

5.1 Command: cleo metrics value

=== CLEO Value Metrics ===

TOKEN EFFICIENCY (last 7 days):
  ┌────────────────────────────────────────────────────┐
  │ Manifest reads:     12,450 tokens                  │
  │ If full files:     145,000 tokens (estimated)      │
  │ SAVINGS:           132,550 tokens (91%)            │
  └────────────────────────────────────────────────────┘

VALIDATION IMPACT:
  ┌────────────────────────────────────────────────────┐
  │ Total validations:  47                             │
  │ Violations caught:   8 (17%)                       │
  │ By type:                                           │
  │   - Research modified code: 3                      │
  │   - Missing key_findings: 2                        │
  │   - Invalid status: 3                              │
  └────────────────────────────────────────────────────┘
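
The validation counts in a report like this could be derived from the compliance log roughly as follows. This is a sketch, not the implementation of cleo metrics value: it assumes one JSON object per line in COMPLIANCE.jsonl and treats any record with a non-zero violation_count as a caught violation. `violation_summary_sketch` is a hypothetical name.

```shell
# Hypothetical sketch: summarize violations from a JSONL compliance log.
# Assumption: each line is one record containing "violation_count": N.
violation_summary_sketch() {
    local log_file="$1"
    local total caught pct
    if [ ! -f "$log_file" ]; then
        echo "no compliance log at $log_file" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so tolerate that.
    total=$(grep -c '' "$log_file") || true
    caught=$(grep -Ec '"violation_count": *[1-9]' "$log_file") || true
    pct=0
    if [ "$total" -gt 0 ]; then
        # Integer percentage, rounded to the nearest whole number.
        pct=$(( (caught * 100 + total / 2) / total ))
    fi
    printf 'Total validations: %d\n' "$total"
    printf 'Violations caught: %d (%d%%)\n' "$caught" "$pct"
}
```

A pattern match is used here to avoid extra dependencies; a JSON-aware parser would be more robust if records ever span multiple lines.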

References