CLEO Metrics and Validation System Specification

Version: 1.0.0
Status: ACTIVE
Created: 2026-02-01

CRITICAL: This specification documents how CLEO proves its value through measurable metrics. Any changes to the metrics system MUST update this specification.

Executive Summary

CLEO’s value proposition:

Saves context tokens - Subagent+manifest architecture uses less context than direct implementation
Prevents hallucinations - Protocol validation catches violations before completion
Enables skill composition - Multiple skills load together, with progressive disclosure limiting the token cost of secondary skills

This specification documents how these claims are measured and proven.

Part 1: Architecture Overview

1.1 Metrics Components

┌─────────────────────────────────────────────────────────────────────┐
│                    CLEO METRICS ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌────────────────┐  │
│  │ Token Tracking  │    │   Validation    │    │ Skill Metrics  │  │
│  │                 │    │                 │    │                │  │
│  │ • OTel capture  │    │ • Protocol      │    │ • Composition  │  │
│  │ • Session delta │    │   validators    │    │ • Token budget │  │
│  │ • Estimation    │    │ • Manifest      │    │ • Progressive  │  │
│  │   fallback      │    │   validation    │    │   disclosure   │  │
│  └────────┬────────┘    └────────┬────────┘    └───────┬────────┘  │
│           │                      │                     │           │
│           ▼                      ▼                     ▼           │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                    METRICS STORAGE                            │  │
│  │  .cleo/metrics/                                               │  │
│  │  ├── TOKEN_USAGE.jsonl    # Token events                     │  │
│  │  ├── COMPLIANCE.jsonl     # Validation results               │  │
│  │  ├── SESSIONS.jsonl       # Session metrics                  │  │
│  │  └── otel/                # OpenTelemetry data               │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

1.2 Library Files

Library	Purpose	Key Functions
`lib/otel-integration.sh`	Capture actual Claude Code tokens	`get_session_tokens`, `compare_sessions`
`lib/token-estimation.sh`	Fallback estimation when OTel unavailable	`estimate_tokens`, `track_file_read`
`lib/manifest-validation.sh`	Real manifest entry validation	`validate_and_log`, `find_manifest_entry`
`lib/protocol-validation.sh`	Protocol-specific validators	`validate_*_protocol` (9 validators)

Part 2: Token Tracking

2.1 OpenTelemetry Integration (Primary Method)

Claude Code exposes actual token usage through OpenTelemetry (OTel) metrics.

Enabling OTel

# Enable Claude Code telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1

# Option 1: Console output (debugging)
export OTEL_METRICS_EXPORTER=console
export OTEL_METRIC_EXPORT_INTERVAL=5000

# Option 2: File output (CLEO integration)
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
export OTEL_EXPORTER_OTLP_ENDPOINT=file://.cleo/metrics/otel/
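When telemetry is not enabled, the library table lists an estimation fallback. The sketch below shows one way the choice between the two paths might be made. Both function names are hypothetical, and the 4-characters-per-token ratio is a common rough heuristic assumed here for illustration, not the formula used by `lib/token-estimation.sh`.

```shell
# Hypothetical helper: choose a tracking method from the environment.
sketch_tracking_method() {
  if [ "${CLAUDE_CODE_ENABLE_TELEMETRY:-0}" = "1" ]; then
    echo "otel"
  else
    echo "estimation"
  fi
}

# Hypothetical estimator: assumes roughly 4 characters per token
# (an illustrative heuristic, not CLEO's documented formula).
sketch_estimate_tokens() {
  local text="$1"
  # Round up so that any non-empty text counts as at least one token.
  echo $(( (${#text} + 3) / 4 ))
}
```

A caller could branch on `sketch_tracking_method` and only fall back to `sketch_estimate_tokens` when OTel data is unavailable.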

Available Metrics

Metric	Attributes	Description
`claude_code.token.usage`	`type`, `model`	Aggregated token counts
`claude_code.api_request`	`input_tokens`, `output_tokens`	Per-request details

2.2 Proving Token Savings

The manifest system saves tokens by reading summaries instead of full files:

# Compare manifest vs full file approach
compare_manifest_vs_full 10  # 10 manifest entries read

# Output:
# {
#   "manifest_entries_read": 10,
#   "manifest_tokens": 2000,
#   "full_file_equivalent": 20000,
#   "tokens_saved": 18000,
#   "savings_percent": 90
# }
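The arithmetic behind that output can be sketched as follows. This is a hypothetical stand-in, not the real `compare_manifest_vs_full`. The per-entry averages (200 tokens per manifest entry, 2000 per full file) are assumptions inferred from the sample output above rather than measured values.

```shell
# Hypothetical stand-in for compare_manifest_vs_full.
# Defaults of 200 and 2000 tokens are assumptions taken from the sample output.
sketch_compare_manifest_vs_full() {
  local entries="$1"
  local tokens_per_entry="${2:-200}"   # assumed average manifest entry size
  local tokens_per_file="${3:-2000}"   # assumed average full-file size
  local manifest_tokens=$(( entries * tokens_per_entry ))
  local full_tokens=$(( entries * tokens_per_file ))
  local saved=$(( full_tokens - manifest_tokens ))
  local percent=0
  # Guard against division by zero when nothing was read.
  if [ "$full_tokens" -gt 0 ]; then
    percent=$(( saved * 100 / full_tokens ))
  fi
  printf '{"manifest_entries_read":%d,"manifest_tokens":%d,"full_file_equivalent":%d,"tokens_saved":%d,"savings_percent":%d}\n' \
    "$entries" "$manifest_tokens" "$full_tokens" "$saved" "$percent"
}
```

With the default averages, `sketch_compare_manifest_vs_full 10` reproduces the figures shown above as a single JSON line.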

Part 3: Protocol Validation

3.1 Validators

CLEO has 9 protocol validators:

Validator	Exit Code	Validates
`validate_research_protocol`	60	Research output (no code mods, key_findings)
`validate_consensus_protocol`	61	Voting matrix, confidence scores
`validate_specification_protocol`	62	RFC 2119 keywords, version
`validate_decomposition_protocol`	63	Sibling limits, clear descriptions
`validate_implementation_protocol`	64	@task tags, code modifications
`validate_contribution_protocol`	65	Attribution tags
`validate_release_protocol`	66	Semver, changelog
`validate_validation_protocol`	68	Test results, validation_result field
`validate_testing_protocol`	69/70	BATS tests, pass rates
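Because each validator has its own exit code, a caller can tell which protocol failed from the status alone. The sketch below maps the codes in the table back to protocol names. The function name is hypothetical, and the table does not say how 69 and 70 differ, so both are treated as the testing protocol here.

```shell
# Hypothetical helper: translate a validator exit code into a protocol name.
# The mapping is copied from the validator table; nothing else is assumed.
sketch_protocol_for_exit_code() {
  case "$1" in
    60) echo "research" ;;
    61) echo "consensus" ;;
    62) echo "specification" ;;
    63) echo "decomposition" ;;
    64) echo "implementation" ;;
    65) echo "contribution" ;;
    66) echo "release" ;;
    68) echo "validation" ;;
    69|70) echo "testing" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```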

3.2 Compliance Logging

All validations are logged to .cleo/metrics/COMPLIANCE.jsonl:

{
  "timestamp": "2026-02-01T01:23:45Z",
  "source_id": "T1234",
  "source_type": "subagent",
  "compliance": {
    "compliance_pass_rate": 0.95,
    "rule_adherence_score": 0.95,
    "violation_count": 1,
    "violation_severity": "warning",
    "manifest_integrity": "valid"
  }
}
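A minimal sketch of appending such an entry is shown below. It assumes each entry is stored on a single line, as the `.jsonl` extension suggests. The function name and argument order are hypothetical, `source_type` and `manifest_integrity` are fixed to the sample values for brevity, and the values are not JSON-escaped.

```shell
# Hypothetical logger: appends one compliance entry as a single JSON line.
# Field names follow the sample entry; inputs are assumed to need no escaping.
sketch_log_compliance() {
  local log_file="$1" source_id="$2" pass_rate="$3"
  local adherence="$4" violations="$5" severity="$6"
  mkdir -p "$(dirname "$log_file")"
  printf '{"timestamp":"%s","source_id":"%s","source_type":"subagent","compliance":{"compliance_pass_rate":%s,"rule_adherence_score":%s,"violation_count":%d,"violation_severity":"%s","manifest_integrity":"valid"}}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$source_id" "$pass_rate" \
    "$adherence" "$violations" "$severity" >> "$log_file"
}
```

For example, `sketch_log_compliance .cleo/metrics/COMPLIANCE.jsonl T1234 0.95 0.95 1 warning` would append one line per validation, which keeps the file easy to count and filter with line-oriented tools.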

Part 4: Multi-Skill Composition

4.1 Progressive Disclosure

Skills are loaded with different detail levels based on priority:

Mode	Description	Token Usage
Full	Complete SKILL.md content	100%
Progressive	Frontmatter + first section only	~5-10%
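One way the Progressive view could be extracted is sketched below. It assumes the frontmatter is delimited by `---` lines at the top of SKILL.md and that the first section ends where the second Markdown heading begins. Both are assumptions about the file layout, and the function name is hypothetical.

```shell
# Hypothetical extractor: prints the frontmatter plus the first section of a skill file.
# Assumes "---" delimited frontmatter and "#"-style Markdown headings.
sketch_progressive_view() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; print; next }     # opening frontmatter delimiter
    in_fm { print; if ($0 == "---") in_fm = 0; next }     # frontmatter body and closing delimiter
    /^#+ / { headings++; if (headings > 1) exit }         # stop at the second heading
    { print }
  ' "$1"
}
```

Calling `sketch_progressive_view path/to/SKILL.md` prints only the portion that would be loaded for a secondary skill under these assumptions.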

4.2 Token Savings from Progressive Loading

Primary skill: 3179 tokens (full)
Secondary skills: 200 + 180 = 380 tokens (progressive)
Total loaded: 3179 + 380 = 3559 tokens
If all were full: ~9500 tokens
Savings: (9500 - 3559) / 9500 ≈ 63% from progressive disclosure alone.
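The calculation in this example can be checked with a short script. The helper name is hypothetical, and the result depends on the all-full baseline, which the text gives only approximately (~9500 tokens).

```shell
# Hypothetical helper: the first argument is the all-full baseline; the remaining
# arguments are the token counts actually loaded (full primary, progressive secondaries).
sketch_progressive_savings() {
  local baseline="$1"; shift
  local loaded=0 count
  if [ "$baseline" -le 0 ]; then
    echo "baseline must be positive" >&2
    return 1
  fi
  for count in "$@"; do
    loaded=$(( loaded + count ))
  done
  # awk handles the fractional percentage that shell integer arithmetic cannot.
  awk -v base="$baseline" -v used="$loaded" \
    'BEGIN { printf "loaded=%d saved=%d percent=%.1f\n", used, base - used, (base - used) * 100 / base }'
}

sketch_progressive_savings 9500 3179 200 180
# prints: loaded=3559 saved=5941 percent=62.5
```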

Part 5: Metrics Dashboard

5.1 Command: `cleo metrics value`

=== CLEO Value Metrics ===

TOKEN EFFICIENCY (last 7 days):
  ┌────────────────────────────────────────────────────┐
  │ Manifest reads:     12,450 tokens                  │
  │ If full files:     145,000 tokens (estimated)      │
  │ SAVINGS:           132,550 tokens (91%)            │
  └────────────────────────────────────────────────────┘

VALIDATION IMPACT:
  ┌────────────────────────────────────────────────────┐
  │ Total validations:  47                             │
  │ Violations caught:   8 (17%)                       │
  │ By type:                                           │
  │   - Research modified code: 3                      │
  │   - Missing key_findings: 2                        │
  │   - Invalid status: 3                              │
  └────────────────────────────────────────────────────┘

References

METRICS-VALUE-PROOF: Initial design
PROTOCOL-STACK: 7 conditional protocols
TOKEN-TRACKING-ARCHITECTURE: Tracking tiers
