File Locking & Concurrency Safety Specification

Version: 1.0.0
Status: ACTIVE
Effective: v0.9.0+
Last Updated: 2025-12-19

RFC 2119 Conformance

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals.

Preamble

This specification defines the file locking and concurrency safety requirements for cleo, ensuring data integrity when multiple processes access shared JSON files simultaneously.
Authority: This specification is AUTHORITATIVE for file locking behavior. Implementation status is tracked separately in FILE-LOCKING-IMPLEMENTATION-REPORT.md.

Background

Without file locking, concurrent operations can cause:
  • Race conditions: Multiple processes read the same counter, generate duplicate IDs
  • Data corruption: Simultaneous writes produce invalid JSON
  • Lost updates: One process overwrites another’s changes
  • System failure: Corrupted JSON causes all operations to fail

Executive Summary

The file locking system provides:
  • Exclusive file locking: Via flock system call
  • Automatic locking: Built into save_json() and atomic_write()
  • Manual locking API: lock_file() and unlock_file() for custom operations
  • Timeout protection: Configurable timeout prevents deadlocks
  • Error recovery: Automatic lock release on errors and signals

Part 1: Core Requirements

1.1 Atomic Write Operations

All write operations to JSON files MUST be atomic:
  1. Write to temporary file ({file}.tmp)
  2. Validate content before commit
  3. Backup original file
  4. Atomically rename temp to target (single mv operation)
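
The four steps above can be sketched in Bash as follows. This is a minimal illustration, not cleo's actual atomic_write(): the function name atomic_write_sketch, the .bak backup suffix, and the use of jq for validation (skipped when jq is not installed) are assumptions made for the example, and locking is left out to keep the focus on the write sequence.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the four-step atomic write; not cleo's implementation.
atomic_write_sketch() {
    local file=$1 tmp="$1.tmp"

    # 1. Write the new content (read from stdin) to the temporary file.
    cat > "$tmp" || return 1

    # 2. Validate before commit: content must be non-empty and, when jq is
    #    installed, must parse as JSON.
    if [[ ! -s "$tmp" ]]; then
        rm -f "$tmp"
        return 1
    fi
    if command -v jq >/dev/null 2>&1 && ! jq empty "$tmp" 2>/dev/null; then
        rm -f "$tmp"
        return 1
    fi

    # 3. Back up the original file if one exists (.bak suffix is an assumption).
    if [[ -f "$file" ]]; then
        cp -p "$file" "$file.bak" || { rm -f "$tmp"; return 1; }
    fi

    # 4. Commit with a single rename; mv within one filesystem is atomic.
    mv "$tmp" "$file"
}
```

Because the target is only ever replaced by the rename in step 4, a reader sees either the old file or the new one, never a half-written mix.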

1.2 Exclusive Locking

All write operations MUST acquire an exclusive lock before modifying files:
| Operation | Lock Required |
|---|---|
| Read-modify-write | MUST |
| Append operation | MUST |
| File creation | SHOULD |
| Read-only operation | MAY (advisory) |

1.3 Lock Scope

Locks MUST be file-level (not directory-level):
  • Locking todo.json MUST NOT block operations on todo-log.jsonl
  • Multiple files MAY be locked simultaneously by the same process
  • Different processes MUST wait for locks on the same file

Part 2: Locking Mechanism

2.1 Lock Implementation

The system MUST use flock for file locking:
flock -w <timeout> -x <fd>
| Parameter | Requirement |
|---|---|
| Lock type | Exclusive (-x) |
| Blocking | Yes, with timeout |
| Scope | Per-process |
| Mechanism | File descriptor |

2.2 Lock Files

  • Lock files MUST be created at {file}.lock
  • Lock files MUST be created automatically if missing
  • Lock files MAY persist after operations (the lock itself is released when FD closes)
  • Lock files MUST NOT be used as data files

2.3 File Descriptors

  • Lock file descriptors MUST use range 200-210 to avoid conflicts
  • The system MUST track which FDs are in use
  • FDs MUST be closed when locks are released

Part 3: Timeout Behavior

3.1 Default Timeout

  • Default lock timeout MUST be 30 seconds
  • Timeout SHOULD be configurable per operation

3.2 Timeout Handling

When a lock timeout occurs:
  • The operation MUST fail with error code E_LOCK_FAILED (8)
  • The error message MUST indicate timeout duration
  • The error message SHOULD suggest checking for stuck processes

3.3 Deadlock Prevention

  • A process MUST NOT attempt to lock a file it already holds
  • Nested locks on the same file MUST be detected and prevented
  • Operations SHOULD minimize time spent holding locks
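
One way to detect a nested lock is to record each locked path in a process-local table and refuse a second request for the same path. The sketch below is only an illustration of that idea: HELD_LOCKS, guard_lock, and release_guard are hypothetical names, not part of the cleo API, and declare -g needs Bash 4.2 or later.

```shell
#!/usr/bin/env bash
# Sketch: refuse a second lock on a file this process already holds.
# HELD_LOCKS, guard_lock and release_guard are hypothetical names.
declare -gA HELD_LOCKS=()    # -g requires Bash 4.2+

guard_lock() {
    local file=$1
    if [[ -n "${HELD_LOCKS[$file]:-}" ]]; then
        echo "Refusing nested lock on $file (already held by this process)" >&2
        return 8    # E_LOCK_FAILED
    fi
    HELD_LOCKS[$file]=1
}

release_guard() {
    # Forget the path so it can be locked again later.
    unset "HELD_LOCKS[$1]"
}
```

A real lock function would call the guard before opening the lock file and the release step after closing the descriptor. Without such a check, a second flock on a new descriptor for the same file would block on the lock the process itself already holds until the timeout expires.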

Part 4: Error Recovery

4.1 Signal Handling

Locks MUST be released on:
  • Normal exit (EXIT trap)
  • Error conditions (ERR trap)
  • Interrupt signals (INT, TERM traps)

4.2 Trap Pattern

trap "unlock_file '$lock_fd'; cleanup" EXIT ERR INT TERM

4.3 Partial Failure

If an operation fails after acquiring a lock:
  • The lock MUST be released
  • Temporary files MUST be cleaned up
  • Original file MUST remain unchanged (rollback)
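
The three guarantees above can be illustrated with a deliberately failing update. This is a sketch under stated assumptions: failing_update is a hypothetical function, FD 200 and the {file}.tmp and {file}.lock paths follow the conventions in this specification, and the failure is simulated with a plain exit.

```shell
#!/usr/bin/env bash
# Sketch: a failed update releases the lock, removes the temp file, and
# leaves the original untouched. failing_update is a hypothetical name.
failing_update() {
    local file=$1
    (
        exec 200>"$file.lock"
        flock -w 30 -x 200 || exit 8
        # On any exit path: drop the temp file; closing FD 200 releases the lock.
        trap 'rm -f "$file.tmp"; exec 200>&-' EXIT

        echo '{"partial":' > "$file.tmp"   # simulated half-written content
        exit 1                             # simulated failure before commit
        mv "$file.tmp" "$file"             # commit step, never reached here
    )
}
```

Since the original file is only replaced by the final rename, a failure before that point needs no explicit restore step: removing the temp file is the whole rollback.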

Part 5: API Requirements

5.1 lock_file Function

lock_file <file_path> <fd_variable_name> [timeout_seconds]
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_path | string | Yes | Path to file to lock |
| fd_variable_name | string | Yes | Variable to store FD number |
| timeout_seconds | integer | No | Timeout (default: 30) |
Returns:
  • 0 on success
  • E_LOCK_FAILED (8) on timeout or failure
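
A function with this signature could be sketched as below. It is not the code in lib/file-ops.sh: the name lock_file_sketch, the way a free descriptor is probed, and the use of printf -v to return the FD number are assumptions chosen to match the parameters, FD range, and error messages described in this specification.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a function with the lock_file signature.
E_LOCK_FAILED=8

lock_file_sketch() {
    local file=$1 fd_var=$2 timeout=${3:-30}
    local lock_path="$file.lock" fd chosen=""

    # Pick the first descriptor in 200-210 that is not already open:
    # duplicating a closed descriptor fails, which marks it as free.
    for fd in {200..210}; do
        if ! { true >&"$fd"; } 2>/dev/null; then
            chosen=$fd
            break
        fi
    done
    if [[ -z $chosen ]]; then
        echo "File descriptors 200-210 already in use" >&2
        return "$E_LOCK_FAILED"
    fi

    # Open the lock file on that descriptor (creates it if missing).
    eval "exec $chosen>\"\$lock_path\"" || return "$E_LOCK_FAILED"

    if ! flock -w "$timeout" -x "$chosen"; then
        eval "exec $chosen>&-"
        echo "Failed to acquire lock (timeout after ${timeout}s)" >&2
        return "$E_LOCK_FAILED"
    fi

    # Hand the descriptor number back through the caller's variable.
    printf -v "$fd_var" '%s' "$chosen"
}
```

Passing the variable name rather than capturing output with $(...) matters here: a command substitution runs in a subshell, so a descriptor opened inside it would be closed, and the lock dropped, as soon as the substitution returned.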

5.2 unlock_file Function

unlock_file [file_descriptor]
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_descriptor | integer | No | FD to unlock (uses LOCK_FD if omitted) |
Behavior:
  • Safe to call without holding lock
  • Safe to call multiple times
  • MUST release flock
  • MUST close file descriptor
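
The "safe to call without a lock" and "safe to call multiple times" properties suggest an unlock routine that checks the descriptor before touching it. The following is a sketch under that assumption; unlock_file_sketch is a hypothetical name and the real unlock_file() may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an idempotent unlock with the unlock_file signature.
unlock_file_sketch() {
    # Fall back to LOCK_FD when no argument is given.
    local fd=${1:-${LOCK_FD:-}}

    # Nothing to do without a numeric descriptor.
    [[ $fd =~ ^[0-9]+$ ]] || return 0

    # Skip descriptors that are not open, so repeated calls are harmless.
    { true >&"$fd"; } 2>/dev/null || return 0

    flock -u "$fd" 2>/dev/null || true   # release the flock
    eval "exec $fd>&-"                   # close the descriptor
}
```

Returning success when there is nothing to release lets the function be used in EXIT and ERR traps without extra guards.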

5.3 atomic_write Function

echo "$content" | atomic_write "$file"
Behavior:
  • MUST acquire lock before operations
  • MUST write to temp file first
  • MUST validate content before commit
  • MUST backup original file
  • MUST release lock after completion
  • MUST release lock on error

5.4 save_json Function

save_json "$file" "$json_content"
Behavior:
  • MUST validate JSON syntax
  • MUST delegate to atomic_write()
  • MUST inherit locking behavior

Part 6: Script Requirements

6.1 Write-Capable Scripts

Scripts that modify JSON files MUST:
  • Source lib/file-ops.sh
  • Use save_json() for JSON writes, OR
  • Use lock_file()/unlock_file() for custom operations

6.2 Script Classification

| Priority | Scripts | Requirement |
|---|---|---|
| P0 (Critical) | add.sh, update.sh, complete.sh | MUST use locking |
| P1 (Important) | archive.sh, focus.sh, session.sh, migrate.sh | MUST use locking |
| P2 (Lower) | log.sh, init.sh | SHOULD use locking |

6.3 Read-Only Scripts

Scripts that only read files:
  • MAY proceed without locks
  • SHOULD handle concurrent modification gracefully
  • MUST NOT cache file contents across operations

Part 7: Error Codes

7.1 Exit Codes

| Code | Constant | Meaning |
|---|---|---|
| 0 | E_SUCCESS | Operation completed |
| 8 | E_LOCK_FAILED | Lock acquisition failed |

7.2 Error Messages

| Scenario | Message Pattern |
|---|---|
| Timeout | "Failed to acquire lock (timeout after Xs)" |
| FD exhaustion | "File descriptor X already in use" |
| Lock held | "Cannot acquire lock on $file (another process may be active)" |

Part 8: Performance

8.1 Performance Targets

| Metric | Requirement |
|---|---|
| Sequential write overhead | < 5% |
| Lock acquisition | < 100ms typically |
| 10 sequential locked writes | < 5 seconds |
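
The last target can be checked with a rough timing loop. The sketch below is an assumption-laden stand-in: time_locked_writes is a hypothetical helper that takes the lock with flock directly instead of calling save_json(), and it reports whole seconds using Bash's SECONDS counter, so it only shows the shape of such a measurement.

```shell
#!/usr/bin/env bash
# Sketch: time N sequential locked writes and print the elapsed whole seconds.
# time_locked_writes is a hypothetical helper, not part of cleo.
time_locked_writes() {
    local file=$1 count=${2:-10} start=$SECONDS i
    for (( i = 1; i <= count; i++ )); do
        (
            flock -w 30 -x 200 || exit 8
            # Write via temp file and rename, as required in Part 1.
            printf '{"counter": %d}\n' "$i" > "$file.tmp" && mv "$file.tmp" "$file"
        ) 200>"$file.lock" || return 8
    done
    echo $(( SECONDS - start ))
}
```

For example, `time_locked_writes .cleo/todo.json 10` would print the number of seconds taken, which can then be compared against the 5-second target.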

8.2 Optimization Guidelines

  • Minimize time spent holding locks
  • Release locks before non-file operations
  • Use file-level locks (not global)

Part 9: Platform Compatibility

9.1 Required

| Dependency | Version | Notes |
|---|---|---|
| flock | Any | Part of util-linux |
| Bash | 4.0+ | For proper FD handling |
| File descriptors | 200-210 available | Check ulimit -n |
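
These dependencies can be verified up front. The following is a minimal preflight sketch; check_locking_prereqs is a hypothetical helper name and the messages are illustrative, not taken from cleo.

```shell
#!/usr/bin/env bash
# Sketch: verify the dependencies listed above before using file locking.
# check_locking_prereqs is a hypothetical helper name.
check_locking_prereqs() {
    local status=0 limit

    if ! command -v flock >/dev/null 2>&1; then
        echo "flock not found (install util-linux)" >&2
        status=1
    fi

    if (( BASH_VERSINFO[0] < 4 )); then
        echo "Bash 4.0+ required, found $BASH_VERSION" >&2
        status=1
    fi

    # Using FDs 200-210 needs a descriptor limit above 210.
    limit=$(ulimit -n)
    if [[ $limit != unlimited ]] && (( limit <= 210 )); then
        echo "ulimit -n is $limit; descriptors 200-210 are unavailable" >&2
        status=1
    fi

    return "$status"
}
```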

9.2 Supported Platforms

  • Linux (all distributions)
  • macOS (with flock from util-linux or homebrew)
  • WSL (Windows Subsystem for Linux)
  • BSD systems with flock support

9.3 Unsupported

  • NFS and network filesystems (advisory locks only)
  • Windows native (use WSL)

Part 10: Testing Requirements

10.1 Unit Tests

Tests MUST verify:
  • Lock acquisition and release
  • Timeout behavior
  • Concurrent lock attempts
  • Lock release on error
  • Sequential lock reuse

10.2 Integration Tests

Tests MUST verify:
  • Concurrent add operations produce unique IDs
  • Concurrent writes don’t corrupt JSON
  • File integrity after interrupted operations

10.3 Test Location

  • Unit tests: tests/unit/file-locking.bats
  • Integration: tests/integration/concurrent-*.bats

| Document | Relationship |
|---|---|
| LLM-AGENT-FIRST-SPEC.md | Data integrity requirements |
| FILE-LOCKING-IMPLEMENTATION-REPORT.md | Tracks implementation status |
| lib/file-ops.sh | Core implementation |

Appendix A: Race Condition Scenario

Before Locking (Vulnerable)

Process 1: read(counter=0) → generate(T001) → write ─┐
Process 2: read(counter=0) → generate(T001) → write ─┼─► CORRUPTION
Process 3: read(counter=0) → generate(T001) → write ─┘

After Locking (Safe)

Process 1: lock → read(counter=0) → write(counter=1) → unlock
Process 2: [wait] → lock → read(counter=1) → write(counter=2) → unlock
Process 3: [wait] → [wait] → lock → read(counter=2) → write(counter=3) → unlock
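
The safe sequence can be reproduced with a small experiment. The sketch below is an illustration only: it uses a plain-text counter file instead of cleo's JSON files, and increment_counter and run_demo are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: several processes increment a shared counter under an exclusive flock.
# The plain-text counter file and function names are assumptions for illustration.
increment_counter() {
    local file=$1
    (
        # Wait up to 30s for exclusive access via FD 200.
        flock -w 30 -x 200 || exit 8
        local n
        n=$(cat "$file")
        echo $(( n + 1 )) > "$file.tmp"
        mv "$file.tmp" "$file"
    ) 200>"$file.lock"    # leaving the subshell closes FD 200 and drops the lock
}

run_demo() {
    local file=$1 i
    echo 0 > "$file"
    for i in 1 2 3; do
        increment_counter "$file" &
    done
    wait
    cat "$file"    # prints 3: every increment saw the previous one
}
```

Each background process performs its read and write while holding the lock, so the three increments are serialized in some order and none is lost.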

Appendix B: Usage Examples

# Locking handled automatically (signature as defined in section 5.4)
save_json "$file" "$json"

Manual Locking (Advanced)

lock_fd=""
if ! lock_file "$file" lock_fd 30; then
    echo "Failed to acquire lock" >&2
    exit 8
fi

trap "unlock_file '$lock_fd'" EXIT

# Critical section - exclusive access
modified=$(jq '.counter += 1' "$file") || exit 1
# Write via temp file and rename so a failure never truncates the original
printf '%s\n' "$modified" > "$file.tmp"
mv "$file.tmp" "$file"

unlock_file "$lock_fd"
trap - EXIT

Appendix C: Common Patterns

C.1 Simple Read-Modify-Write

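lock_fd=""
lock_file "$file" lock_fd || exit 8

data=$(cat "$file")
modified=$(process "$data")
# Write via temp file and rename so readers never see a partial file
echo "$modified" > "$file.tmp" && mv "$file.tmp" "$file"

unlock_file "$lock_fd"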

C.2 Multiple Operations Under Single Lock

lock_fd=""
lock_file "$file" lock_fd || exit 8
trap "unlock_file '$lock_fd'" EXIT

# Multiple operations all protected
validate_file "$file"
backup_file "$file"
modify_file "$file"
check_result "$file"

unlock_file "$lock_fd"
trap - EXIT

C.3 Conditional Locking

# Initialize outside the branch so the later -n test is reliable
lock_fd=""
if need_exclusive_access; then
    lock_file "$file" lock_fd || exit 8
    trap "unlock_file '$lock_fd'" EXIT
fi

# Do work (locked if needed)
process "$file"

if [[ -n "$lock_fd" ]]; then
    unlock_file "$lock_fd"
    trap - EXIT
fi

Appendix D: Best Practices

D.1 DO

| Practice | Rationale |
|---|---|
| Use save_json() when possible | Automatic locking, validation, backup |
| Set appropriate timeouts | Default 30s is usually sufficient |
| Use trap to ensure unlock on error | Prevents lock leaks on exceptions |
| Check lock_file return value | Handle lock failures gracefully |
| Minimize time spent holding lock | Reduces contention |

D.2 DO NOT

| Anti-Pattern | Consequence |
|---|---|
| Hold locks during long operations | Blocks other processes unnecessarily |
| Lock the same file twice in same process | Deadlock |
| Forget to unlock | Lock leak, blocks other processes |
| Assume locks work across NFS | Advisory locks only on network filesystems |
| Use locks for read-only operations | Unnecessary overhead unless preventing writes |

Appendix E: Troubleshooting

E.1 “Failed to acquire lock (timeout after Xs)”

Cause: Another process holds the lock
Solutions:
  • Wait for other process to complete
  • Increase timeout if legitimate long operation
  • Check for deadlocks or stuck processes: lsof +D /path/to/.cleo/

E.2 “File descriptor X already in use”

Cause: Too many simultaneous locks or FD leak
Solutions:
  • Unlock existing locks before acquiring new ones
  • Don’t nest locks on same file
  • Check for FD leaks (always unlock in trap)

E.3 Lock file remains after process exits

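Behavior: This is normal. Lock files persist at {file}.lock.
Explanation: The lock itself is released when the file descriptor closes. The lock file is just a marker.
Cleanup: Lock files can be safely deleted when no cleo process is running: rm .cleo/*.lock. Deleting one while a process still holds its lock lets a second process create and lock a fresh file, so both would believe they have exclusive access.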

Appendix F: Testing Templates

F.1 Unit Test Template (BATS)

@test "my operation handles concurrent access" {
    # Start concurrent operations
    for i in {1..3}; do
        (source lib/file-ops.sh; my_operation "$file") &
    done

    wait

    # Verify no corruption
    jq empty "$file"  # Valid JSON
    [ "$(jq '.counter' "$file")" -eq 3 ]  # Correct result
}

F.2 Manual Concurrency Test

# Terminal 1: Hold lock for 10 seconds
lock_fd=""; lock_file test.json lock_fd; sleep 10; unlock_file "$lock_fd"

# Terminal 2: Attempt lock with 5s timeout (will fail)
lock_file test.json lock_fd 5  # Should timeout after 5s

Appendix G: Version History

| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2025-12-19 | Initial specification |
| 1.1.0 | 2025-12-19 | Added patterns, best practices, troubleshooting appendices |

End of Specification