File Locking & Concurrency Safety Specification
Version: 1.0.0 Status: ACTIVE Effective: v0.9.0+ Last Updated: 2025-12-19
RFC 2119 Conformance
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals.
Preamble
This specification defines the file locking and concurrency safety requirements for cleo, ensuring data integrity when multiple processes access shared JSON files simultaneously.
Authority: This specification is AUTHORITATIVE for file locking behavior. Implementation status is tracked separately in FILE-LOCKING-IMPLEMENTATION-REPORT.md.
Background
Without file locking, concurrent operations can cause:
- Race conditions: Multiple processes read the same counter and generate duplicate IDs
- Data corruption: Simultaneous writes produce invalid JSON
- Lost updates: One process overwrites another’s changes
- System failure: Corrupted JSON causes all operations to fail
Executive Summary
The file locking system provides:
- Exclusive file locking: Via flock system call
- Automatic locking: Built into save_json() and atomic_write()
- Manual locking API: lock_file() and unlock_file() for custom operations
- Timeout protection: Configurable timeout prevents deadlocks
- Error recovery: Automatic lock release on errors and signals
Part 1: Core Requirements
1.1 Atomic Write Operations
All write operations to JSON files MUST be atomic:
- Write to temporary file ({file}.tmp)
- Validate content before commit
- Backup original file
- Atomically rename temp to target (single mv operation)
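The four steps above might be sketched as follows. This is a minimal illustration, not the code in lib/file-ops.sh: the function name atomic_write_sketch, the {file}.bak backup name, and the "contains a non-whitespace character" check standing in for real validation are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the four atomic-write steps (not the real atomic_write()).
atomic_write_sketch() {
  local file=$1 content=$2
  local tmp="${file}.tmp"

  # 1. Write to the temporary file.
  printf '%s\n' "$content" > "$tmp" || return 1

  # 2. Validate before commit (assumption: any non-whitespace content passes).
  if ! grep -q '[^[:space:]]' "$tmp"; then
    rm -f "$tmp"
    return 1
  fi

  # 3. Back up the original if one exists (assumed backup name: {file}.bak).
  if [[ -f $file ]]; then
    cp -p "$file" "${file}.bak" || { rm -f "$tmp"; return 1; }
  fi

  # 4. Commit with a single rename; atomic when tmp and target share a filesystem.
  mv -f "$tmp" "$file"
}
```

Because the temporary file sits next to the target, the rename stays on one filesystem, which is what makes the final step atomic.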
1.2 Exclusive Locking
All write operations MUST acquire an exclusive lock before modifying files:
| Operation | Lock Required |
|---|---|
| Read-modify-write | MUST |
| Append operation | MUST |
| File creation | SHOULD |
| Read-only operation | MAY (advisory) |
1.3 Lock Scope
Locks MUST be file-level (not directory-level):
- Locking todo.json MUST NOT block operations on todo-log.jsonl
- Multiple files MAY be locked simultaneously by the same process
- Different processes MUST wait for locks on the same file
Part 2: Locking Mechanism
2.1 Lock Implementation
The system MUST use flock for file locking:
| Parameter | Requirement |
|---|---|
| Lock type | Exclusive (-x) |
| Blocking | Yes, with timeout |
| Scope | Per-process |
| Mechanism | File descriptor |
2.2 Lock Files
- Lock files MUST be created at {file}.lock
- Lock files MUST be created automatically if missing
- Lock files MAY persist after operations (the lock itself is released when FD closes)
- Lock files MUST NOT be used as data files
2.3 File Descriptors
- Lock file descriptors MUST use range 200-210 to avoid conflicts
- The system MUST track which FDs are in use
- FDs MUST be closed when locks are released
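As an illustration of sections 2.1 to 2.3 together, the sketch below opens {file}.lock on FD 200, takes an exclusive lock with a 30-second timeout, and closes the descriptor afterwards. The function name demo_exclusive_lock is hypothetical; the flock options used (-x, -w, -u) are those of the util-linux flock command.

```shell
#!/usr/bin/env bash
# Sketch only: exclusive lock on {file}.lock through a fixed file descriptor.
demo_exclusive_lock() {
  local file=$1
  local lock="${file}.lock"

  exec 200>"$lock"               # creates the lock file if missing, opens it on FD 200
  if ! flock -x -w 30 200; then  # exclusive, blocking, give up after 30 seconds
    exec 200>&-
    return 8                     # E_LOCK_FAILED
  fi

  echo "lock held on $lock"
  # ... modify "$file" here ...

  flock -u 200                   # explicit release
  exec 200>&-                    # closing the FD would also release the lock
}
```

The lock file itself is left in place afterwards; only the descriptor is closed.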
Part 3: Timeout Behavior
3.1 Default Timeout
- Default lock timeout MUST be 30 seconds
- Timeout SHOULD be configurable per operation
3.2 Timeout Handling
When a lock timeout occurs:
- The operation MUST fail with error code E_LOCK_FAILED (8)
- The error message MUST indicate timeout duration
- The error message SHOULD suggest checking for stuck processes
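A reporting helper for these rules could look like the following. The helper name lock_timeout_error and the wording of the hint line are assumptions; the first message follows the pattern given in section 7.2.

```shell
#!/usr/bin/env bash
E_LOCK_FAILED=8

# Hypothetical helper: report a lock timeout on stderr and return E_LOCK_FAILED.
lock_timeout_error() {
  local file=$1 timeout=$2
  {
    echo "Failed to acquire lock (timeout after ${timeout}s)"
    echo "Hint: check for stuck processes holding ${file}.lock"
  } >&2
  return "$E_LOCK_FAILED"
}
```

A caller would typically invoke it as `flock -x -w "$timeout" "$fd" || lock_timeout_error "$file" "$timeout"`, so the exit status of the whole expression is 8 on timeout.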
3.3 Deadlock Prevention
- A process MUST NOT attempt to lock a file it already holds
- Nested locks on the same file MUST be detected and prevented
- Operations SHOULD minimize time spent holding locks
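One way to detect nested locks is a per-process registry of held files, consulted before any lock attempt. The sketch below assumes Bash 4 associative arrays; HELD_LOCKS, mark_lock_held, and mark_lock_released are hypothetical names, not part of the cleo API.

```shell
#!/usr/bin/env bash
# Sketch: remember which files this process has locked and refuse a second attempt.
declare -A HELD_LOCKS=()   # hypothetical registry: file path -> 1

mark_lock_held() {
  local file=$1
  if [[ -n ${HELD_LOCKS[$file]:-} ]]; then
    echo "Refusing nested lock on $file (already held by this process)" >&2
    return 8   # E_LOCK_FAILED
  fi
  HELD_LOCKS[$file]=1
}

mark_lock_released() {
  local file=$1
  unset 'HELD_LOCKS[$file]'   # unset expands the subscript itself
}
```

Failing fast here avoids the process waiting out the full timeout on a lock that only it can release.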
Part 4: Error Recovery
4.1 Signal Handling
Locks MUST be released on:
- Normal exit (EXIT trap)
- Error conditions (ERR trap)
- Interrupt signals (INT, TERM traps)
4.2 Trap Pattern
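The requirements in 4.1 could be met with a trap along these lines. This is a sketch under assumptions, not the pattern used in lib/file-ops.sh: install_lock_traps, cleanup_lock, and TMP_FILE are hypothetical names, and the exit codes 130 and 143 follow the common 128 + signal number convention.

```shell
#!/usr/bin/env bash
# Sketch of a cleanup trap that releases the lock and removes temporary files.
LOCK_FD=""
TMP_FILE=""

cleanup_lock() {
  # Idempotent: EXIT may fire after ERR, INT, or TERM has already run this.
  if [[ -n $LOCK_FD ]]; then
    eval "exec ${LOCK_FD}>&-"   # closing the FD releases the flock
    LOCK_FD=""
  fi
  if [[ -n $TMP_FILE ]]; then
    rm -f "$TMP_FILE"
    TMP_FILE=""
  fi
}

install_lock_traps() {
  trap cleanup_lock EXIT ERR
  trap 'cleanup_lock; exit 130' INT
  trap 'cleanup_lock; exit 143' TERM
}
```

A script would call install_lock_traps once, before acquiring its first lock, and set LOCK_FD and TMP_FILE as it goes.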
4.3 Partial Failure
If an operation fails after acquiring a lock:
- The lock MUST be released
- Temporary files MUST be cleaned up
- Original file MUST remain unchanged (rollback)
Part 5: API Requirements
5.1 lock_file Function
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_path | string | Yes | Path to file to lock |
| fd_variable_name | string | Yes | Variable to store FD number |
| timeout_seconds | integer | No | Timeout (default: 30) |
Return values:
- 0 on success
- E_LOCK_FAILED (8) on timeout or failure
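A function with this signature might be sketched as below. It is named lock_file_sketch to make clear it is an illustration rather than the implementation in lib/file-ops.sh; the descriptor probing idiom and the wording of the range-exhausted message are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: lock_file_sketch <file_path> <fd_variable_name> [timeout_seconds]
lock_file_sketch() {
  local _path=$1 _var=$2 _timeout=${3:-30}
  local _lock="${_path}.lock" _fd

  for _fd in {200..210}; do
    # Skip descriptors that are already open in this shell.
    if { true >&"$_fd"; } 2>/dev/null; then
      continue
    fi
    eval "exec ${_fd}>\"\$_lock\""          # open (and create) the lock file
    if flock -x -w "$_timeout" "$_fd"; then
      printf -v "$_var" '%s' "$_fd"         # hand the FD number back to the caller
      return 0
    fi
    eval "exec ${_fd}>&-"
    echo "Failed to acquire lock (timeout after ${_timeout}s)" >&2
    return 8                                # E_LOCK_FAILED
  done

  echo "File descriptors 200-210 already in use" >&2
  return 8
}
```

Usage would look like `lock_file_sketch .cleo/todo.json LOCK_FD 10 || exit 8`, after which LOCK_FD holds the descriptor to pass to the unlock call.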
5.2 unlock_file Function
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_descriptor | integer | No | FD to unlock (uses LOCK_FD if omitted) |
- Safe to call without holding lock
- Safe to call multiple times
- MUST release flock
- MUST close file descriptor
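The four properties above could be satisfied by a function like this one. It is a sketch named unlock_file_sketch, not the real unlock_file(); resetting LOCK_FD after closing is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: unlock_file_sketch [file_descriptor]; falls back to LOCK_FD if omitted.
unlock_file_sketch() {
  local _fd=${1:-${LOCK_FD:-}}

  # Safe to call without holding a lock.
  [[ -n $_fd ]] || return 0

  # Only act if the descriptor is actually open, so repeated calls are harmless.
  if { true >&"$_fd"; } 2>/dev/null; then
    flock -u "$_fd" 2>/dev/null || true   # release the flock
    eval "exec ${_fd}>&-"                 # close the descriptor
  fi

  if [[ ${LOCK_FD:-} == "$_fd" ]]; then
    LOCK_FD=""
  fi
  return 0
}
```

Returning 0 in every case keeps the function usable inside EXIT and ERR traps, where a failing cleanup step would otherwise mask the original error.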
5.3 atomic_write Function
- MUST acquire lock before operations
- MUST write to temp file first
- MUST validate content before commit
- MUST backup original file
- MUST release lock after completion
- MUST release lock on error
5.4 save_json Function
- MUST validate JSON syntax
- MUST delegate to atomic_write()
- MUST inherit locking behavior
Part 6: Script Requirements
6.1 Write-Capable Scripts
Scripts that modify JSON files MUST:
- Source lib/file-ops.sh
- Use save_json() for JSON writes, OR
- Use lock_file()/unlock_file() for custom operations
6.2 Script Classification
| Priority | Scripts | Requirement |
|---|---|---|
| P0 (Critical) | add.sh, update.sh, complete.sh | MUST use locking |
| P1 (Important) | archive.sh, focus.sh, session.sh, migrate.sh | MUST use locking |
| P2 (Lower) | log.sh, init.sh | SHOULD use locking |
6.3 Read-Only Scripts
Scripts that only read files:
- MAY proceed without locks
- SHOULD handle concurrent modification gracefully
- MUST NOT cache file contents across operations
Part 7: Error Codes
7.1 Exit Codes
| Code | Constant | Meaning |
|---|---|---|
| 0 | E_SUCCESS | Operation completed |
| 8 | E_LOCK_FAILED | Lock acquisition failed |
7.2 Error Messages
| Scenario | Message Pattern |
|---|---|
| Timeout | "Failed to acquire lock (timeout after Xs)" |
| FD exhaustion | "File descriptor X already in use" |
| Lock held | "Cannot acquire lock on $file (another process may be active)" |
Part 8: Performance
8.1 Performance Targets
| Metric | Requirement |
|---|---|
| Sequential write overhead | < 5% |
| Lock acquisition | < 100ms typically |
| 10 sequential locked writes | < 5 seconds |
8.2 Optimization Guidelines
- Minimize time spent holding locks
- Release locks before non-file operations
- Use file-level locks (not global)
Part 9: Platform Compatibility
9.1 Required
| Dependency | Version | Notes |
|---|---|---|
| flock | Any | Part of util-linux |
| Bash | 4.0+ | For proper FD handling |
| File descriptors | 200-210 available | Check ulimit -n |
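The three dependencies in the table can be checked up front. The function below is a hypothetical preflight helper (check_lock_prereqs is not part of cleo) that reports each missing prerequisite on stderr and returns non-zero if any check fails.

```shell
#!/usr/bin/env bash
# Sketch: verify flock, Bash 4.0+, and an FD limit that leaves 200-210 usable.
check_lock_prereqs() {
  local status=0 limit

  if ! command -v flock >/dev/null 2>&1; then
    echo "flock not found (part of util-linux)" >&2
    status=1
  fi

  if (( BASH_VERSINFO[0] < 4 )); then
    echo "Bash 4.0+ required, found ${BASH_VERSION}" >&2
    status=1
  fi

  limit=$(ulimit -n)
  if [[ $limit != unlimited ]] && (( limit <= 210 )); then
    echo "ulimit -n is ${limit}; FDs 200-210 are not available" >&2
    status=1
  fi

  return "$status"
}
```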
9.2 Supported Platforms
- Linux (all distributions)
- macOS (with flock from util-linux or homebrew)
- WSL (Windows Subsystem for Linux)
- BSD systems with flock support
9.3 Unsupported
- NFS and network filesystems (advisory locks only)
- Windows native (use WSL)
Part 10: Testing Requirements
10.1 Unit Tests
Tests MUST verify:
- Lock acquisition and release
- Timeout behavior
- Concurrent lock attempts
- Lock release on error
- Sequential lock reuse
10.2 Integration Tests
Tests MUST verify:
- Concurrent add operations produce unique IDs
- Concurrent writes don’t corrupt JSON
- File integrity after interrupted operations
10.3 Test Location
- Unit tests: tests/unit/file-locking.bats
- Integration: tests/integration/concurrent-*.bats
Related Specifications
| Document | Relationship |
|---|---|
| LLM-AGENT-FIRST-SPEC.md | Data integrity requirements |
| FILE-LOCKING-IMPLEMENTATION-REPORT.md | Tracks implementation status |
| lib/file-ops.sh | Core implementation |
Appendix A: Race Condition Scenario
Before Locking (Vulnerable)
After Locking (Safe)
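Without a lock, two processes can both read the same counter value and both write back the same successor, producing a duplicate ID. The sketch below shows the locked version of that sequence: the read and the write happen inside one subshell that holds an exclusive lock on FD 200. The function name next_id and the plain-text counter file are assumptions made for illustration; cleo's real IDs live in its JSON files.

```shell
#!/usr/bin/env bash
# Sketch: allocate sequential IDs from a counter file under an exclusive lock.
next_id() {
  local counter_file=$1 id
  (
    flock -x -w 30 200 || exit 8          # wait up to 30s for the exclusive lock
    id=$(( $(cat "$counter_file" 2>/dev/null || echo 0) + 1 ))
    echo "$id" > "$counter_file"          # read and write share one critical section
    echo "$id"
  ) 200>"${counter_file}.lock"            # lock is released when the subshell exits
}
```

Running several `next_id counter &` jobs in parallel and then `wait` should leave the counter equal to the number of jobs, with every printed ID distinct.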
Appendix B: Usage Examples
Automatic Locking (Recommended)
Manual Locking (Advanced)
Appendix C: Common Patterns
C.1 Simple Read-Modify-Write
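A generic read-modify-write helper might look like this. It is a sketch under assumptions: rmw_under_lock is a hypothetical name, and the modification is passed in as a filter command that reads the current file on stdin and writes the new content to stdout.

```shell
#!/usr/bin/env bash
# Sketch: rmw_under_lock <file> <filter command...>
rmw_under_lock() {
  local file=$1
  shift
  (
    flock -x -w 30 200 || exit 8                  # E_LOCK_FAILED on timeout
    # Run the filter into a temp file; on failure leave the original untouched.
    "$@" < "$file" > "${file}.tmp" || { rm -f "${file}.tmp"; exit 1; }
    mv -f "${file}.tmp" "$file"                   # atomic commit
  ) 200>"${file}.lock"
}
```

For example, `rmw_under_lock notes.txt tr a-z A-Z` upper-cases the file in place while holding the lock for the whole read, transform, and rename sequence.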
C.2 Multiple Operations Under Single Lock
C.3 Conditional Locking
Appendix D: Best Practices
D.1 DO
| Practice | Rationale |
|---|---|
| Use save_json() when possible | Automatic locking, validation, backup |
| Set appropriate timeouts | Default 30s is usually sufficient |
| Use trap to ensure unlock on error | Prevents lock leaks on exceptions |
| Check lock_file return value | Handle lock failures gracefully |
| Minimize time spent holding lock | Reduces contention |
D.2 DO NOT
| Anti-Pattern | Consequence |
|---|---|
| Hold locks during long operations | Blocks other processes unnecessarily |
| Lock the same file twice in same process | Deadlock |
| Forget to unlock | Lock leak, blocks other processes |
| Assume locks work across NFS | Advisory locks only on network filesystems |
| Use locks for read-only operations | Unnecessary overhead unless preventing writes |
Appendix E: Troubleshooting
E.1 “Failed to acquire lock (timeout after Xs)”
Cause: Another process holds the lock
Solutions:
- Wait for other process to complete
- Increase timeout if legitimate long operation
- Check for deadlocks or stuck processes: lsof +D /path/to/.cleo/
E.2 “File descriptor X already in use”
Cause: Too many simultaneous locks or FD leak
Solutions:
- Unlock existing locks before acquiring new ones
- Don’t nest locks on same file
- Check for FD leaks (always unlock in trap)
E.3 Lock file remains after process exits
Behavior: This is normal - lock files persist at {file}.lock
Explanation: The lock itself is released when the file descriptor closes. The lock file is just a marker.
Cleanup: Lock files can be safely deleted anytime: rm .cleo/*.lock
Appendix F: Testing Templates
F.1 Unit Test Template (BATS)
F.2 Manual Concurrency Test
Appendix G: Version History
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2025-12-19 | Initial specification |
| 1.1.0 | 2025-12-19 | Added patterns, best practices, troubleshooting appendices |
End of Specification
