
Agent Development Guide

Development Environment

This project uses:

  • mise (mise.jdx.dev) - tool version manager and task runner
  • hk (hk.jdx.dev) - git hook manager
  • uv - fast Python package installer
  • ruff - linter and formatter
  • pytest - test runner

Setup

# Install dependencies
mise run install

# Or equivalently:
uv sync --all-extras   # includes mic, websocket, sixel support

Available Commands

mise run test           # Run tests
mise run test-v         # Run tests verbose
mise run test-cov       # Run tests with coverage report
mise run test-browser   # Run e2e browser tests (requires playwright)
mise run lint           # Run ruff linter
mise run lint-fix       # Run ruff with auto-fix
mise run format         # Run ruff formatter
mise run ci             # Full CI pipeline (topics-init + lint + test-cov)

Runtime Commands

mise run run            # Run mainline (terminal)
mise run run-poetry     # Run with poetry feed
mise run run-firehose   # Run in firehose mode
mise run run-websocket  # Run with WebSocket display only
mise run run-sixel      # Run with Sixel graphics display
mise run run-both       # Run with both terminal and WebSocket
mise run run-client     # Run both + open browser
mise run cmd            # Run C&C command interface

Git Hooks

At the start of every agent session, verify hooks are installed:

ls -la .git/hooks/pre-commit

If hooks are not installed, install them with:

hk init --mise
mise run pre-commit

IMPORTANT: Always review the hk documentation (hk.jdx.dev) before modifying hk.pkl.

The project uses hk configured in hk.pkl:

  • pre-commit: runs ruff-format and ruff (with auto-fix)
  • pre-push: runs ruff check + benchmark hook

Benchmark Runner

Run performance benchmarks:

mise run benchmark           # Run all benchmarks (text output)
mise run benchmark-json      # Run benchmarks (JSON output)
mise run benchmark-report    # Run benchmarks (Markdown report)

Benchmark Commands

# Run benchmarks
uv run python -m engine.benchmark

# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch

# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline

# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook

# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3

# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json

Hook Mode

The --hook mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.

The pre-push hook runs benchmark in hook mode to catch performance regressions before pushing.
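The comparison can be sketched as follows (hypothetical helper; the actual logic lives in engine/benchmark):

```python
def exceeds_threshold(baseline_s: float, current_s: float,
                      threshold: float = 0.2) -> bool:
    """Return True when `current_s` is slower than `baseline_s` by more
    than `threshold`, expressed as a fraction (0.2 == 20% degradation)."""
    if baseline_s <= 0:
        return False  # no meaningful baseline to compare against
    degradation = (current_s - baseline_s) / baseline_s
    return degradation > threshold

exceeds_threshold(0.100, 0.125)              # 25% slower -> True
exceeds_threshold(1.0, 1.25, threshold=0.3)  # 25% slower, looser limit -> False
```

Degradation is relative, so a 0.3 threshold passed via --threshold tolerates up to a 30% slowdown against the cached baseline before the hook fails.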

Workflow Rules

Before Committing

  1. Always run the test suite - never commit code that fails tests:

    mise run test
    
  2. Always run the linter:

    mise run lint
    
  3. Fix any lint errors before committing (or let the pre-commit hook handle it).

  4. Review your changes using git diff to understand what will be committed.

On Failing Tests

When tests fail, determine whether it's an out-of-date test or a correctly failing test:

  • Out-of-date test: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.

  • Correctly failing test: The test correctly identifies a broken contract. Fix the implementation, not the test.

Never modify a test to make it pass without understanding why it failed.

Code Review

Before committing significant changes:

  • Run git diff to review all changes
  • Ensure new code follows existing patterns in the codebase
  • Check that type hints are added for new functions
  • Verify that tests exist for new functionality

Testing

Tests live in tests/ and follow the pattern test_*.py.

Run all tests:

mise run test

Run with coverage:

mise run test-cov

The project uses pytest with strict marker enforcement. Test configuration is in pyproject.toml under [tool.pytest.ini_options].

Test Coverage Strategy

Current coverage: 56% (336 tests)

Key areas with lower coverage (acceptable for now):

  • app.py (8%): Main entry point - integration heavy, requires terminal
  • scroll.py (10%): Terminal-dependent rendering logic
  • benchmark.py (0%): Standalone benchmark tool, runs separately

Key areas with good coverage:

  • display/backends/null.py (95%): Easy to test headlessly
  • display/backends/terminal.py (96%): Uses mocking
  • display/backends/multi.py (100%): Simple forwarding logic
  • effects/performance.py (99%): Pure Python logic
  • eventbus.py (96%): Simple event system
  • effects/controller.py (95%): Effects command handling

Areas needing more tests:

  • websocket.py (48%): Network I/O, hard to test in CI
  • ntfy.py (50%): Network I/O, hard to test in CI
  • mic.py (61%): Audio I/O, hard to test in CI

Note: Terminal-dependent modules (scroll, layer rendering) are harder to test in CI. Performance regression tests live in tests/test_benchmark.py and are marked @pytest.mark.benchmark.
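A benchmark regression test in that file might look roughly like this (names and time budget are illustrative, not copied from the suite):

```python
import time

import pytest


@pytest.mark.benchmark
def test_hot_path_under_budget():
    """Illustrative shape of a benchmark regression test: time a
    workload and fail if it blows past a fixed budget."""
    start = time.perf_counter()
    sum(i * i for i in range(100_000))  # stand-in for the real render work
    elapsed = time.perf_counter() - start
    assert elapsed < 1.0  # generous budget so CI noise doesn't cause flakes

```

Because strict marker enforcement is on, the benchmark marker must be registered in pyproject.toml; these tests can then be selected with uv run pytest -m benchmark.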

Architecture Notes

  • ntfy.py and mic.py are standalone modules with zero internal dependencies
  • eventbus.py provides thread-safe event publishing for decoupled communication
  • controller.py coordinates ntfy/mic monitoring and event publishing
  • effects/ - plugin architecture with performance monitoring
  • The render pipeline: fetch → render → effects → scroll → terminal output
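The event bus can be pictured as a minimal thread-safe publish/subscribe map (illustrative only; the actual API in eventbus.py may differ):

```python
import threading
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Minimal thread-safe publish/subscribe bus (sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        with self._lock:
            self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        with self._lock:
            handlers = list(self._subscribers[topic])  # snapshot under the lock
        for handler in handlers:  # invoke outside the lock to avoid deadlocks
            handler(payload)


received = []
bus = EventBus()
bus.subscribe("ntfy.message", received.append)
bus.publish("ntfy.message", "hello")
# received == ["hello"]
```

Taking a snapshot of the handler list under the lock, then calling handlers outside it, is what lets producers (ntfy, mic) publish from their own threads without blocking each other.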

Display System

  • Display abstraction (engine/display/): swap display backends via the Display protocol

    • display/backends/terminal.py - ANSI terminal output
    • display/backends/websocket.py - broadcasts to web clients via WebSocket
    • display/backends/sixel.py - renders to Sixel graphics (pure Python, no C dependency)
    • display/backends/null.py - headless display for testing
    • display/backends/multi.py - forwards to multiple displays simultaneously
    • display/__init__.py - DisplayRegistry for backend discovery
  • WebSocket display (engine/display/backends/websocket.py): real-time frame broadcasting to web browsers

    • WebSocket server on port 8765
    • HTTP server on port 8766 (serves HTML client)
    • Client at client/index.html with ANSI color parsing and fullscreen support
  • Display modes (--display flag):

    • terminal - Default ANSI terminal output
    • websocket - Web browser display (requires websockets package)
    • sixel - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
    • both - Terminal + WebSocket simultaneously
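A rough sketch of how the protocol-based swap works, with an assumed two-method interface (the real Display protocol in engine/display/ may declare more):

```python
from typing import Protocol


class Display(Protocol):
    """Assumed shape of the Display protocol (illustrative)."""
    def render(self, frame: str) -> None: ...
    def close(self) -> None: ...


class NullDisplay:
    """Headless backend in the spirit of display/backends/null.py:
    records frames instead of emitting them, handy for tests."""
    def __init__(self) -> None:
        self.frames: list[str] = []

    def render(self, frame: str) -> None:
        self.frames.append(frame)

    def close(self) -> None:
        pass


def run_once(display: Display, frame: str) -> None:
    # Callers depend only on the protocol, so any backend is accepted.
    display.render(frame)


d = NullDisplay()
run_once(d, "\x1b[2J hello")
# d.frames == ["\x1b[2J hello"]
```

Because backends only need to satisfy the protocol, MultiDisplay can hold several of them and forward each frame to all — which is all the "both" mode requires.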

Effect Plugin System

  • EffectPlugin ABC (engine/effects/types.py): abstract base class for effects

    • All effects must inherit from EffectPlugin and implement process() and configure()
    • Runtime discovery via effects_plugins/__init__.py using issubclass() checks
  • EffectRegistry (engine/effects/registry.py): manages registered effects

  • EffectChain (engine/effects/chain.py): chains effects in pipeline order
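The ABC-plus-issubclass() pattern can be sketched like this (method signatures are assumptions, not copied from engine/effects/types.py):

```python
import inspect
import sys
from abc import ABC, abstractmethod


class EffectPlugin(ABC):
    """Illustrative ABC; the real one lives in engine/effects/types.py."""

    @abstractmethod
    def configure(self, **options) -> None: ...

    @abstractmethod
    def process(self, frame: str) -> str: ...


class Fade(EffectPlugin):
    """Toy concrete effect satisfying both abstract methods."""

    def configure(self, **options) -> None:
        self.intensity = options.get("intensity", 1.0)

    def process(self, frame: str) -> str:
        return frame  # a real effect would transform the frame here


def discover_plugins(module) -> list[type]:
    """Mimics the runtime issubclass() check: keep concrete
    EffectPlugin subclasses, skip the ABC itself."""
    return [
        obj for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, EffectPlugin)
        and obj is not EffectPlugin
        and not inspect.isabstract(obj)
    ]


found = discover_plugins(sys.modules[__name__])
# found == [Fade]
```

The @abstractmethod decorators make Python refuse to instantiate any effect that forgot process() or configure(), so interface violations surface at construction time rather than mid-render.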

Command & Control

  • C&C uses separate ntfy topics for commands and responses
  • NTFY_CC_CMD_TOPIC - commands from cmdline.py
  • NTFY_CC_RESP_TOPIC - responses back to cmdline.py
  • Effects controller handles /effects commands (list, on/off, intensity, reorder, stats)
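A minimal sketch of publishing a command, assuming the public ntfy.sh server and a made-up fallback topic name (real topic names come from the NTFY_CC_CMD_TOPIC environment variable; a self-hosted server would change the base URL):

```python
import os
import urllib.request

# "sideline-cmd" is a hypothetical fallback, not the project's real topic.
cmd_topic = os.environ.get("NTFY_CC_CMD_TOPIC", "sideline-cmd")


def send_command(command: str) -> None:
    """Publish a C&C command using ntfy's standard publish mechanism:
    POST the message body to the topic URL."""
    req = urllib.request.Request(
        f"https://ntfy.sh/{cmd_topic}",
        data=command.encode(),
        method="POST",
    )
    urllib.request.urlopen(req)


# e.g. send_command("/effects list")
```

Responses travel the other way on NTFY_CC_RESP_TOPIC, which cmdline.py subscribes to, keeping the two directions on separate topics so commands never echo back as responses.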