Files
Mainline/AGENTS.md
David Gwilliam 9e4d54a82e feat(tests): improve coverage to 56%, add benchmark regression tests
- Add EffectPlugin ABC with @abstractmethod decorators for interface enforcement
- Add runtime interface checking in discover_plugins() with issubclass()
- Add EffectContext factory with sensible defaults
- Standardize Display __init__ (remove redundant init in TerminalDisplay)
- Document effect behavior when ticker_height=0
- Evaluate legacy effects: document coexistence, no deprecation needed
- Research plugin patterns (VST, Python entry points)
- Fix pysixel dependency (removed broken dependency)

Test coverage improvements:
- Add DisplayRegistry tests
- Add MultiDisplay tests
- Add SixelDisplay tests
- Add controller._get_display tests
- Add effects controller command handling tests
- Add benchmark regression tests (@pytest.mark.benchmark)
- Add pytest marker for benchmark tests in pyproject.toml

Documentation updates:
- Update AGENTS.md with 56% coverage stats and effect plugin docs
- Update README.md with Sixel display mode and benchmark commands
- Add new modules to architecture section
2026-03-15 23:26:10 -07:00

230 lines
7.7 KiB
Markdown

# Agent Development Guide
## Development Environment
This project uses:
- **mise** (mise.jdx.dev) - tool version manager and task runner
- **hk** (hk.jdx.dev) - git hook manager
- **uv** - fast Python package installer
- **ruff** - linter and formatter
- **pytest** - test runner
### Setup
```bash
# Install dependencies
mise run install
# Or equivalently:
uv sync --all-extras # includes mic, websocket, sixel support
```
### Available Commands
```bash
mise run test # Run tests
mise run test-v # Run tests verbose
mise run test-cov # Run tests with coverage report
mise run test-browser # Run e2e browser tests (requires playwright)
mise run lint # Run ruff linter
mise run lint-fix # Run ruff with auto-fix
mise run format # Run ruff formatter
mise run ci # Full CI pipeline (topics-init + lint + test-cov)
```
### Runtime Commands
```bash
mise run run # Run mainline (terminal)
mise run run-poetry # Run with poetry feed
mise run run-firehose # Run in firehose mode
mise run run-websocket # Run with WebSocket display only
mise run run-sixel # Run with Sixel graphics display
mise run run-both # Run with both terminal and WebSocket
mise run run-client # Run both + open browser
mise run cmd # Run C&C command interface
```
## Git Hooks
**At the start of every agent session**, verify hooks are installed:
```bash
ls -la .git/hooks/pre-commit
```
If hooks are not installed, install them with:
```bash
hk init --mise
mise run pre-commit
```
**IMPORTANT**: Always review the hk documentation before modifying `hk.pkl`:
- [hk Configuration Guide](https://hk.jdx.dev/configuration.html)
- [hk Hooks Reference](https://hk.jdx.dev/hooks.html)
- [hk Builtins](https://hk.jdx.dev/builtins.html)
The project uses hk configured in `hk.pkl`:
- **pre-commit**: runs ruff-format and ruff (with auto-fix)
- **pre-push**: runs ruff check + benchmark hook
## Benchmark Runner
Run performance benchmarks:
```bash
mise run benchmark # Run all benchmarks (text output)
mise run benchmark-json # Run benchmarks (JSON output)
mise run benchmark-report # Run benchmarks (Markdown report)
```
### Benchmark Commands
```bash
# Run benchmarks
uv run python -m engine.benchmark
# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch
# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline
# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook
# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3
# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json
```
### Hook Mode
The `--hook` mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.
The pre-push hook runs benchmark in hook mode to catch performance regressions before pushing.
## Workflow Rules
### Before Committing
1. **Always run the test suite** - never commit code that fails tests:
```bash
mise run test
```
2. **Always run the linter**:
```bash
mise run lint
```
3. **Fix any lint errors** before committing (or let the pre-commit hook handle it).
4. **Review your changes** using `git diff` to understand what will be committed.
### On Failing Tests
When tests fail, **determine whether it's an out-of-date test or a correctly failing test**:
- **Out-of-date test**: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.
- **Correctly failing test**: The test correctly identifies a broken contract. Fix the implementation, not the test.
**Never** modify a test to make it pass without understanding why it failed.
### Code Review
Before committing significant changes:
- Run `git diff` to review all changes
- Ensure new code follows existing patterns in the codebase
- Check that type hints are added for new functions
- Verify that tests exist for new functionality
## Testing
Tests live in `tests/` and follow the pattern `test_*.py`.
Run all tests:
```bash
mise run test
```
Run with coverage:
```bash
mise run test-cov
```
The project uses pytest with strict marker enforcement. Test configuration is in `pyproject.toml` under `[tool.pytest.ini_options]`.
### Test Coverage Strategy
Current coverage: 56% (336 tests)
Key areas with lower coverage (acceptable for now):
- **app.py** (8%): Main entry point - integration heavy, requires terminal
- **scroll.py** (10%): Terminal-dependent rendering logic
- **benchmark.py** (0%): Standalone benchmark tool, runs separately
Key areas with good coverage:
- **display/backends/null.py** (95%): Easy to test headlessly
- **display/backends/terminal.py** (96%): Uses mocking
- **display/backends/multi.py** (100%): Simple forwarding logic
- **effects/performance.py** (99%): Pure Python logic
- **eventbus.py** (96%): Simple event system
- **effects/controller.py** (95%): Effects command handling
Areas needing more tests:
- **websocket.py** (48%): Network I/O, hard to test in CI
- **ntfy.py** (50%): Network I/O, hard to test in CI
- **mic.py** (61%): Audio I/O, hard to test in CI
Note: Terminal-dependent modules (scroll, layers render) are harder to test in CI.
Performance regression tests are in `tests/test_benchmark.py` with `@pytest.mark.benchmark`.
## Architecture Notes
- **ntfy.py** and **mic.py** are standalone modules with zero internal dependencies
- **eventbus.py** provides thread-safe event publishing for decoupled communication
- **controller.py** coordinates ntfy/mic monitoring and event publishing
- **effects/** - plugin architecture with performance monitoring
- The render pipeline: fetch → render → effects → scroll → terminal output
### Display System
- **Display abstraction** (`engine/display/`): swap display backends via the Display protocol
- `display/backends/terminal.py` - ANSI terminal output
- `display/backends/websocket.py` - broadcasts to web clients via WebSocket
- `display/backends/sixel.py` - renders to Sixel graphics (pure Python, no C dependency)
- `display/backends/null.py` - headless display for testing
- `display/backends/multi.py` - forwards to multiple displays simultaneously
- `display/__init__.py` - DisplayRegistry for backend discovery
- **WebSocket display** (`engine/display/backends/websocket.py`): real-time frame broadcasting to web browsers
- WebSocket server on port 8765
- HTTP server on port 8766 (serves HTML client)
- Client at `client/index.html` with ANSI color parsing and fullscreen support
- **Display modes** (`--display` flag):
- `terminal` - Default ANSI terminal output
- `websocket` - Web browser display (requires websockets package)
- `sixel` - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
- `both` - Terminal + WebSocket simultaneously
### Effect Plugin System
- **EffectPlugin ABC** (`engine/effects/types.py`): abstract base class for effects
- All effects must inherit from EffectPlugin and implement `process()` and `configure()`
- Runtime discovery via `effects_plugins/__init__.py` using `issubclass()` checks
- **EffectRegistry** (`engine/effects/registry.py`): manages registered effects
- **EffectChain** (`engine/effects/chain.py`): chains effects in pipeline order
### Command & Control
- C&C uses separate ntfy topics for commands and responses
- `NTFY_CC_CMD_TOPIC` - commands from cmdline.py
- `NTFY_CC_RESP_TOPIC` - responses back to cmdline.py
- Effects controller handles `/effects` commands (list, on/off, intensity, reorder, stats)