# Agent Development Guide

## Development Environment

This project uses:

- **mise** (mise.jdx.dev) - tool version manager and task runner
- **hk** (hk.jdx.dev) - git hook manager
- **uv** - fast Python package installer
- **ruff** - linter and formatter
- **pytest** - test runner

### Setup

```bash
# Install dependencies
mise run install

# Or equivalently:
uv sync --all-extras  # includes mic, websocket, sixel support
```

### Available Commands

```bash
mise run test          # Run tests
mise run test-v        # Run tests verbosely
mise run test-cov      # Run tests with coverage report
mise run test-browser  # Run e2e browser tests (requires playwright)
mise run lint          # Run ruff linter
mise run lint-fix      # Run ruff with auto-fix
mise run format        # Run ruff formatter
mise run ci            # Full CI pipeline (topics-init + lint + test-cov)
```

### Runtime Commands

```bash
mise run run            # Run mainline (terminal)
mise run run-poetry     # Run with poetry feed
mise run run-firehose   # Run in firehose mode
mise run run-websocket  # Run with WebSocket display only
mise run run-sixel      # Run with Sixel graphics display
mise run run-both       # Run with both terminal and WebSocket
mise run run-client     # Run both + open browser
mise run cmd            # Run C&C command interface
```

## Git Hooks

**At the start of every agent session**, verify hooks are installed:

```bash
ls -la .git/hooks/pre-commit
```

If hooks are not installed, install them with:

```bash
hk init --mise
mise run pre-commit
```

**IMPORTANT**: Always review the hk documentation before modifying `hk.pkl`:

- [hk Configuration Guide](https://hk.jdx.dev/configuration.html)
- [hk Hooks Reference](https://hk.jdx.dev/hooks.html)
- [hk Builtins](https://hk.jdx.dev/builtins.html)

The project uses hk configured in `hk.pkl`:

- **pre-commit**: runs ruff-format and ruff (with auto-fix)
- **pre-push**: runs ruff check + benchmark hook

## Benchmark Runner

Run performance benchmarks:

```bash
mise run benchmark         # Run all benchmarks (text output)
mise run benchmark-json    # Run benchmarks (JSON output)
mise run benchmark-report  # Run benchmarks (Markdown report)
```

### Benchmark Commands

```bash
# Run benchmarks
uv run python -m engine.benchmark

# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch

# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline

# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook

# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3

# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json
```

### Hook Mode

The `--hook` mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.

The pre-push hook runs the benchmark in hook mode to catch performance regressions before pushing.
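For intuition, here is a minimal sketch of the kind of comparison hook mode performs. The baseline path, JSON layout, and function name are hypothetical illustrations, not the actual `engine.benchmark` internals:

```python
import json
import sys
from pathlib import Path

# Hypothetical baseline layout: {"benchmark name": seconds per run, ...}
BASELINE = Path(".cache/benchmark-baseline.json")  # assumed location

def check_regressions(current: dict[str, float], threshold: float = 0.2) -> int:
    """Return 1 (hook failure) if any benchmark slowed beyond `threshold`.

    `threshold` is a ratio, so the default 0.2 matches the documented 20%.
    """
    if not BASELINE.exists():
        return 0  # no baseline saved yet; nothing to compare against
    baseline = json.loads(BASELINE.read_text())
    failed = False
    for name, base in baseline.items():
        now = current.get(name)
        if now is None:
            continue  # benchmark removed or renamed; skip rather than fail
        degradation = (now - base) / base
        if degradation > threshold:
            print(f"REGRESSION {name}: {base:.4f}s -> {now:.4f}s (+{degradation:.0%})")
            failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(check_regressions({"terminal_render": 0.0131}))
```

Exiting non-zero is what lets the pre-push hook block the push; `--threshold 0.3` simply widens the allowed degradation ratio to 30%.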
## Workflow Rules

### Before Committing

1. **Always run the test suite** - never commit code that fails tests:

   ```bash
   mise run test
   ```

2. **Always run the linter**:

   ```bash
   mise run lint
   ```

3. **Fix any lint errors** before committing (or let the pre-commit hook handle it).

4. **Review your changes** using `git diff` to understand what will be committed.

### On Failing Tests

When tests fail, **determine whether it's an out-of-date test or a correctly failing test**:

- **Out-of-date test**: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.
- **Correctly failing test**: The test correctly identifies a broken contract. Fix the implementation, not the test.

**Never** modify a test to make it pass without understanding why it failed.

### Code Review

Before committing significant changes:

- Run `git diff` to review all changes
- Ensure new code follows existing patterns in the codebase
- Check that type hints are added for new functions
- Verify that tests exist for new functionality

## Testing

Tests live in `tests/` and follow the pattern `test_*.py`.

Run all tests:

```bash
mise run test
```

Run with coverage:

```bash
mise run test-cov
```

The project uses pytest with strict marker enforcement. Test configuration is in `pyproject.toml` under `[tool.pytest.ini_options]`.

### Test Coverage Strategy

Current coverage: 56% (336 tests)

Key areas with lower coverage (acceptable for now):

- **app.py** (8%): Main entry point - integration-heavy, requires a terminal
- **scroll.py** (10%): Terminal-dependent rendering logic
- **benchmark.py** (0%): Standalone benchmark tool, runs separately

Key areas with good coverage:

- **display/backends/null.py** (95%): Easy to test headlessly
- **display/backends/terminal.py** (96%): Uses mocking
- **display/backends/multi.py** (100%): Simple forwarding logic
- **effects/performance.py** (99%): Pure Python logic
- **eventbus.py** (96%): Simple event system
- **effects/controller.py** (95%): Effects command handling

Areas needing more tests:

- **websocket.py** (48%): Network I/O, hard to test in CI
- **ntfy.py** (50%): Network I/O, hard to test in CI
- **mic.py** (61%): Audio I/O, hard to test in CI

Note: Terminal-dependent modules (scroll, layer rendering) are harder to test in CI.

Performance regression tests are in `tests/test_benchmark.py` with `@pytest.mark.benchmark`.

## Architecture Notes

- **ntfy.py** and **mic.py** are standalone modules with zero internal dependencies
- **eventbus.py** provides thread-safe event publishing for decoupled communication
- **controller.py** coordinates ntfy/mic monitoring and event publishing
- **effects/** - plugin architecture with performance monitoring
- The render pipeline: fetch → render → effects → scroll → terminal output

### Display System

- **Display abstraction** (`engine/display/`): swap display backends via the Display protocol (see the sketch after this list)
  - `display/backends/terminal.py` - ANSI terminal output
  - `display/backends/websocket.py` - broadcasts to web clients via WebSocket
  - `display/backends/sixel.py` - renders to Sixel graphics (pure Python, no C dependency)
  - `display/backends/null.py` - headless display for testing
  - `display/backends/multi.py` - forwards to multiple displays simultaneously
  - `display/__init__.py` - DisplayRegistry for backend discovery
- **WebSocket display** (`engine/display/backends/websocket.py`): real-time frame broadcasting to web browsers
  - WebSocket server on port 8765
  - HTTP server on port 8766 (serves HTML client)
  - Client at `client/index.html` with ANSI color parsing and fullscreen support
- **Display modes** (`--display` flag):
  - `terminal` - Default ANSI terminal output
  - `websocket` - Web browser display (requires websockets package)
  - `sixel` - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
  - `both` - Terminal + WebSocket simultaneously
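The backend swapping described above hinges on structural typing: anything with the right methods can act as a display. The following is an illustrative sketch only, with assumed method names (`show`, `close`) rather than the real `Display` protocol in `engine/display/`:

```python
from typing import Protocol

class Display(Protocol):
    """Assumed shape of the Display protocol: no base class required."""
    def show(self, frame: str) -> None: ...
    def close(self) -> None: ...

class NullDisplay:
    """Headless backend: swallow frames, count them for test assertions."""
    def __init__(self) -> None:
        self.frames = 0
    def show(self, frame: str) -> None:
        self.frames += 1
    def close(self) -> None:
        pass

class MultiDisplay:
    """Forward every call to several backends at once (the `both` mode)."""
    def __init__(self, *displays: Display) -> None:
        self.displays = displays
    def show(self, frame: str) -> None:
        for d in self.displays:
            d.show(frame)
    def close(self) -> None:
        for d in self.displays:
            d.close()

if __name__ == "__main__":
    null = NullDisplay()
    MultiDisplay(null, NullDisplay()).show("\x1b[32mhello\x1b[0m")
    assert null.frames == 1
```

Because the protocol is structural, backends share no base class; this is what keeps the null backend trivially testable and lets a multi backend implement `both` by plain forwarding.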
### Effect Plugin System

- **EffectPlugin ABC** (`engine/effects/types.py`): abstract base class for effects
  - All effects must inherit from EffectPlugin and implement `process()` and `configure()` (a minimal example is sketched at the end of this guide)
  - Runtime discovery via `effects_plugins/__init__.py` using `issubclass()` checks
- **EffectRegistry** (`engine/effects/registry.py`): manages registered effects
- **EffectChain** (`engine/effects/chain.py`): chains effects in pipeline order

### Command & Control

- C&C uses separate ntfy topics for commands and responses
  - `NTFY_CC_CMD_TOPIC` - commands from cmdline.py
  - `NTFY_CC_RESP_TOPIC` - responses back to cmdline.py
- Effects controller handles `/effects` commands (list, on/off, intensity, reorder, stats)
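To make the Effect Plugin System contract concrete, here is a minimal sketch of a conforming plugin plus `issubclass()`-based discovery. The `process()`/`configure()` signatures are assumptions inferred from this guide, not the real definitions in `engine/effects/types.py`:

```python
from abc import ABC, abstractmethod

class EffectPlugin(ABC):
    """Assumed shape of the EffectPlugin ABC described above."""

    @abstractmethod
    def process(self, frame: str) -> str: ...

    @abstractmethod
    def configure(self, **options) -> None: ...

class Dim(EffectPlugin):
    """Toy effect: wrap the frame in the ANSI 'dim' attribute."""
    def __init__(self) -> None:
        self.intensity = 1.0

    def process(self, frame: str) -> str:
        return f"\x1b[2m{frame}\x1b[0m" if self.intensity > 0 else frame

    def configure(self, **options) -> None:
        self.intensity = float(options.get("intensity", self.intensity))

def discover(namespace: dict) -> list[type[EffectPlugin]]:
    """Find concrete EffectPlugin subclasses in a module namespace."""
    return [
        obj for obj in namespace.values()
        if isinstance(obj, type)
        and issubclass(obj, EffectPlugin)
        and obj is not EffectPlugin
    ]

print([cls.__name__ for cls in discover(globals())])  # ['Dim']
```

Namespace scanning of this sort mirrors how `effects_plugins/__init__.py` reportedly picks up new effects without an explicit registration list.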