- Fix lint errors and LSP issues in benchmark.py
- Add --hook mode to compare against saved baseline
- Add --baseline flag to save results as baseline
- Add --threshold to configure degradation threshold (default 20%)
- Add benchmark step to pre-push hook in hk.pkl
- Update AGENTS.md with hk documentation links and benchmark runner docs
Agent Development Guide
Development Environment
This project uses:
- mise (mise.jdx.dev) - tool version manager and task runner
- hk (hk.jdx.dev) - git hook manager
- uv - fast Python package installer
- ruff - linter and formatter
- pytest - test runner
Setup
# Install dependencies
mise run install
# Or equivalently:
uv sync --all-extras # includes mic, websocket, sixel support
Available Commands
mise run test # Run tests
mise run test-v # Run tests verbose
mise run test-cov # Run tests with coverage report
mise run test-browser # Run e2e browser tests (requires playwright)
mise run lint # Run ruff linter
mise run lint-fix # Run ruff with auto-fix
mise run format # Run ruff formatter
mise run ci # Full CI pipeline (topics-init + lint + test-cov)
Runtime Commands
mise run run # Run mainline (terminal)
mise run run-poetry # Run with poetry feed
mise run run-firehose # Run in firehose mode
mise run run-websocket # Run with WebSocket display only
mise run run-sixel # Run with Sixel graphics display
mise run run-both # Run with both terminal and WebSocket
mise run run-client # Run both + open browser
mise run cmd # Run C&C command interface
Git Hooks
At the start of every agent session, verify hooks are installed:
ls -la .git/hooks/pre-commit
If hooks are not installed, install them with:
hk init --mise
mise run pre-commit
IMPORTANT: Always review the hk documentation (hk.jdx.dev) before modifying hk.pkl.
The project uses hk configured in hk.pkl:
- pre-commit: runs ruff-format and ruff (with auto-fix)
- pre-push: runs ruff check + benchmark hook
Benchmark Runner
Run performance benchmarks:
mise run benchmark # Run all benchmarks (text output)
mise run benchmark-json # Run benchmarks (JSON output)
mise run benchmark-report # Run benchmarks (Markdown report)
Benchmark Commands
# Run benchmarks
uv run python -m engine.benchmark
# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch
# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline
# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook
# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3
# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json
Hook Mode
The --hook mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.
The pre-push hook runs benchmark in hook mode to catch performance regressions before pushing.
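The comparison that hook mode performs can be pictured roughly as follows. This is a minimal sketch, not the code in engine/benchmark.py: the function names `find_regressions` and `run_hook`, the JSON layout (a flat mapping of benchmark name to a timing in milliseconds, lower is better), and the choice to pass when no baseline exists are all assumptions made for illustration.

```python
import json
from pathlib import Path


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.2,
) -> list[str]:
    """Return names of benchmarks that slowed down by more than `threshold`.

    Assumes each value is a timing where lower is better (hypothetical layout).
    """
    regressions = []
    for name, base_ms in baseline.items():
        now_ms = current.get(name)
        if now_ms is None or base_ms <= 0:
            continue  # nothing meaningful to compare against
        slowdown = (now_ms - base_ms) / base_ms  # 0.3 means 30% slower
        if slowdown > threshold:
            regressions.append(name)
    return regressions


def run_hook(cache_path: Path, current: dict[str, float], threshold: float = 0.2) -> int:
    """Return a process exit code: 1 if any benchmark regressed, else 0."""
    if not cache_path.exists():
        return 0  # assumption: without a saved baseline there is nothing to fail on
    baseline = json.loads(cache_path.read_text())
    return 1 if find_regressions(baseline, current, threshold) else 0
```

With a 20% threshold, a benchmark that goes from 10 ms to 13 ms (30% slower) would be flagged, while one that goes from 10 ms to 11 ms would not. Raising the threshold with --threshold 0.3 loosens the check in the same way as passing `threshold=0.3` here.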
Workflow Rules
Before Committing
- Always run the test suite - never commit code that fails tests:
  mise run test
- Always run the linter:
  mise run lint
- Fix any lint errors before committing (or let the pre-commit hook handle it).
- Review your changes using git diff to understand what will be committed.
On Failing Tests
When tests fail, determine whether it's an out-of-date test or a correctly failing test:
- Out-of-date test: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.
- Correctly failing test: The test correctly identifies a broken contract. Fix the implementation, not the test.
Never modify a test to make it pass without understanding why it failed.
Code Review
Before committing significant changes:
- Run git diff to review all changes
- Ensure new code follows existing patterns in the codebase
- Check that type hints are added for new functions
- Verify that tests exist for new functionality
Testing
Tests live in tests/ and follow the pattern test_*.py.
Run all tests:
mise run test
Run with coverage:
mise run test-cov
The project uses pytest with strict marker enforcement. Test configuration is in pyproject.toml under [tool.pytest.ini_options].
Architecture Notes
- ntfy.py and mic.py are standalone modules with zero internal dependencies
- eventbus.py provides thread-safe event publishing for decoupled communication
- controller.py coordinates ntfy/mic monitoring and event publishing
- effects/ - plugin architecture with performance monitoring
- The render pipeline: fetch → render → effects → scroll → terminal output
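To illustrate the kind of thread-safe publishing that eventbus.py is described as providing, here is a minimal sketch. It is not the project's implementation: the class name `EventBus`, the methods `subscribe` and `publish`, and the topic string `"ntfy.message"` are hypothetical names chosen for this example.

```python
import threading
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class EventBus:
    """Hypothetical publish/subscribe bus guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        with self._lock:
            self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        # Copy the handler list under the lock, then call handlers outside it,
        # so a handler may subscribe or publish without deadlocking.
        with self._lock:
            handlers = list(self._handlers.get(topic, ()))
        for handler in handlers:
            handler(payload)


# Usage: a publisher and a subscriber that know nothing about each other.
bus = EventBus()
received: list[Any] = []
bus.subscribe("ntfy.message", received.append)
bus.publish("ntfy.message", "hello")
```

The publisher only needs the bus and a topic name, which is what lets standalone modules such as ntfy.py and mic.py stay free of internal dependencies while a coordinator wires them together.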
Display System
- Display abstraction (engine/display.py): swap display backends via the Display protocol
  - TerminalDisplay - ANSI terminal output
  - WebSocketDisplay - broadcasts to web clients via WebSocket
  - SixelDisplay - renders to Sixel graphics (pure Python, no C dependency)
  - MultiDisplay - forwards to multiple displays simultaneously
- WebSocket display (engine/websocket_display.py): real-time frame broadcasting to web browsers
  - WebSocket server on port 8765
  - HTTP server on port 8766 (serves HTML client)
  - Client at client/index.html with ANSI color parsing and fullscreen support
- Display modes (--display flag):
  - terminal - Default ANSI terminal output
  - websocket - Web browser display (requires websockets package)
  - sixel - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
  - both - Terminal + WebSocket simultaneously
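The backend-swapping idea can be sketched with a structural protocol and a fan-out wrapper. This is an illustration under assumptions, not the contents of engine/display.py: the names `DisplayLike`, `RecordingDisplay`, and `FanOutDisplay`, and the single `show(lines)` method, are hypothetical stand-ins for whatever the real Display protocol and MultiDisplay define.

```python
from typing import Protocol


class DisplayLike(Protocol):
    """Hypothetical protocol: anything with a matching show() qualifies."""

    def show(self, lines: list[str]) -> None: ...


class RecordingDisplay:
    """Stand-in backend that stores frames instead of drawing them."""

    def __init__(self) -> None:
        self.frames: list[list[str]] = []

    def show(self, lines: list[str]) -> None:
        self.frames.append(list(lines))


class FanOutDisplay:
    """Forwards each frame to every wrapped display, in order."""

    def __init__(self, displays: list[DisplayLike]) -> None:
        self._displays = displays

    def show(self, lines: list[str]) -> None:
        for display in self._displays:
            display.show(lines)


# Usage: the caller renders once; the wrapper decides where frames go.
first, second = RecordingDisplay(), RecordingDisplay()
FanOutDisplay([first, second]).show(["frame 1"])
```

Because the wrapper itself satisfies the protocol, a mode such as both can be treated by the render loop exactly like a single backend.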
Command & Control
- C&C uses separate ntfy topics for commands and responses
  - NTFY_CC_CMD_TOPIC - commands from cmdline.py
  - NTFY_CC_RESP_TOPIC - responses back to cmdline.py
- Effects controller handles /effects commands (list, on/off, intensity, reorder, stats)