# Agent Development Guide

## Development Environment

This project uses:
- **mise** (mise.jdx.dev) - tool version manager and task runner
- **hk** (hk.jdx.dev) - git hook manager
- **uv** - fast Python package installer
- **ruff** - linter and formatter
- **pytest** - test runner

### Setup

```bash
# Install dependencies
mise run install

# Or equivalently:
uv sync --all-extras  # includes mic, websocket, sixel support
```

### Available Commands

```bash
mise run test          # Run tests
mise run test-v        # Run tests verbose
mise run test-cov      # Run tests with coverage report
mise run test-browser  # Run e2e browser tests (requires playwright)
mise run lint          # Run ruff linter
mise run lint-fix      # Run ruff with auto-fix
mise run format        # Run ruff formatter
mise run ci            # Full CI pipeline (topics-init + lint + test-cov)
```

### Runtime Commands

```bash
mise run run            # Run mainline (terminal)
mise run run-poetry     # Run with poetry feed
mise run run-firehose   # Run in firehose mode
mise run run-websocket  # Run with WebSocket display only
mise run run-sixel      # Run with Sixel graphics display
mise run run-both       # Run with both terminal and WebSocket
mise run run-client     # Run both + open browser
mise run cmd            # Run C&C command interface
```

## Git Hooks

**At the start of every agent session**, verify hooks are installed:

```bash
ls -la .git/hooks/pre-commit
```

If hooks are not installed, install them with:

```bash
hk init --mise
mise run pre-commit
```

**IMPORTANT**: Always review the hk documentation before modifying `hk.pkl`:
- [hk Configuration Guide](https://hk.jdx.dev/configuration.html)
- [hk Hooks Reference](https://hk.jdx.dev/hooks.html)
- [hk Builtins](https://hk.jdx.dev/builtins.html)

The project uses hk configured in `hk.pkl`:
- **pre-commit**: runs ruff-format and ruff (with auto-fix)
- **pre-push**: runs ruff check + benchmark hook

## Benchmark Runner

Run performance benchmarks:

```bash
mise run benchmark         # Run all benchmarks (text output)
mise run benchmark-json    # Run benchmarks (JSON output)
mise run benchmark-report  # Run benchmarks (Markdown report)
```

### Benchmark Commands

```bash
# Run benchmarks
uv run python -m engine.benchmark

# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch

# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline

# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook

# Hook mode with a custom threshold (here 0.3 = 30%; default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3

# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json
```

### Hook Mode

The `--hook` mode compares the current benchmark results against a saved baseline. If performance degrades by more than the threshold (default 20%), the run exits with code 1, which helps prevent performance regressions on feature branches.

The pre-push hook runs the benchmark in hook mode to catch performance regressions before pushing.

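The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in `engine/benchmark.py`: the function names, the flat `{name: timing}` JSON layout, and the "lower is better" timing assumption are all hypothetical.

```python
import json
from pathlib import Path


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.2,
) -> list[str]:
    """Return names of benchmarks that slowed down by more than `threshold`.

    Assumption: values are timings where lower is better (e.g. ms per frame).
    """
    regressions = []
    for name, base in baseline.items():
        now = current.get(name)
        if now is None or base <= 0:
            continue  # benchmark missing from this run, or unusable baseline
        degradation = (now - base) / base  # 0.25 means 25% slower
        if degradation > threshold:
            regressions.append(name)
    return regressions


def run_hook(baseline_path: Path, current: dict[str, float], threshold: float = 0.2) -> int:
    """Return a process exit code: 1 if any benchmark regressed, else 0."""
    if not baseline_path.exists():
        return 0  # assumption: with no saved baseline there is nothing to compare
    baseline = json.loads(baseline_path.read_text())
    return 1 if find_regressions(baseline, current, threshold) else 0
```

A caller would pass the result of `run_hook(...)` to `sys.exit`, so that a non-zero code makes the pre-push hook fail.
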
## Workflow Rules

### Before Committing

1. **Always run the test suite** - never commit code that fails tests:
   ```bash
   mise run test
   ```

2. **Always run the linter**:
   ```bash
   mise run lint
   ```

3. **Fix any lint errors** before committing (or let the pre-commit hook handle it).

4. **Review your changes** using `git diff` to understand what will be committed.

### On Failing Tests

When tests fail, **determine whether it's an out-of-date test or a correctly failing test**:

- **Out-of-date test**: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.

- **Correctly failing test**: The test correctly identifies a broken contract. Fix the implementation, not the test.

**Never** modify a test to make it pass without understanding why it failed.

### Code Review

Before committing significant changes:
- Run `git diff` to review all changes
- Ensure new code follows existing patterns in the codebase
- Check that type hints are added for new functions
- Verify that tests exist for new functionality

## Testing

Tests live in `tests/` and follow the pattern `test_*.py`.

Run all tests:
```bash
mise run test
```

Run with coverage:
```bash
mise run test-cov
```

The project uses pytest with strict marker enforcement. Test configuration is in `pyproject.toml` under `[tool.pytest.ini_options]`.

## Architecture Notes

- **ntfy.py** and **mic.py** are standalone modules with zero internal dependencies
- **eventbus.py** provides thread-safe event publishing for decoupled communication
- **controller.py** coordinates ntfy/mic monitoring and event publishing
- **effects/** - plugin architecture with performance monitoring
- The render pipeline: fetch → render → effects → scroll → terminal output

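To illustrate what "thread-safe event publishing" can look like, here is a minimal sketch of a lock-protected publish/subscribe bus. It is not the contents of `eventbus.py`: the `EventBus` class, its `subscribe`/`publish` methods, and the string event names are assumptions made for this example.

```python
import threading
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class EventBus:
    """Hypothetical publish/subscribe bus guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        with self._lock:
            self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Any = None) -> None:
        # Copy the handler list under the lock, then call handlers outside it
        # so a handler can subscribe or publish without deadlocking.
        with self._lock:
            handlers = list(self._handlers.get(event_type, ()))
        for handler in handlers:
            handler(payload)
```

With a bus like this, a monitor such as ntfy or mic would only publish events, and the controller would only subscribe, so neither side needs to import the other.
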
### Display System

- **Display abstraction** (`engine/display.py`): swap display backends via the Display protocol
  - `TerminalDisplay` - ANSI terminal output
  - `WebSocketDisplay` - broadcasts to web clients via WebSocket
  - `SixelDisplay` - renders to Sixel graphics (pure Python, no C dependency)
  - `MultiDisplay` - forwards to multiple displays simultaneously

- **WebSocket display** (`engine/websocket_display.py`): real-time frame broadcasting to web browsers
  - WebSocket server on port 8765
  - HTTP server on port 8766 (serves HTML client)
  - Client at `client/index.html` with ANSI color parsing and fullscreen support

- **Display modes** (`--display` flag):
  - `terminal` - Default ANSI terminal output
  - `websocket` - Web browser display (requires websockets package)
  - `sixel` - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
  - `both` - Terminal + WebSocket simultaneously

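The backend-swapping idea above can be sketched with `typing.Protocol`. This is an illustration only: the `show` method name, its `list[str]` frame type, and the `RecordingDisplay` stand-in are assumptions, not the actual interface in `engine/display.py`.

```python
from typing import Protocol


class Display(Protocol):
    """Hypothetical display interface: anything with a matching `show` method."""

    def show(self, frame: list[str]) -> None: ...


class RecordingDisplay:
    """Stand-in backend that records frames instead of drawing them."""

    def __init__(self) -> None:
        self.frames: list[list[str]] = []

    def show(self, frame: list[str]) -> None:
        self.frames.append(frame)


class MultiDisplay:
    """Forwards every frame to each wrapped display, in order."""

    def __init__(self, displays: list[Display]) -> None:
        self.displays = displays

    def show(self, frame: list[str]) -> None:
        for display in self.displays:
            display.show(frame)
```

Because the protocol is structural, a backend does not have to inherit from `Display`; a `both` mode can then be expressed as a `MultiDisplay` wrapping a terminal backend and a WebSocket backend.
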
### Command & Control

- C&C uses separate ntfy topics for commands and responses
  - `NTFY_CC_CMD_TOPIC` - commands from cmdline.py
  - `NTFY_CC_RESP_TOPIC` - responses back to cmdline.py
- Effects controller handles `/effects` commands (list, on/off, intensity, reorder, stats)
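
As a rough illustration of how a message arriving on the command topic might be turned into an `/effects` action, here is a small parser sketch. The grammar `/effects <action> [args...]`, the `EffectsCommand` dataclass, and `parse_effects_command` are hypothetical; the real command syntax is defined by the effects controller and may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Action names taken from the list above; the argument layout is assumed.
KNOWN_ACTIONS = {"list", "on", "off", "intensity", "reorder", "stats"}


@dataclass
class EffectsCommand:
    action: str
    args: list[str]


def parse_effects_command(message: str) -> Optional[EffectsCommand]:
    """Parse '/effects <action> [args...]'; return None for anything else."""
    parts = message.strip().split()
    if len(parts) < 2 or parts[0] != "/effects":
        return None
    action, args = parts[1], parts[2:]
    if action not in KNOWN_ACTIONS:
        return None
    return EffectsCommand(action, args)
```

Returning `None` for unrecognized input lets the caller publish an error message on the response topic instead of raising inside the message loop.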