
David Gwilliam ab3e1766b1 feat(benchmark): add hook mode with baseline cache for pre-push checks

- Fix lint errors and LSP issues in benchmark.py
- Add --hook mode to compare against saved baseline
- Add --baseline flag to save results as baseline
- Add --threshold to configure degradation threshold (default 20%)
- Add benchmark step to pre-push hook in hk.pkl
- Update AGENTS.md with hk documentation links and benchmark runner docs

2026-03-15 22:41:13 -07:00

6.0 KiB


Agent Development Guide

Development Environment

This project uses:

mise (mise.jdx.dev) - tool version manager and task runner
hk (hk.jdx.dev) - git hook manager
uv - fast Python package installer
ruff - linter and formatter
pytest - test runner

Setup

# Install dependencies
mise run install

# Or equivalently:
uv sync --all-extras   # includes mic support

Available Commands

mise run test           # Run tests
mise run test-v         # Run tests verbose
mise run test-cov       # Run tests with coverage report
mise run test-browser   # Run e2e browser tests (requires playwright)
mise run lint           # Run ruff linter
mise run lint-fix       # Run ruff with auto-fix
mise run format         # Run ruff formatter
mise run ci             # Full CI pipeline (topics-init + lint + test-cov)

Runtime Commands

mise run run            # Run mainline (terminal)
mise run run-poetry     # Run with poetry feed
mise run run-firehose   # Run in firehose mode
mise run run-websocket  # Run with WebSocket display only
mise run run-sixel      # Run with Sixel graphics display
mise run run-both       # Run with both terminal and WebSocket
mise run run-client     # Run both + open browser
mise run cmd            # Run C&C command interface

Git Hooks

At the start of every agent session, verify hooks are installed:

ls -la .git/hooks/pre-commit

If hooks are not installed, install them with:

hk init --mise
mise run pre-commit
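
For agents that script their session start, the same check can be sketched in Python. This is only an illustration: `hook_installed` is a hypothetical helper, not part of the project, and it assumes a regular checkout where `.git` is a directory (not a worktree or submodule).

```python
import os
from pathlib import Path


def hook_installed(repo_root=".", hook="pre-commit"):
    """Return True if the named git hook exists as an executable file.

    Hypothetical helper: assumes hooks live under <repo_root>/.git/hooks.
    """
    hook_path = Path(repo_root) / ".git" / "hooks" / hook
    return hook_path.is_file() and os.access(hook_path, os.X_OK)


# Example: report which of the two hooks described in this guide are present.
for name in ("pre-commit", "pre-push"):
    status = "installed" if hook_installed(".", name) else "missing"
    print(f"{name}: {status}")
```

If either hook is reported missing, fall back to the install commands above.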

IMPORTANT: Always review the hk documentation (hk.jdx.dev) before modifying hk.pkl.

The project uses hk, configured in hk.pkl:

pre-commit: runs ruff-format and ruff (with auto-fix)
pre-push: runs ruff check + benchmark hook

Benchmark Runner

Run performance benchmarks:

mise run benchmark           # Run all benchmarks (text output)
mise run benchmark-json     # Run benchmarks (JSON output)
mise run benchmark-report   # Run benchmarks (Markdown report)

Benchmark Commands

# Run benchmarks
uv run python -m engine.benchmark

# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch

# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline

# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook

# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3

# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json

Hook Mode

The --hook mode compares current benchmark results against a saved baseline (created with --baseline). If performance degrades beyond the threshold (default 20%, configurable with --threshold), the run exits with code 1. This helps prevent performance regressions in feature branches.

The pre-push hook runs benchmark in hook mode to catch performance regressions before pushing.
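
The comparison that hook mode performs can be sketched roughly as follows. This is a minimal illustration, not the code in engine/benchmark.py: the function names, the result layout (benchmark name mapped to a time in milliseconds), and the assumption that lower numbers are better are all hypothetical.

```python
def find_regressions(baseline, current, threshold=0.2):
    """Return (name, degradation) pairs that exceed the threshold.

    Assumes both arguments map a benchmark name to a time in milliseconds,
    so a larger value means slower. threshold is a fraction (0.2 = 20%).
    """
    regressions = []
    for name, base_ms in baseline.items():
        cur_ms = current.get(name)
        if cur_ms is None or base_ms <= 0:
            continue  # nothing meaningful to compare against
        degradation = (cur_ms - base_ms) / base_ms
        if degradation > threshold:
            regressions.append((name, degradation))
    return regressions


def hook_exit_code(baseline, current, threshold=0.2):
    """Return 1 if any benchmark regressed beyond the threshold, else 0."""
    return 1 if find_regressions(baseline, current, threshold) else 0


# Illustrative numbers only: one benchmark is 10% slower, the other 35% slower.
baseline = {"null/fade": 10.0, "terminal/glitch": 20.0}
current = {"null/fade": 11.0, "terminal/glitch": 27.0}
for name, degradation in find_regressions(baseline, current):
    print(f"{name}: {degradation:.0%} slower than baseline")
print("exit code:", hook_exit_code(baseline, current))
```

A non-zero exit status from a pre-push hook makes git abort the push, which is why a single regressed benchmark is enough to return 1 in this sketch.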

Workflow Rules

Before Committing

Always run the test suite - never commit code that fails tests:
```
mise run test
```
Always run the linter:
```
mise run lint
```
Fix any lint errors before committing (or let the pre-commit hook handle it).
Review your changes using git diff to understand what will be committed.

On Failing Tests

When tests fail, determine whether it's an out-of-date test or a correctly failing test:

Out-of-date test: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.
Correctly failing test: The test correctly identifies a broken contract. Fix the implementation, not the test.

Never modify a test to make it pass without understanding why it failed.

Code Review

Before committing significant changes:

Run git diff to review all changes
Ensure new code follows existing patterns in the codebase
Check that type hints are added for new functions
Verify that tests exist for new functionality

Testing

Tests live in tests/ and follow the pattern test_*.py.

Run all tests:

mise run test

Run with coverage:

mise run test-cov

The project uses pytest with strict marker enforcement. Test configuration is in pyproject.toml under [tool.pytest.ini_options].

Architecture Notes

ntfy.py and mic.py are standalone modules with zero internal dependencies
eventbus.py provides thread-safe event publishing for decoupled communication
controller.py coordinates ntfy/mic monitoring and event publishing
effects/ - plugin architecture with performance monitoring
The render pipeline: fetch → render → effects → scroll → terminal output
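
To make the "thread-safe event publishing" note concrete, here is a minimal sketch of one common way to build such a bus. It is an assumption-laden illustration, not the contents of eventbus.py: the class name `EventBusSketch`, the `subscribe`/`publish` methods, and the string topics are all hypothetical.

```python
import threading
from collections import defaultdict


class EventBusSketch:
    """Hypothetical publish/subscribe bus guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        """Register a callable to receive payloads published on topic."""
        with self._lock:
            self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        """Deliver payload to every handler subscribed to topic."""
        # Copy the handler list under the lock, then call handlers outside it
        # so a slow or re-entrant handler cannot block other publishers.
        with self._lock:
            handlers = list(self._subscribers.get(topic, ()))
        for handler in handlers:
            handler(payload)


# Usage: a consumer subscribes without knowing who publishes.
bus = EventBusSketch()
received = []
bus.subscribe("ntfy", received.append)
bus.publish("ntfy", "new message")
bus.publish("mic", 0.42)  # no subscribers on this topic, so nothing happens
print(received)
```

Publishers such as the controller and consumers such as the effects code only share topic names in this design, which is what keeps them decoupled.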

Display System

Display abstraction (engine/display.py): swap display backends via the Display protocol
- TerminalDisplay - ANSI terminal output
- WebSocketDisplay - broadcasts to web clients via WebSocket
- SixelDisplay - renders to Sixel graphics (pure Python, no C dependency)
- MultiDisplay - forwards to multiple displays simultaneously
WebSocket display (engine/websocket_display.py): real-time frame broadcasting to web browsers
- WebSocket server on port 8765
- HTTP server on port 8766 (serves HTML client)
- Client at client/index.html with ANSI color parsing and fullscreen support
Display modes (--display flag):
- terminal - Default ANSI terminal output
- websocket - Web browser display (requires websockets package)
- sixel - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
- both - Terminal + WebSocket simultaneously
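
The backend-swapping idea can be sketched with `typing.Protocol`. This is a hedged illustration of the pattern only: the `show` method name and its signature, along with `RecordingDisplay` and `MultiDisplaySketch`, are assumptions and may not match the actual Display protocol in engine/display.py.

```python
from typing import Protocol, Sequence


class Display(Protocol):
    """Hypothetical protocol: anything with a matching show() qualifies."""

    def show(self, frame: Sequence[str]) -> None: ...


class RecordingDisplay:
    """Stand-in backend that stores frames instead of drawing them."""

    def __init__(self):
        self.frames = []

    def show(self, frame):
        self.frames.append(list(frame))


class MultiDisplaySketch:
    """Forwards each frame to every wrapped display, in order."""

    def __init__(self, *displays: Display):
        self._displays = displays

    def show(self, frame):
        for display in self._displays:
            display.show(frame)


# Usage: the caller only depends on show(), not on a concrete backend.
first, second = RecordingDisplay(), RecordingDisplay()
multi = MultiDisplaySketch(first, second)
multi.show(["line one", "line two"])
print(len(first.frames), len(second.frames))
```

Because the protocol is structural, a new backend needs no base class; it only has to provide the expected method.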

Command & Control

C&C uses separate ntfy topics for commands and responses:
- NTFY_CC_CMD_TOPIC - commands from cmdline.py
- NTFY_CC_RESP_TOPIC - responses back to cmdline.py
Effects controller handles /effects commands (list, on/off, intensity, reorder, stats)
