feat(benchmark): add hook mode with baseline cache for pre-push checks

- Fix lint errors and LSP issues in benchmark.py
- Add --hook mode to compare against saved baseline
- Add --baseline flag to save results as baseline
- Add --threshold to configure degradation threshold (default 20%)
- Add benchmark step to pre-push hook in hk.pkl
- Update AGENTS.md with hk documentation links and benchmark runner docs
This commit is contained in:
2026-03-15 22:41:13 -07:00
parent 829c4ab63d
commit dcd31469a5
4 changed files with 350 additions and 76 deletions

View File

@@ -16,7 +16,7 @@ This project uses:
mise run install
# Or equivalently:
uv sync --all-extras # includes mic support
uv sync --all-extras # includes mic, websocket, sixel support
```
### Available Commands
@@ -60,9 +60,52 @@ hk init --mise
mise run pre-commit
```
**IMPORTANT**: Always review the hk documentation before modifying `hk.pkl`:
- [hk Configuration Guide](https://hk.jdx.dev/configuration.html)
- [hk Hooks Reference](https://hk.jdx.dev/hooks.html)
- [hk Builtins](https://hk.jdx.dev/builtins.html)
The project uses hk configured in `hk.pkl`:
- **pre-commit**: runs ruff-format and ruff (with auto-fix)
- **pre-push**: runs ruff check
- **pre-push**: runs ruff check + benchmark hook
## Benchmark Runner
Run performance benchmarks:
```bash
mise run benchmark # Run all benchmarks (text output)
mise run benchmark-json # Run benchmarks (JSON output)
mise run benchmark-report # Run benchmarks (Markdown report)
```
### Benchmark Commands
```bash
# Run benchmarks
uv run python -m engine.benchmark
# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch
# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline
# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook
# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3
# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json
```
### Hook Mode
The `--hook` mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.
The pre-push hook runs benchmark in hook mode to catch performance regressions before pushing.
## Workflow Rules