feat(benchmark): add hook mode with baseline cache for pre-push checks

- Fix lint errors and LSP issues in benchmark.py
- Add --hook mode to compare against saved baseline
- Add --baseline flag to save results as baseline
- Add --threshold to configure degradation threshold (default 20%)
- Add benchmark step to pre-push hook in hk.pkl
- Update AGENTS.md with hk documentation links and benchmark runner docs
2026-03-15 22:41:13 -07:00
parent 829c4ab63d
commit dcd31469a5
4 changed files with 350 additions and 76 deletions
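
Reviewer note: the hook-mode check described in the commit message (compare against a saved baseline, fail when degradation exceeds the threshold) could look roughly like the sketch below. This is not the code in benchmark.py. The names `load_results`, `find_regressions`, and `hook_exit_code` are hypothetical, as is the cache layout: a JSON object mapping a benchmark name to a time in milliseconds, where lower is better.

```python
import json
from pathlib import Path


def load_results(path: Path) -> dict[str, float]:
    """Load saved results; a missing file means no baseline was saved yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.2,
) -> list[str]:
    """Return names whose time grew by more than `threshold` (0.2 = 20%)."""
    regressions = []
    for name, base_ms in baseline.items():
        cur_ms = current.get(name)
        # Skip benchmarks that were not run this time or have no usable baseline.
        if cur_ms is None or base_ms <= 0:
            continue
        degradation = (cur_ms - base_ms) / base_ms
        if degradation > threshold:
            regressions.append(name)
    return sorted(regressions)


def hook_exit_code(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.2,
) -> int:
    """Exit code for a git hook: 1 blocks the push, 0 lets it through."""
    return 1 if find_regressions(baseline, current, threshold) else 0
```

With a baseline of 20.0 ms and a current run of 26.0 ms, the degradation is 30%, so the default 20% threshold would fail the hook while a looser threshold such as 0.5 would pass. How the real implementation treats a missing baseline or a newly added benchmark is not stated in this commit; the sketch simply skips them.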
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -16,7 +16,7 @@ This project uses:
 mise run install

 # Or equivalently:
-uv sync --all-extras   # includes mic support
+uv sync --all-extras   # includes mic, websocket, sixel support
 ```

 ### Available Commands
@@ -60,9 +60,52 @@ hk init --mise
 mise run pre-commit
 ```

+**IMPORTANT**: Always review the hk documentation before modifying `hk.pkl`:
+- [hk Configuration Guide](https://hk.jdx.dev/configuration.html)
+- [hk Hooks Reference](https://hk.jdx.dev/hooks.html)
+- [hk Builtins](https://hk.jdx.dev/builtins.html)
+
 The project uses hk configured in `hk.pkl`:
 - **pre-commit**: runs ruff-format and ruff (with auto-fix)
- **pre-push**: runs ruff check
+- **pre-push**: runs ruff check + benchmark hook
+
+## Benchmark Runner
+
+Run performance benchmarks:
+
+```bash
+mise run benchmark          # Run all benchmarks (text output)
+mise run benchmark-json     # Run benchmarks (JSON output)
+mise run benchmark-report   # Run benchmarks (Markdown report)
+```
+
+### Benchmark Commands
+
+```bash
+# Run benchmarks
+uv run python -m engine.benchmark
+
+# Run with specific displays/effects
+uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch
+
+# Save baseline for hook comparisons
+uv run python -m engine.benchmark --baseline
+
+# Run in hook mode (compares against baseline)
+uv run python -m engine.benchmark --hook
+
+# Hook mode with custom threshold (default: 20% degradation)
+uv run python -m engine.benchmark --hook --threshold 0.3
+
+# Custom baseline location
+uv run python -m engine.benchmark --hook --cache /path/to/cache.json
+```
+
+### Hook Mode
+
+The `--hook` mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.
+
+The pre-push hook runs the benchmark in hook mode to catch performance regressions before they are pushed.

 ## Workflow Rules