Mainline/AGENTS.md
David Gwilliam ab3e1766b1 feat(benchmark): add hook mode with baseline cache for pre-push checks
- Fix lint errors and LSP issues in benchmark.py
- Add --hook mode to compare against saved baseline
- Add --baseline flag to save results as baseline
- Add --threshold to configure degradation threshold (default 20%)
- Add benchmark step to pre-push hook in hk.pkl
- Update AGENTS.md with hk documentation links and benchmark runner docs
2026-03-15 22:41:13 -07:00

Agent Development Guide

Development Environment

This project uses:

  • mise (mise.jdx.dev) - tool version manager and task runner
  • hk (hk.jdx.dev) - git hook manager
  • uv - fast Python package installer
  • ruff - linter and formatter
  • pytest - test runner

Setup

# Install dependencies
mise run install

# Or equivalently:
uv sync --all-extras   # includes mic support

Available Commands

mise run test           # Run tests
mise run test-v         # Run tests verbose
mise run test-cov       # Run tests with coverage report
mise run test-browser   # Run e2e browser tests (requires playwright)
mise run lint           # Run ruff linter
mise run lint-fix       # Run ruff with auto-fix
mise run format         # Run ruff formatter
mise run ci             # Full CI pipeline (topics-init + lint + test-cov)

Runtime Commands

mise run run             # Run mainline (terminal)
mise run run-poetry      # Run with poetry feed
mise run run-firehose    # Run in firehose mode
mise run run-websocket   # Run with WebSocket display only
mise run run-sixel       # Run with Sixel graphics display
mise run run-both        # Run with both terminal and WebSocket
mise run run-client      # Run both + open browser
mise run cmd             # Run C&C command interface

Git Hooks

At the start of every agent session, verify hooks are installed:

ls -la .git/hooks/pre-commit

If hooks are not installed, install them with:

hk init --mise
mise run pre-commit
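
The session-start check can also be scripted. This is a minimal sketch, assuming the working directory is the repository root and that the hook lives at .git/hooks/pre-commit as the ls command above implies. The function name hooks_installed is hypothetical and not part of the project.

```python
from pathlib import Path


def hooks_installed(repo_root: str = ".") -> bool:
    """Return True if a pre-commit hook file exists in the repository."""
    # Same path that the `ls -la .git/hooks/pre-commit` check inspects.
    return (Path(repo_root) / ".git" / "hooks" / "pre-commit").is_file()


if __name__ == "__main__":
    if hooks_installed():
        print("hooks installed")
    else:
        print("hooks missing: run `hk init --mise`")
```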

IMPORTANT: Always review the hk documentation (hk.jdx.dev) before modifying hk.pkl.

The project uses hk configured in hk.pkl:

  • pre-commit: runs ruff-format and ruff (with auto-fix)
  • pre-push: runs ruff check + benchmark hook

Benchmark Runner

Run performance benchmarks:

mise run benchmark          # Run all benchmarks (text output)
mise run benchmark-json     # Run benchmarks (JSON output)
mise run benchmark-report   # Run benchmarks (Markdown report)

Benchmark Commands

# Run benchmarks
uv run python -m engine.benchmark

# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch

# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline

# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook

# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3

# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json

Hook Mode

The --hook mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.

The pre-push hook runs the benchmark in hook mode to catch performance regressions before pushing.
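
The comparison that hook mode performs can be sketched roughly as follows. This is not the code in engine.benchmark: the data layout (benchmark name mapped to a time per frame, lower is better) and the names find_regressions and hook_exit_code are assumptions made for illustration. The threshold is treated as a fraction, so 0.2 corresponds to the 20% default.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.2,
) -> list[str]:
    """Return names of benchmarks that slowed down by more than `threshold`.

    Both dicts map a benchmark name to a time per frame (lower is better).
    This metric and layout are assumptions, not the real baseline format.
    """
    regressions = []
    for name, base in baseline.items():
        now = current.get(name)
        if now is None or base <= 0:
            continue  # nothing comparable for this benchmark
        # Relative slowdown, e.g. 0.3 means 30% slower than the baseline.
        if (now - base) / base > threshold:
            regressions.append(name)
    return regressions


def hook_exit_code(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.2,
) -> int:
    """Exit code for hook mode: 1 if anything regressed, otherwise 0."""
    return 1 if find_regressions(baseline, current, threshold) else 0
```

A non-zero exit status from a pre-push hook makes git abort the push, which is why returning 1 is enough to stop a regression from leaving the branch.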

Workflow Rules

Before Committing

  1. Always run the test suite - never commit code that fails tests:

    mise run test
    
  2. Always run the linter:

    mise run lint
    
  3. Fix any lint errors before committing (or let the pre-commit hook handle it).

  4. Review your changes using git diff to understand what will be committed.

On Failing Tests

When tests fail, determine whether it's an out-of-date test or a correctly failing test:

  • Out-of-date test: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.

  • Correctly failing test: The test correctly identifies a broken contract. Fix the implementation, not the test.

Never modify a test to make it pass without understanding why it failed.

Code Review

Before committing significant changes:

  • Run git diff to review all changes
  • Ensure new code follows existing patterns in the codebase
  • Check that type hints are added for new functions
  • Verify that tests exist for new functionality

Testing

Tests live in tests/ and follow the pattern test_*.py.

Run all tests:

mise run test

Run with coverage:

mise run test-cov

The project uses pytest with strict marker enforcement. Test configuration is in pyproject.toml under [tool.pytest.ini_options].

Architecture Notes

  • ntfy.py and mic.py are standalone modules with zero internal dependencies
  • eventbus.py provides thread-safe event publishing for decoupled communication
  • controller.py coordinates ntfy/mic monitoring and event publishing
  • effects/ - plugin architecture with performance monitoring
  • The render pipeline: fetch → render → effects → scroll → terminal output
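
To illustrate what thread-safe event publishing means in practice, here is a minimal publish/subscribe sketch. It is not the contents of eventbus.py: the class layout, the method names subscribe and publish, and the string topics are all assumptions.

```python
import threading
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class EventBus:
    """Tiny thread-safe publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register `handler` to be called for every event on `topic`."""
        with self._lock:
            self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        """Call every handler subscribed to `topic` with `payload`."""
        # Copy the handler list under the lock, then call handlers outside
        # it so a handler can subscribe or publish without deadlocking.
        with self._lock:
            handlers = list(self._handlers.get(topic, ()))
        for handler in handlers:
            handler(payload)
```

Publishers and subscribers only share topic names, which is what keeps standalone modules such as ntfy.py and mic.py free of imports from the rest of the engine.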

Display System

  • Display abstraction (engine/display.py): swap display backends via the Display protocol

    • TerminalDisplay - ANSI terminal output
    • WebSocketDisplay - broadcasts to web clients via WebSocket
    • SixelDisplay - renders to Sixel graphics (pure Python, no C dependency)
    • MultiDisplay - forwards to multiple displays simultaneously
  • WebSocket display (engine/websocket_display.py): real-time frame broadcasting to web browsers

    • WebSocket server on port 8765
    • HTTP server on port 8766 (serves HTML client)
    • Client at client/index.html with ANSI color parsing and fullscreen support
  • Display modes (--display flag):

    • terminal - Default ANSI terminal output
    • websocket - Web browser display (requires websockets package)
    • sixel - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
    • both - Terminal + WebSocket simultaneously
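
The backend-swapping idea can be sketched with typing.Protocol. The method name show, the frame type, and the classes RecordingDisplay and MultiDisplaySketch are hypothetical; the real interface in engine/display.py may look different.

```python
from typing import Protocol


class Display(Protocol):
    """Structural interface; the method name `show` is an assumption."""

    def show(self, frame: list[str]) -> None: ...


class RecordingDisplay:
    """Stand-in backend that stores frames instead of drawing them."""

    def __init__(self) -> None:
        self.frames: list[list[str]] = []

    def show(self, frame: list[str]) -> None:
        self.frames.append(frame)


class MultiDisplaySketch:
    """Forwards each frame to every wrapped display, in order."""

    def __init__(self, displays: list[Display]) -> None:
        self._displays = displays

    def show(self, frame: list[str]) -> None:
        for display in self._displays:
            display.show(frame)
```

Because a Protocol is checked structurally, any class with a matching show method counts as a Display without inheriting from it, so a new backend can be added without touching the existing ones.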

Command & Control

  • C&C uses separate ntfy topics for commands and responses
  • NTFY_CC_CMD_TOPIC - commands from cmdline.py
  • NTFY_CC_RESP_TOPIC - responses back to cmdline.py
  • Effects controller handles /effects commands (list, on/off, intensity, reorder, stats)
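
As a rough illustration of how an /effects command might be recognized, the sketch below splits a message into an action and its arguments. The set of actions comes from the list above, but the argument order ("/effects <action> [args...]") and the names EffectsCommand and parse_effects_command are assumptions, not the project's actual command syntax.

```python
from dataclasses import dataclass
from typing import Optional

# Actions named in this guide; how their arguments look is assumed.
ACTIONS = {"list", "on", "off", "intensity", "reorder", "stats"}


@dataclass
class EffectsCommand:
    action: str
    args: list[str]


def parse_effects_command(text: str) -> Optional[EffectsCommand]:
    """Parse '/effects <action> [args...]'; return None if unrecognized."""
    parts = text.split()
    if len(parts) < 2 or parts[0] != "/effects":
        return None
    action, args = parts[1], parts[2:]
    if action not in ACTIONS:
        return None
    return EffectsCommand(action, args)
```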