
Agent Development Guide

Development Environment

This project uses:

  • mise (mise.jdx.dev) - tool version manager and task runner
  • hk (hk.jdx.dev) - git hook manager
  • uv - fast Python package installer
  • ruff - linter and formatter
  • pytest - test runner

Setup

# Install dependencies
mise run install

# Or equivalently:
uv sync --all-extras   # includes mic, websocket, sixel support

Available Commands

mise run test           # Run tests
mise run test-v         # Run tests verbose
mise run test-cov       # Run tests with coverage report
mise run test-browser   # Run e2e browser tests (requires playwright)
mise run lint           # Run ruff linter
mise run lint-fix       # Run ruff with auto-fix
mise run format         # Run ruff formatter
mise run ci             # Full CI pipeline (topics-init + lint + test-cov)

Runtime Commands

mise run run            # Run mainline (terminal)
mise run run-poetry     # Run with poetry feed
mise run run-firehose   # Run in firehose mode
mise run run-websocket  # Run with WebSocket display only
mise run run-sixel      # Run with Sixel graphics display
mise run run-both       # Run with both terminal and WebSocket
mise run run-client     # Run both + open browser
mise run cmd            # Run C&C command interface

Git Hooks

At the start of every agent session, verify hooks are installed:

ls -la .git/hooks/pre-commit

If hooks are not installed, install them with:

hk init --mise
mise run pre-commit

IMPORTANT: Always review the hk documentation (hk.jdx.dev) before modifying hk.pkl.

The project uses hk configured in hk.pkl:

  • pre-commit: runs ruff-format and ruff (with auto-fix)
  • pre-push: runs ruff check + benchmark hook

Benchmark Runner

Run performance benchmarks:

mise run benchmark           # Run all benchmarks (text output)
mise run benchmark-json      # Run benchmarks (JSON output)
mise run benchmark-report    # Run benchmarks (Markdown report)

Benchmark Commands

# Run benchmarks
uv run python -m engine.benchmark

# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch

# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline

# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook

# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3

# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json

Hook Mode

The --hook mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.

The pre-push hook runs benchmark in hook mode to catch performance regressions before pushing.
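The comparison can be sketched as follows (hypothetical helper; the actual logic lives in engine/benchmark):

```python
def exceeds_threshold(baseline_s: float, current_s: float,
                      threshold: float = 0.2) -> bool:
    """Return True when `current_s` is slower than `baseline_s` by more
    than `threshold`, expressed as a fraction (0.2 == 20% degradation)."""
    if baseline_s <= 0:
        return False  # no meaningful baseline to compare against
    degradation = (current_s - baseline_s) / baseline_s
    return degradation > threshold

exceeds_threshold(0.100, 0.125)              # 25% slower -> True
exceeds_threshold(1.0, 1.25, threshold=0.3)  # 25% slower, looser limit -> False
```

Degradation is relative, so a 0.3 threshold passed via --threshold tolerates up to a 30% slowdown against the cached baseline before the hook fails.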

Workflow Rules

Before Committing

  1. Always run the test suite - never commit code that fails tests:

    mise run test
    
  2. Always run the linter:

    mise run lint
    
  3. Fix any lint errors before committing (or let the pre-commit hook handle it).

  4. Review your changes using git diff to understand what will be committed.

On Failing Tests

When tests fail, determine whether it's an out-of-date test or a correctly failing test:

  • Out-of-date test: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.

  • Correctly failing test: The test correctly identifies a broken contract. Fix the implementation, not the test.

Never modify a test to make it pass without understanding why it failed.

Code Review

Before committing significant changes:

  • Run git diff to review all changes
  • Ensure new code follows existing patterns in the codebase
  • Check that type hints are added for new functions
  • Verify that tests exist for new functionality

Testing

Tests live in tests/ and follow the pattern test_*.py.

Run all tests:

mise run test

Run with coverage:

mise run test-cov

The project uses pytest with strict marker enforcement. Test configuration is in pyproject.toml under [tool.pytest.ini_options].

Test Coverage Strategy

Current coverage: 56% (336 tests)

Key areas with lower coverage (acceptable for now):

  • app.py (8%): Main entry point - integration heavy, requires terminal
  • scroll.py (10%): Terminal-dependent rendering logic
  • benchmark.py (0%): Standalone benchmark tool, runs separately

Key areas with good coverage:

  • display/backends/null.py (95%): Easy to test headlessly
  • display/backends/terminal.py (96%): Uses mocking
  • display/backends/multi.py (100%): Simple forwarding logic
  • effects/performance.py (99%): Pure Python logic
  • eventbus.py (96%): Simple event system
  • effects/controller.py (95%): Effects command handling

Areas needing more tests:

  • websocket.py (48%): Network I/O, hard to test in CI
  • ntfy.py (50%): Network I/O, hard to test in CI
  • mic.py (61%): Audio I/O, hard to test in CI

Note: Terminal-dependent modules (scroll, layer rendering) are harder to test in CI. Performance regression tests live in tests/test_benchmark.py and are marked @pytest.mark.benchmark.
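A benchmark regression test in that file might look roughly like this (names and time budget are illustrative, not copied from the suite):

```python
import time

import pytest


@pytest.mark.benchmark
def test_hot_path_under_budget():
    """Illustrative shape of a benchmark regression test: time a
    workload and fail if it blows past a fixed budget."""
    start = time.perf_counter()
    sum(i * i for i in range(100_000))  # stand-in for the real render work
    elapsed = time.perf_counter() - start
    assert elapsed < 1.0  # generous budget so CI noise doesn't cause flakes

```

Because strict marker enforcement is on, the benchmark marker must be registered in pyproject.toml; these tests can then be selected with uv run pytest -m benchmark.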

Architecture Notes

  • ntfy.py and mic.py are standalone modules with zero internal dependencies
  • eventbus.py provides thread-safe event publishing for decoupled communication
  • controller.py coordinates ntfy/mic monitoring and event publishing
  • effects/ - plugin architecture with performance monitoring
  • The render pipeline: fetch → render → effects → scroll → terminal output
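The event bus can be pictured as a minimal thread-safe publish/subscribe map (illustrative only; the actual API in eventbus.py may differ):

```python
import threading
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Minimal thread-safe publish/subscribe bus (sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        with self._lock:
            self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        with self._lock:
            handlers = list(self._subscribers[topic])  # snapshot under the lock
        for handler in handlers:  # invoke outside the lock to avoid deadlocks
            handler(payload)


received = []
bus = EventBus()
bus.subscribe("ntfy.message", received.append)
bus.publish("ntfy.message", "hello")
# received == ["hello"]
```

Taking a snapshot of the handler list under the lock, then calling handlers outside it, is what lets producers (ntfy, mic) publish from their own threads without blocking each other.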

Display System

  • Display abstraction (engine/display/): swap display backends via the Display protocol

    • display/backends/terminal.py - ANSI terminal output
    • display/backends/websocket.py - broadcasts to web clients via WebSocket
    • display/backends/sixel.py - renders to Sixel graphics (pure Python, no C dependency)
    • display/backends/null.py - headless display for testing
    • display/backends/multi.py - forwards to multiple displays simultaneously
    • display/__init__.py - DisplayRegistry for backend discovery
  • WebSocket display (engine/display/backends/websocket.py): real-time frame broadcasting to web browsers

    • WebSocket server on port 8765
    • HTTP server on port 8766 (serves HTML client)
    • Client at client/index.html with ANSI color parsing and fullscreen support
  • Display modes (--display flag):

    • terminal - Default ANSI terminal output
    • websocket - Web browser display (requires websockets package)
    • sixel - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
    • both - Terminal + WebSocket simultaneously
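A rough sketch of how the protocol-based swap works, with an assumed two-method interface (the real Display protocol in engine/display/ may declare more):

```python
from typing import Protocol


class Display(Protocol):
    """Assumed shape of the Display protocol (illustrative)."""
    def render(self, frame: str) -> None: ...
    def close(self) -> None: ...


class NullDisplay:
    """Headless backend in the spirit of display/backends/null.py:
    records frames instead of emitting them, handy for tests."""
    def __init__(self) -> None:
        self.frames: list[str] = []

    def render(self, frame: str) -> None:
        self.frames.append(frame)

    def close(self) -> None:
        pass


def run_once(display: Display, frame: str) -> None:
    # Callers depend only on the protocol, so any backend is accepted.
    display.render(frame)


d = NullDisplay()
run_once(d, "\x1b[2J hello")
# d.frames == ["\x1b[2J hello"]
```

Because backends only need to satisfy the protocol, MultiDisplay can hold several of them and forward each frame to all — which is all the "both" mode requires.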

Effect Plugin System

  • EffectPlugin ABC (engine/effects/types.py): abstract base class for effects

    • All effects must inherit from EffectPlugin and implement process() and configure()
    • Runtime discovery via effects_plugins/__init__.py using issubclass() checks
  • EffectRegistry (engine/effects/registry.py): manages registered effects

  • EffectChain (engine/effects/chain.py): chains effects in pipeline order
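The ABC-plus-issubclass() pattern can be sketched like this (method signatures are assumptions, not copied from engine/effects/types.py):

```python
import inspect
import sys
from abc import ABC, abstractmethod


class EffectPlugin(ABC):
    """Illustrative ABC; the real one lives in engine/effects/types.py."""

    @abstractmethod
    def configure(self, **options) -> None: ...

    @abstractmethod
    def process(self, frame: str) -> str: ...


class Fade(EffectPlugin):
    """Toy concrete effect satisfying both abstract methods."""

    def configure(self, **options) -> None:
        self.intensity = options.get("intensity", 1.0)

    def process(self, frame: str) -> str:
        return frame  # a real effect would transform the frame here


def discover_plugins(module) -> list[type]:
    """Mimics the runtime issubclass() check: keep concrete
    EffectPlugin subclasses, skip the ABC itself."""
    return [
        obj for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, EffectPlugin)
        and obj is not EffectPlugin
        and not inspect.isabstract(obj)
    ]


found = discover_plugins(sys.modules[__name__])
# found == [Fade]
```

The @abstractmethod decorators make Python refuse to instantiate any effect that forgot process() or configure(), so interface violations surface at construction time rather than mid-render.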

Command & Control

  • C&C uses separate ntfy topics for commands and responses
  • NTFY_CC_CMD_TOPIC - commands from cmdline.py
  • NTFY_CC_RESP_TOPIC - responses back to cmdline.py
  • Effects controller handles /effects commands (list, on/off, intensity, reorder, stats)
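A minimal sketch of publishing a command, assuming the public ntfy.sh server and a made-up fallback topic name (real topic names come from the NTFY_CC_CMD_TOPIC environment variable; a self-hosted server would change the base URL):

```python
import os
import urllib.request

# "sideline-cmd" is a hypothetical fallback, not the project's real topic.
cmd_topic = os.environ.get("NTFY_CC_CMD_TOPIC", "sideline-cmd")


def send_command(command: str) -> None:
    """Publish a C&C command using ntfy's standard publish mechanism:
    POST the message body to the topic URL."""
    req = urllib.request.Request(
        f"https://ntfy.sh/{cmd_topic}",
        data=command.encode(),
        method="POST",
    )
    urllib.request.urlopen(req)


# e.g. send_command("/effects list")
```

Responses travel the other way on NTFY_CC_RESP_TOPIC, which cmdline.py subscribes to, keeping the two directions on separate topics so commands never echo back as responses.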