# Agent Development Guide

## Development Environment

This project uses:
- **mise** (mise.jdx.dev) - tool version manager and task runner
- **hk** (hk.jdx.dev) - git hook manager
- **uv** - fast Python package and project manager
- **ruff** - linter and formatter
- **pytest** - test runner

### Setup

```bash
# Install dependencies
mise run install

# Or equivalently:
uv sync --all-extras   # includes mic, websocket, sixel support
```

### Available Commands

```bash
mise run test           # Run tests
mise run test-v         # Run tests verbose
mise run test-cov       # Run tests with coverage report
mise run test-browser   # Run e2e browser tests (requires playwright)
mise run lint           # Run ruff linter
mise run lint-fix       # Run ruff with auto-fix
mise run format         # Run ruff formatter
mise run ci             # Full CI pipeline (topics-init + lint + test-cov)
```

### Runtime Commands

```bash
mise run run           # Run mainline (terminal)
mise run run-poetry    # Run with poetry feed
mise run run-firehose  # Run in firehose mode
mise run run-websocket # Run with WebSocket display only
mise run run-sixel     # Run with Sixel graphics display
mise run run-both      # Run with both terminal and WebSocket
mise run run-client    # Run both + open browser
mise run cmd           # Run C&C command interface
```

## Git Hooks

**At the start of every agent session**, verify hooks are installed:

```bash
ls -la .git/hooks/pre-commit
```

If hooks are not installed, install them with:

```bash
hk init --mise
mise run pre-commit
```

**IMPORTANT**: Always review the hk documentation before modifying `hk.pkl`:
- [hk Configuration Guide](https://hk.jdx.dev/configuration.html)
- [hk Hooks Reference](https://hk.jdx.dev/hooks.html)
- [hk Builtins](https://hk.jdx.dev/builtins.html)

The project uses hk configured in `hk.pkl`:
- **pre-commit**: runs ruff-format and ruff (with auto-fix)
- **pre-push**: runs ruff check + benchmark hook

## Benchmark Runner

Run performance benchmarks:

```bash
mise run benchmark          # Run all benchmarks (text output)
mise run benchmark-json     # Run benchmarks (JSON output)
mise run benchmark-report   # Run benchmarks (Markdown report)
```

### Benchmark Commands

```bash
# Run benchmarks
uv run python -m engine.benchmark

# Run with specific displays/effects
uv run python -m engine.benchmark --displays null,terminal --effects fade,glitch

# Save baseline for hook comparisons
uv run python -m engine.benchmark --baseline

# Run in hook mode (compares against baseline)
uv run python -m engine.benchmark --hook

# Hook mode with custom threshold (default: 20% degradation)
uv run python -m engine.benchmark --hook --threshold 0.3

# Custom baseline location
uv run python -m engine.benchmark --hook --cache /path/to/cache.json
```

### Hook Mode

The `--hook` mode compares current benchmarks against a saved baseline. If performance degrades beyond the threshold (default 20%), it exits with code 1. This is useful for preventing performance regressions in feature branches.

The pre-push hook runs the benchmark in hook mode to catch performance regressions before pushing.
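The comparison hook mode performs can be sketched roughly as follows. This is a minimal illustration, not the code in `engine/benchmark.py`. It assumes three things: each result is a per-benchmark mean time in seconds keyed by a name such as `terminal/fade`, "degradation" means the relative slowdown `(current - baseline) / baseline`, and the threshold is a fraction, as `--threshold 0.3` suggests. The names `find_regressions` and `hook_exit_code` are hypothetical.

```python
import sys

# Hypothetical sketch: not the real engine/benchmark.py API.
# Assumes each result is a mean time in seconds (lower is better),
# keyed by a benchmark name such as "terminal/fade".
DEFAULT_THRESHOLD = 0.2  # 20% slowdown, matching the documented default


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = DEFAULT_THRESHOLD,
) -> dict[str, float]:
    """Return {name: relative_slowdown} for benchmarks slower than allowed."""
    regressions: dict[str, float] = {}
    for name, base_time in baseline.items():
        cur_time = current.get(name)
        if cur_time is None or base_time <= 0:
            # Benchmark not run this time, or unusable baseline: skip it.
            continue
        slowdown = (cur_time - base_time) / base_time
        if slowdown > threshold:
            regressions[name] = slowdown
    return regressions


def hook_exit_code(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = DEFAULT_THRESHOLD,
) -> int:
    """Mirror the documented contract: exit code 1 on regression, else 0."""
    regressions = find_regressions(baseline, current, threshold)
    for name, slowdown in sorted(regressions.items()):
        print(f"regression: {name} is {slowdown:.0%} slower", file=sys.stderr)
    return 1 if regressions else 0
```

Under this definition, with `--threshold 0.3` a benchmark that becomes 25% slower passes, while one that becomes 35% slower fails the push. A caller would end with `sys.exit(hook_exit_code(baseline, current))`.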

## Workflow Rules

### Before Committing

1. **Always run the test suite** - never commit code that fails tests:
   ```bash
   mise run test
   ```

2. **Always run the linter**:
   ```bash
   mise run lint
   ```

3. **Fix any lint errors** before committing. The pre-commit hook applies ruff's auto-fixes, but not every rule has an automatic fix, so some errors still need manual changes.

4. **Review your changes** using `git diff` to understand what will be committed.

### On Failing Tests

When tests fail, **determine whether it's an out-of-date test or a correctly failing test**:

- **Out-of-date test**: The test was written for old behavior that has legitimately changed. Update the test to match the new expected behavior.

- **Correctly failing test**: The test correctly identifies a broken contract. Fix the implementation, not the test.

**Never** modify a test to make it pass without understanding why it failed.

### Code Review

Before committing significant changes:
- Run `git diff` to review all changes
- Ensure new code follows existing patterns in the codebase
- Check that type hints are added for new functions
- Verify that tests exist for new functionality

## Testing

Tests live in `tests/` and follow the pattern `test_*.py`.

Run all tests:
```bash
mise run test
```

Run with coverage:
```bash
mise run test-cov
```

The project uses pytest with strict marker enforcement. Test configuration is in `pyproject.toml` under `[tool.pytest.ini_options]`.

### Test Coverage Strategy

Current coverage: 56% (336 tests)

Key areas with lower coverage (acceptable for now):
- **app.py** (8%): Main entry point - integration heavy, requires terminal
- **scroll.py** (10%): Terminal-dependent rendering logic
- **benchmark.py** (0%): Standalone benchmark tool, runs separately

Key areas with good coverage:
- **display/backends/null.py** (95%): Easy to test headlessly
- **display/backends/terminal.py** (96%): Uses mocking
- **display/backends/multi.py** (100%): Simple forwarding logic
- **effects/performance.py** (99%): Pure Python logic
- **eventbus.py** (96%): Simple event system
- **effects/controller.py** (95%): Effects command handling

Areas needing more tests:
- **websocket.py** (48%): Network I/O, hard to test in CI
- **ntfy.py** (50%): Network I/O, hard to test in CI
- **mic.py** (61%): Audio I/O, hard to test in CI

Note: Terminal-dependent modules (scroll, layer rendering) are harder to test in CI.
Performance regression tests are in `tests/test_benchmark.py` and are marked with `@pytest.mark.benchmark`.

## Architecture Notes

- **ntfy.py** and **mic.py** are standalone modules with zero internal dependencies
- **eventbus.py** provides thread-safe event publishing for decoupled communication
- **controller.py** coordinates ntfy/mic monitoring and event publishing
- **effects/** - plugin architecture with performance monitoring
- The render pipeline: fetch → render → effects → scroll → terminal output
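The minimal publish/subscribe bus below shows one way to achieve "thread-safe event publishing". The class and method names (`EventBus`, `subscribe`, `unsubscribe`, `publish`) and the string event type `"ntfy_message"` are assumptions for illustration. `engine/eventbus.py` may use different names or typed events.

```python
import threading
from typing import Any, Callable

# Hypothetical sketch: names and string event types are illustrative,
# not the actual engine/eventbus.py API.
Handler = Callable[[Any], None]


class EventBus:
    """Publish/subscribe bus that is safe to use from several threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: dict[str, list[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        with self._lock:
            self._subscribers.setdefault(event_type, []).append(handler)

    def unsubscribe(self, event_type: str, handler: Handler) -> None:
        with self._lock:
            handlers = self._subscribers.get(event_type, [])
            if handler in handlers:
                handlers.remove(handler)

    def publish(self, event_type: str, payload: Any = None) -> None:
        # Snapshot the handlers under the lock, then call them without it,
        # so a handler may itself publish or subscribe without deadlocking.
        with self._lock:
            handlers = list(self._subscribers.get(event_type, []))
        for handler in handlers:
            handler(payload)
```

In this sketch, handlers run on the publishing thread (for example, an ntfy or mic monitor thread). A subscriber that renders should therefore keep its handler short and hand the work off to its own loop.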

### Display System

- **Display abstraction** (`engine/display/`): swap display backends via the Display protocol
  - `display/backends/terminal.py` - ANSI terminal output
  - `display/backends/websocket.py` - broadcasts to web clients via WebSocket
  - `display/backends/sixel.py` - renders to Sixel graphics (pure Python, no C dependency)
  - `display/backends/null.py` - headless display for testing
  - `display/backends/multi.py` - forwards to multiple displays simultaneously
  - `display/__init__.py` - DisplayRegistry for backend discovery

- **WebSocket display** (`engine/display/backends/websocket.py`): real-time frame broadcasting to web browsers
  - WebSocket server on port 8765
  - HTTP server on port 8766 (serves the HTML client)
  - Client at `client/index.html` with ANSI color parsing and fullscreen support

- **Display modes** (`--display` flag):
  - `terminal` - Default ANSI terminal output
  - `websocket` - Web browser display (requires the `websockets` package)
  - `sixel` - Sixel graphics in supported terminals (iTerm2, mintty, etc.)
  - `both` - Terminal + WebSocket simultaneously
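The pieces of a protocol-based display layer might fit together as follows. The sketch covers a structural `Display` protocol, a headless backend, a forwarding multi-display, and a name-to-factory registry that resolves `--display` values. The method names (`init`, `show`, `cleanup`), the frame type (a list of text lines), and the `register`/`create_display` helpers are assumptions, not the actual API in `engine/display/`.

```python
from typing import Callable, Protocol


class Display(Protocol):
    """Structural interface: any class with these methods is a Display."""

    def init(self, width: int, height: int) -> None: ...
    def show(self, lines: list[str]) -> None: ...
    def cleanup(self) -> None: ...


class NullDisplay:
    """Headless backend: records frames instead of drawing them."""

    def __init__(self) -> None:
        self.size = (0, 0)
        self.frames: list[list[str]] = []
        self.closed = False

    def init(self, width: int, height: int) -> None:
        self.size = (width, height)

    def show(self, lines: list[str]) -> None:
        self.frames.append(list(lines))

    def cleanup(self) -> None:
        self.closed = True


class MultiDisplay:
    """Forwards every call to each wrapped display, in order."""

    def __init__(self, *displays: Display) -> None:
        self.displays = list(displays)

    def init(self, width: int, height: int) -> None:
        for display in self.displays:
            display.init(width, height)

    def show(self, lines: list[str]) -> None:
        for display in self.displays:
            display.show(lines)

    def cleanup(self) -> None:
        for display in self.displays:
            display.cleanup()


# Hypothetical registry mapping --display names to backend factories.
_REGISTRY: dict[str, Callable[[], Display]] = {}


def register(name: str, factory: Callable[[], Display]) -> None:
    _REGISTRY[name] = factory


def create_display(mode: str) -> Display:
    """Resolve a --display value; "both" composes terminal + websocket."""
    if mode == "both":
        return MultiDisplay(create_display("terminal"), create_display("websocket"))
    try:
        return _REGISTRY[mode]()
    except KeyError:
        raise ValueError(f"unknown display mode: {mode!r}") from None
```

Because `Display` is a `typing.Protocol`, backends only need matching methods and do not have to inherit from it. In tests, `NullDisplay` can be registered as a stand-in for real backends, for example `register("terminal", NullDisplay)`.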

### Effect Plugin System

- **EffectPlugin ABC** (`engine/effects/types.py`): abstract base class for effects
  - All effects must inherit from EffectPlugin and implement `process()` and `configure()`
  - Runtime discovery via `effects_plugins/__init__.py` using `issubclass()` checks

- **EffectRegistry** (`engine/effects/registry.py`): manages registered effects
- **EffectChain** (`engine/effects/chain.py`): chains effects in pipeline order
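The sketch below shows the plugin contract and chaining under stated assumptions. Only the `EffectPlugin` name, the `process()`/`configure()` requirement, and `issubclass()`-based discovery come from the notes above. The frame type (a list of text lines), the `enabled` attribute, the `discover` helper, and the two demo effects are hypothetical.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Any


class EffectPlugin(ABC):
    """Base class: concrete effects must implement process() and configure()."""

    name = "base"
    enabled = True

    @abstractmethod
    def process(self, lines: list[str]) -> list[str]:
        """Transform one frame (here assumed to be a list of text lines)."""

    @abstractmethod
    def configure(self, **options: Any) -> None:
        """Apply runtime settings (here only 'enabled')."""


class Uppercase(EffectPlugin):  # hypothetical demo effect
    name = "uppercase"

    def process(self, lines: list[str]) -> list[str]:
        return [line.upper() for line in lines]

    def configure(self, **options: Any) -> None:
        self.enabled = options.get("enabled", self.enabled)


class Bracket(EffectPlugin):  # hypothetical demo effect
    name = "bracket"

    def process(self, lines: list[str]) -> list[str]:
        return [f"[{line}]" for line in lines]

    def configure(self, **options: Any) -> None:
        self.enabled = options.get("enabled", self.enabled)


def discover(namespace: dict[str, Any]) -> list[type[EffectPlugin]]:
    """issubclass()-based discovery of concrete plugins in a module namespace."""
    return [
        obj
        for obj in namespace.values()
        if isinstance(obj, type)
        and issubclass(obj, EffectPlugin)
        and not inspect.isabstract(obj)  # skips EffectPlugin itself
    ]


class EffectChain:
    """Applies enabled effects in list order; each sees the previous output."""

    def __init__(self, effects: list[EffectPlugin]) -> None:
        self.effects = list(effects)

    def process(self, lines: list[str]) -> list[str]:
        for effect in self.effects:
            if effect.enabled:
                lines = effect.process(lines)
        return lines
```

With this sketch, `EffectChain([Uppercase(), Bracket()])` turns `["ab"]` into `["[AB]"]`. After calling `configure(enabled=False)` on the `Uppercase` instance, the same chain yields `["[ab]"]`.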

### Command & Control

- C&C uses separate ntfy topics for commands and responses
- `NTFY_CC_CMD_TOPIC` - commands from cmdline.py
- `NTFY_CC_RESP_TOPIC` - responses back to cmdline.py
- Effects controller handles `/effects` commands (list, on/off, intensity, reorder, stats)
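Here is a hypothetical sketch of how an effects controller might dispatch `/effects` commands that arrive on `NTFY_CC_CMD_TOPIC` and build the reply text for `NTFY_CC_RESP_TOPIC`. Several details are assumptions: the command grammar (`/effects list`, `/effects <name> on|off`, `/effects <name> intensity <value>`, `/effects reorder a,b`), the clamping of intensity to the range 0.0 to 1.0, and the class names. `stats` is left out because it would depend on the performance monitor. Check `engine/effects/controller.py` for the real syntax.

```python
from dataclasses import dataclass, field


@dataclass
class EffectState:
    enabled: bool = True
    intensity: float = 1.0


@dataclass
class EffectsController:
    """Hypothetical dispatcher: returns the reply text for each command."""

    effects: dict[str, EffectState] = field(default_factory=dict)
    order: list[str] = field(default_factory=list)

    def handle(self, command: str) -> str:
        parts = command.split()
        if not parts or parts[0] != "/effects":
            return "error: not an /effects command"
        args = parts[1:]

        if args == ["list"]:
            return ", ".join(
                f"{name}:{'on' if self.effects[name].enabled else 'off'}"
                for name in self.order
            )

        if len(args) == 2 and args[0] == "reorder":
            names = args[1].split(",")
            # Require a permutation of the existing effects.
            if sorted(names) != sorted(self.order):
                return "error: reorder must list every effect exactly once"
            self.order = names
            return "order: " + ",".join(names)

        if len(args) >= 2 and args[0] in self.effects:
            name, action, rest = args[0], args[1], args[2:]
            state = self.effects[name]
            if action in ("on", "off") and not rest:
                state.enabled = action == "on"
                return f"{name}: {action}"
            if action == "intensity" and len(rest) == 1:
                try:
                    value = float(rest[0])
                except ValueError:
                    return "error: intensity must be a number"
                # Assumed valid range; clamp rather than reject.
                state.intensity = max(0.0, min(1.0, value))
                return f"{name} intensity: {state.intensity}"

        return f"error: unrecognized command {command!r}"
```

In the running app, the returned string would be published to the response topic so that `cmdline.py` can display it.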