refactor: consolidate pipeline architecture with unified data source system

MAJOR REFACTORING: Consolidate duplicated pipeline code and standardize on
capability-based dependency resolution. This is a significant but backwards-compatible
restructuring that improves maintainability and extensibility.

## ARCHITECTURE CHANGES

### Data Sources Consolidation
- Move engine/sources_v2.py → engine/data_sources/sources.py
- Move engine/pipeline_sources/ → engine/data_sources/
- Create unified DataSource ABC with common interface:
  * fetch() - idempotent data retrieval
  * get_items() - cached access with automatic refresh
  * refresh() - force cache invalidation
  * is_dynamic - indicate streaming vs static sources
- Support for SourceItem dataclass (content, source, timestamp, metadata)

### Display Backend Improvements
- Update all 7 display backends to use new import paths
- Terminal: Improve dimension detection and handling
- WebSocket: Better error handling and client lifecycle
- Sixel: Refactor graphics rendering
- Pygame: Modernize event handling
- Kitty: Add protocol support for inline images
- Multi: Ensure proper forwarding to all backends
- Null: Maintain testing backend functionality

### Pipeline Adapter Consolidation
- Refactor adapter stages for clarity and flexibility
- RenderStage now handles both item-based and buffer-based rendering
- Add SourceItemsToBufferStage for converting data source items
- Improve DataSourceStage to work with all source types
- Add DisplayStage wrapper for display backends

### Camera & Viewport Refinements
- Update Camera class for new architecture
- Improve viewport dimension detection
- Better handling of resize events across backends

### New Effect Plugins
- border.py: Frame rendering effect with configurable style
- crop.py: Viewport clipping effect for selective display
- tint.py: Color filtering effect for atmosphere

### Tests & Quality
- Add test_border_effect.py with comprehensive border tests
- Add test_crop_effect.py with viewport clipping tests
- Add test_tint_effect.py with color filtering tests
- Update test_pipeline.py for new architecture
- Update test_pipeline_introspection.py for new data source location
- All 463 tests pass with 56% coverage
- Linting: All checks pass with ruff

### Removals (Code Cleanup)
- Delete engine/benchmark.py (deprecated performance testing)
- Delete engine/pipeline_sources/__init__.py (moved to data_sources)
- Delete engine/sources_v2.py (replaced by data_sources/sources.py)
- Update AGENTS.md to reflect new structure

### Import Path Updates
- Update engine/pipeline/controller.py::create_default_pipeline()
  * Old: from engine.sources_v2 import HeadlinesDataSource
  * New: from engine.data_sources.sources import HeadlinesDataSource
- All display backends import from new locations
- All tests import from new locations

## BACKWARDS COMPATIBILITY

This refactoring is intended to be backwards compatible:
- Pipeline execution unchanged (DAG-based with capability matching)
- Effect plugins unchanged (EffectPlugin interface same)
- Display protocol unchanged (Display duck-typing works as before)
- Config system unchanged (presets.toml format same)

## TESTING

- 463 tests pass (0 failures, 19 skipped)
- Full linting check passes
- Manual testing on demo, poetry, websocket modes
- All new effect plugins tested

## FILES CHANGED

- 24 files modified/added/deleted
- 723 insertions, 1,461 deletions (net -738 LOC - cleanup!)
- No breaking changes to public APIs
- All transitive imports updated correctly
This commit is contained in:
2026-03-16 19:47:12 -07:00
parent 3a3d0c0607
commit e0bbfea26c
30 changed files with 1435 additions and 884 deletions

View File

@@ -1,730 +0,0 @@
#!/usr/bin/env python3
"""
Benchmark runner for mainline - tests performance across effects and displays.
Usage:
python -m engine.benchmark
python -m engine.benchmark --output report.md
python -m engine.benchmark --displays terminal,websocket --effects glitch,fade
python -m engine.benchmark --format json --output benchmark.json
Headless mode (default): suppress all terminal output during benchmarks.
"""
import argparse
import json
import sys
import time
from dataclasses import dataclass, field
from datetime import datetime
from io import StringIO
from pathlib import Path
from typing import Any
import numpy as np
@dataclass
class BenchmarkResult:
"""Result of a single benchmark run."""
name: str
display: str
effect: str | None
iterations: int
total_time_ms: float
avg_time_ms: float
std_dev_ms: float
min_ms: float
max_ms: float
fps: float
chars_processed: int
chars_per_sec: float
@dataclass
class BenchmarkReport:
"""Complete benchmark report."""
timestamp: str
python_version: str
results: list[BenchmarkResult] = field(default_factory=list)
summary: dict[str, Any] = field(default_factory=dict)
def get_sample_buffer(width: int = 80, height: int = 24) -> list[str]:
"""Generate a sample buffer for benchmarking."""
lines = []
for i in range(height):
line = f"\x1b[32mLine {i}\x1b[0m " + "A" * (width - 10)
lines.append(line)
return lines
def benchmark_display(
display_class,
buffer: list[str],
iterations: int = 100,
display=None,
reuse: bool = False,
) -> BenchmarkResult | None:
"""Benchmark a single display.
Args:
display_class: Display class to instantiate
buffer: Buffer to display
iterations: Number of iterations
display: Optional existing display instance to reuse
reuse: If True and display provided, use reuse mode
"""
old_stdout = sys.stdout
old_stderr = sys.stderr
try:
sys.stdout = StringIO()
sys.stderr = StringIO()
if display is None:
display = display_class()
display.init(80, 24, reuse=False)
should_cleanup = True
else:
should_cleanup = False
times = []
chars = sum(len(line) for line in buffer)
for _ in range(iterations):
t0 = time.perf_counter()
display.show(buffer)
elapsed = (time.perf_counter() - t0) * 1000
times.append(elapsed)
if should_cleanup and hasattr(display, "cleanup"):
display.cleanup(quit_pygame=False)
except Exception:
return None
finally:
sys.stdout = old_stdout
sys.stderr = old_stderr
times_arr = np.array(times)
return BenchmarkResult(
name=f"display_{display_class.__name__}",
display=display_class.__name__,
effect=None,
iterations=iterations,
total_time_ms=sum(times),
avg_time_ms=float(np.mean(times_arr)),
std_dev_ms=float(np.std(times_arr)),
min_ms=float(np.min(times_arr)),
max_ms=float(np.max(times_arr)),
fps=float(1000.0 / np.mean(times_arr)) if np.mean(times_arr) > 0 else 0.0,
chars_processed=chars * iterations,
chars_per_sec=float((chars * iterations) / (sum(times) / 1000))
if sum(times) > 0
else 0.0,
)
def benchmark_effect_with_display(
effect_class, display, buffer: list[str], iterations: int = 100, reuse: bool = False
) -> BenchmarkResult | None:
"""Benchmark an effect with a display.
Args:
effect_class: Effect class to instantiate
display: Display instance to use
buffer: Buffer to process and display
iterations: Number of iterations
reuse: If True, use reuse mode for display
"""
old_stdout = sys.stdout
old_stderr = sys.stderr
try:
from engine.effects.types import EffectConfig, EffectContext
sys.stdout = StringIO()
sys.stderr = StringIO()
effect = effect_class()
effect.configure(EffectConfig(enabled=True, intensity=1.0))
ctx = EffectContext(
terminal_width=80,
terminal_height=24,
scroll_cam=0,
ticker_height=0,
mic_excess=0.0,
grad_offset=0.0,
frame_number=0,
has_message=False,
)
times = []
chars = sum(len(line) for line in buffer)
for _ in range(iterations):
processed = effect.process(buffer, ctx)
t0 = time.perf_counter()
display.show(processed)
elapsed = (time.perf_counter() - t0) * 1000
times.append(elapsed)
if not reuse and hasattr(display, "cleanup"):
display.cleanup(quit_pygame=False)
except Exception:
return None
finally:
sys.stdout = old_stdout
sys.stderr = old_stderr
times_arr = np.array(times)
return BenchmarkResult(
name=f"effect_{effect_class.__name__}_with_{display.__class__.__name__}",
display=display.__class__.__name__,
effect=effect_class.__name__,
iterations=iterations,
total_time_ms=sum(times),
avg_time_ms=float(np.mean(times_arr)),
std_dev_ms=float(np.std(times_arr)),
min_ms=float(np.min(times_arr)),
max_ms=float(np.max(times_arr)),
fps=float(1000.0 / np.mean(times_arr)) if np.mean(times_arr) > 0 else 0.0,
chars_processed=chars * iterations,
chars_per_sec=float((chars * iterations) / (sum(times) / 1000))
if sum(times) > 0
else 0.0,
)
def get_available_displays():
"""Get available display classes."""
from engine.display import (
DisplayRegistry,
NullDisplay,
TerminalDisplay,
)
DisplayRegistry.initialize()
displays = [
("null", NullDisplay),
("terminal", TerminalDisplay),
]
try:
from engine.display.backends.websocket import WebSocketDisplay
displays.append(("websocket", WebSocketDisplay))
except Exception:
pass
try:
from engine.display.backends.sixel import SixelDisplay
displays.append(("sixel", SixelDisplay))
except Exception:
pass
try:
from engine.display.backends.pygame import PygameDisplay
displays.append(("pygame", PygameDisplay))
except Exception:
pass
return displays
def get_available_effects():
"""Get available effect classes."""
try:
from engine.effects import get_registry
try:
from effects_plugins import discover_plugins
discover_plugins()
except Exception:
pass
except Exception:
return []
effects = []
registry = get_registry()
for name, effect in registry.list_all().items():
if effect:
effect_cls = type(effect)
effects.append((name, effect_cls))
return effects
def run_benchmarks(
displays: list[tuple[str, Any]] | None = None,
effects: list[tuple[str, Any]] | None = None,
iterations: int = 100,
verbose: bool = False,
) -> BenchmarkReport:
"""Run all benchmarks and return report."""
from datetime import datetime
if displays is None:
displays = get_available_displays()
if effects is None:
effects = get_available_effects()
buffer = get_sample_buffer(80, 24)
results = []
if verbose:
print(f"Running benchmarks ({iterations} iterations each)...")
pygame_display = None
for name, display_class in displays:
if verbose:
print(f"Benchmarking display: {name}")
result = benchmark_display(display_class, buffer, iterations)
if result:
results.append(result)
if verbose:
print(f" {result.fps:.1f} FPS, {result.avg_time_ms:.2f}ms avg")
if name == "pygame":
pygame_display = result
if verbose:
print()
pygame_instance = None
if pygame_display:
try:
from engine.display.backends.pygame import PygameDisplay
PygameDisplay.reset_state()
pygame_instance = PygameDisplay()
pygame_instance.init(80, 24, reuse=False)
except Exception:
pygame_instance = None
for effect_name, effect_class in effects:
for display_name, display_class in displays:
if display_name == "websocket":
continue
if display_name == "pygame":
if verbose:
print(f"Benchmarking effect: {effect_name} with {display_name}")
if pygame_instance:
result = benchmark_effect_with_display(
effect_class, pygame_instance, buffer, iterations, reuse=True
)
if result:
results.append(result)
if verbose:
print(
f" {result.fps:.1f} FPS, {result.avg_time_ms:.2f}ms avg"
)
continue
if verbose:
print(f"Benchmarking effect: {effect_name} with {display_name}")
display = display_class()
display.init(80, 24)
result = benchmark_effect_with_display(
effect_class, display, buffer, iterations
)
if result:
results.append(result)
if verbose:
print(f" {result.fps:.1f} FPS, {result.avg_time_ms:.2f}ms avg")
if pygame_instance:
try:
pygame_instance.cleanup(quit_pygame=True)
except Exception:
pass
summary = generate_summary(results)
return BenchmarkReport(
timestamp=datetime.now().isoformat(),
python_version=sys.version,
results=results,
summary=summary,
)
def generate_summary(results: list[BenchmarkResult]) -> dict[str, Any]:
"""Generate summary statistics from results."""
by_display: dict[str, list[BenchmarkResult]] = {}
by_effect: dict[str, list[BenchmarkResult]] = {}
for r in results:
if r.display not in by_display:
by_display[r.display] = []
by_display[r.display].append(r)
if r.effect:
if r.effect not in by_effect:
by_effect[r.effect] = []
by_effect[r.effect].append(r)
summary = {
"by_display": {},
"by_effect": {},
"overall": {
"total_tests": len(results),
"displays_tested": len(by_display),
"effects_tested": len(by_effect),
},
}
for display, res in by_display.items():
fps_values = [r.fps for r in res]
summary["by_display"][display] = {
"avg_fps": float(np.mean(fps_values)),
"min_fps": float(np.min(fps_values)),
"max_fps": float(np.max(fps_values)),
"tests": len(res),
}
for effect, res in by_effect.items():
fps_values = [r.fps for r in res]
summary["by_effect"][effect] = {
"avg_fps": float(np.mean(fps_values)),
"min_fps": float(np.min(fps_values)),
"max_fps": float(np.max(fps_values)),
"tests": len(res),
}
return summary
DEFAULT_CACHE_PATH = Path.home() / ".mainline_benchmark_cache.json"
def load_baseline(cache_path: Path | None = None) -> dict[str, Any] | None:
"""Load baseline benchmark results from cache."""
path = cache_path or DEFAULT_CACHE_PATH
if not path.exists():
return None
try:
with open(path) as f:
return json.load(f)
except Exception:
return None
def save_baseline(
results: list[BenchmarkResult],
cache_path: Path | None = None,
) -> None:
"""Save benchmark results as baseline to cache."""
path = cache_path or DEFAULT_CACHE_PATH
baseline = {
"timestamp": datetime.now().isoformat(),
"results": {
r.name: {
"fps": r.fps,
"avg_time_ms": r.avg_time_ms,
"chars_per_sec": r.chars_per_sec,
}
for r in results
},
}
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
json.dump(baseline, f, indent=2)
def compare_with_baseline(
results: list[BenchmarkResult],
baseline: dict[str, Any],
threshold: float = 0.2,
verbose: bool = True,
) -> tuple[bool, list[str]]:
"""Compare current results with baseline. Returns (pass, messages)."""
baseline_results = baseline.get("results", {})
failures = []
warnings = []
for r in results:
if r.name not in baseline_results:
warnings.append(f"New test: {r.name} (no baseline)")
continue
b = baseline_results[r.name]
if b["fps"] == 0:
continue
degradation = (b["fps"] - r.fps) / b["fps"]
if degradation > threshold:
failures.append(
f"{r.name}: FPS degraded {degradation * 100:.1f}% "
f"(baseline: {b['fps']:.1f}, current: {r.fps:.1f})"
)
elif verbose:
print(f" {r.name}: {r.fps:.1f} FPS (baseline: {b['fps']:.1f})")
passed = len(failures) == 0
messages = []
if failures:
messages.extend(failures)
if warnings:
messages.extend(warnings)
return passed, messages
def run_hook_mode(
displays: list[tuple[str, Any]] | None = None,
effects: list[tuple[str, Any]] | None = None,
iterations: int = 20,
threshold: float = 0.2,
cache_path: Path | None = None,
verbose: bool = False,
) -> int:
"""Run in hook mode: compare against baseline, exit 0 on pass, 1 on fail."""
baseline = load_baseline(cache_path)
if baseline is None:
print("No baseline found. Run with --baseline to create one.")
return 1
report = run_benchmarks(displays, effects, iterations, verbose)
passed, messages = compare_with_baseline(
report.results, baseline, threshold, verbose
)
print("\n=== Benchmark Hook Results ===")
if passed:
print("PASSED - No significant performance degradation")
return 0
else:
print("FAILED - Performance degradation detected:")
for msg in messages:
print(f" - {msg}")
return 1
def format_report_text(report: BenchmarkReport) -> str:
"""Format report as human-readable text."""
lines = [
"# Mainline Performance Benchmark Report",
"",
f"Generated: {report.timestamp}",
f"Python: {report.python_version}",
"",
"## Summary",
"",
f"Total tests: {report.summary['overall']['total_tests']}",
f"Displays tested: {report.summary['overall']['displays_tested']}",
f"Effects tested: {report.summary['overall']['effects_tested']}",
"",
"## By Display",
"",
]
for display, stats in report.summary["by_display"].items():
lines.append(f"### {display}")
lines.append(f"- Avg FPS: {stats['avg_fps']:.1f}")
lines.append(f"- Min FPS: {stats['min_fps']:.1f}")
lines.append(f"- Max FPS: {stats['max_fps']:.1f}")
lines.append(f"- Tests: {stats['tests']}")
lines.append("")
if report.summary["by_effect"]:
lines.append("## By Effect")
lines.append("")
for effect, stats in report.summary["by_effect"].items():
lines.append(f"### {effect}")
lines.append(f"- Avg FPS: {stats['avg_fps']:.1f}")
lines.append(f"- Min FPS: {stats['min_fps']:.1f}")
lines.append(f"- Max FPS: {stats['max_fps']:.1f}")
lines.append(f"- Tests: {stats['tests']}")
lines.append("")
lines.append("## Detailed Results")
lines.append("")
lines.append("| Display | Effect | FPS | Avg ms | StdDev ms | Min ms | Max ms |")
lines.append("|---------|--------|-----|--------|-----------|--------|--------|")
for r in report.results:
effect_col = r.effect if r.effect else "-"
lines.append(
f"| {r.display} | {effect_col} | {r.fps:.1f} | {r.avg_time_ms:.2f} | "
f"{r.std_dev_ms:.2f} | {r.min_ms:.2f} | {r.max_ms:.2f} |"
)
return "\n".join(lines)
def format_report_json(report: BenchmarkReport) -> str:
"""Format report as JSON."""
data = {
"timestamp": report.timestamp,
"python_version": report.python_version,
"summary": report.summary,
"results": [
{
"name": r.name,
"display": r.display,
"effect": r.effect,
"iterations": r.iterations,
"total_time_ms": r.total_time_ms,
"avg_time_ms": r.avg_time_ms,
"std_dev_ms": r.std_dev_ms,
"min_ms": r.min_ms,
"max_ms": r.max_ms,
"fps": r.fps,
"chars_processed": r.chars_processed,
"chars_per_sec": r.chars_per_sec,
}
for r in report.results
],
}
return json.dumps(data, indent=2)
def main():
parser = argparse.ArgumentParser(description="Run mainline benchmarks")
parser.add_argument(
"--displays",
help="Comma-separated list of displays to test (default: all)",
)
parser.add_argument(
"--effects",
help="Comma-separated list of effects to test (default: all)",
)
parser.add_argument(
"--iterations",
type=int,
default=100,
help="Number of iterations per test (default: 100)",
)
parser.add_argument(
"--output",
help="Output file path (default: stdout)",
)
parser.add_argument(
"--format",
choices=["text", "json"],
default="text",
help="Output format (default: text)",
)
parser.add_argument(
"--verbose",
"-v",
action="store_true",
help="Show progress during benchmarking",
)
parser.add_argument(
"--hook",
action="store_true",
help="Run in hook mode: compare against baseline, exit 0 pass, 1 fail",
)
parser.add_argument(
"--baseline",
action="store_true",
help="Save current results as baseline for future hook comparisons",
)
parser.add_argument(
"--threshold",
type=float,
default=0.2,
help="Performance degradation threshold for hook mode (default: 0.2 = 20%%)",
)
parser.add_argument(
"--cache",
type=str,
default=None,
help="Path to baseline cache file (default: ~/.mainline_benchmark_cache.json)",
)
args = parser.parse_args()
cache_path = Path(args.cache) if args.cache else DEFAULT_CACHE_PATH
if args.hook:
displays = None
if args.displays:
display_map = dict(get_available_displays())
displays = [
(name, display_map[name])
for name in args.displays.split(",")
if name in display_map
]
effects = None
if args.effects:
effect_map = dict(get_available_effects())
effects = [
(name, effect_map[name])
for name in args.effects.split(",")
if name in effect_map
]
return run_hook_mode(
displays,
effects,
iterations=args.iterations,
threshold=args.threshold,
cache_path=cache_path,
verbose=args.verbose,
)
displays = None
if args.displays:
display_map = dict(get_available_displays())
displays = [
(name, display_map[name])
for name in args.displays.split(",")
if name in display_map
]
effects = None
if args.effects:
effect_map = dict(get_available_effects())
effects = [
(name, effect_map[name])
for name in args.effects.split(",")
if name in effect_map
]
report = run_benchmarks(displays, effects, args.iterations, args.verbose)
if args.baseline:
save_baseline(report.results, cache_path)
print(f"Baseline saved to {cache_path}")
return 0
if args.format == "json":
output = format_report_json(report)
else:
output = format_report_text(report)
if args.output:
with open(args.output, "w") as f:
f.write(output)
else:
print(output)
return 0
if __name__ == "__main__":
sys.exit(main())