Improve benchmarking system and performance tests #34

Closed
opened 2026-03-18 23:55:20 +00:00 by david · 2 comments
Owner

User Story

As a developer, I want reliable, accurate performance benchmarks that run consistently across different environments (development, CI with coverage, etc.), so that I can detect performance regressions without false positives from coverage instrumentation or timing variation.

Rationale

The current benchmark tests are failing intermittently when running with coverage instrumentation:

FAILED tests/test_benchmark.py::TestBenchmarkNullDisplay::test_effects_minimum_throughput

This occurs because:

  1. Coverage instrumentation significantly slows down Python execution (typically 2-5x slower)
  2. The minimum FPS thresholds (10,000 for effects) are not adjusted for coverage runs
  3. Timing-sensitive tests are unreliable in CI environments with varying load
  4. There's no way to distinguish between "actual performance regression" and "coverage overhead"

Proposed Improvements

  1. Adjust thresholds for coverage mode: Detect if coverage is active and adjust minimum FPS thresholds accordingly
  2. Add environment-aware benchmarks: Different thresholds for CI vs local development
  3. Use statistical methods: Instead of single-run timing, use median of multiple runs with outlier rejection
  4. Add performance baseline tracking: Store historical performance data to detect trends
  5. Make benchmark tests optional: Mark them with a custom marker like @pytest.mark.performance so they can be skipped when not needed
  6. Add configurable thresholds: Allow thresholds to be set via environment variables for different CI environments
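
Items 1, 2, and 6 above could be combined into a small helper. The sketch below is illustrative only: the function names (`_is_coverage_active`, `effective_min_fps`), the `MIN_FPS_THRESHOLD` variable, and the 25% scaling factor are assumptions, not existing code.

```python
# Hypothetical sketch of coverage-aware, env-overridable FPS thresholds.
import os
import sys

def _is_coverage_active() -> bool:
    """Heuristic: coverage.py imports itself and installs a trace function."""
    return "coverage" in sys.modules or sys.gettrace() is not None

def effective_min_fps(base_min_fps: float) -> float:
    """Return the threshold to enforce: env override first, then coverage scaling."""
    override = os.environ.get("MIN_FPS_THRESHOLD")  # assumed variable name
    if override is not None:
        return float(override)
    if _is_coverage_active():
        return base_min_fps * 0.25  # coverage typically slows runs 2-5x
    return base_min_fps
```

A benchmark test would then assert `fps >= effective_min_fps(10_000)` instead of hard-coding the threshold.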

Acceptance Criteria

  • Benchmark tests pass consistently when running with coverage
  • Performance thresholds are adjusted based on execution environment
  • Documentation explains how to run and interpret benchmarks
  • CI pipeline includes performance regression detection

Related Issues

  • Part of broader performance monitoring work
  • Related to issue #26 (Streaming display backend), which may have performance implications
Author
Owner

Implementation Plan (2026-03-18)

Phase 1: More Coverage + Better Isolation

Based on our planning, here's what we'll implement:

  1. Add more test coverage:

    • Test performance of all display backends
    • Test effect pipeline execution throughput
    • Test WebSocket streaming performance
    • Target: 80% coverage for the benchmark system
  2. Improve test isolation:

    • Better fixtures for benchmark tests
    • Mock external dependencies (network, file I/O)
    • Use subprocess isolation for timing-critical tests
  3. Fix CI interference:

    • Review mise.toml for the duplication issue (CI runs tasks twice)
    • Ensure benchmarks don't run during coverage tests
    • The test-cov task already uses -m "not benchmark" to exclude benchmarks

Current Issues Found

  1. mise.toml CI task has redundant configuration:
    • ci task has both depends list AND run string that repeats the same commands
    • This causes tasks to run twice
    • Fix: Clean up mise.toml to use either depends OR run, not both
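
A minimal sketch of the `depends`-only form of the fix; the task names and commands below are assumptions for illustration, not the project's actual `mise.toml`:

```toml
# Illustrative mise.toml fragment: the ci task declares dependencies only,
# instead of also repeating the same commands in a `run` string.
[tasks.lint]
run = "ruff check ."

[tasks.test-cov]
run = 'pytest -m "not benchmark" --cov'

[tasks.ci]
depends = ["lint", "test-cov"]  # no duplicate `run` string, so nothing runs twice
```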

Testing Strategy

  • Use @pytest.mark.benchmark marker for performance tests
  • Use @pytest.mark.slow for tests that take >1 second
  • Run benchmarks in isolated subprocess to avoid coverage interference
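
The subprocess-isolation point can be sketched as follows. This is a hypothetical harness, not the project's code: the snippet body, helper name, and JSON protocol are all assumptions. The idea is that a fresh interpreter carries no coverage trace function, so the timing is unaffected by instrumentation in the parent process.

```python
# Hypothetical sketch: time a workload in a clean child interpreter and
# report the result back to the parent as JSON.
import json
import subprocess
import sys

BENCH_SNIPPET = """
import json, time
start = time.perf_counter()
for _ in range(100_000):
    pass  # placeholder for the real effect-pipeline call
elapsed = time.perf_counter() - start
print(json.dumps({"elapsed": elapsed}))
"""

def run_isolated_benchmark() -> float:
    """Run the snippet with a fresh interpreter and parse its timing output."""
    result = subprocess.run(
        [sys.executable, "-c", BENCH_SNIPPET],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["elapsed"]
```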

Should we proceed with implementing these improvements?

Author
Owner

Progress Update

Implemented in commit 3a2138a:

  • Coverage-aware FPS thresholds (_is_coverage_active())
  • Adjusted min_fps to 25% of the baseline when coverage is active
  • Reduced iterations in coverage mode
  • Added BENCHMARK_ITERATIONS env var override
  • Added separate benchmark task in mise.toml
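
The iteration-count logic described above might look like the sketch below. The helper name and default counts are assumptions, not the exact code from commit 3a2138a; only the `BENCHMARK_ITERATIONS` variable name comes from this update.

```python
# Sketch of an iteration-count helper: env override first, then a reduced
# count when coverage instrumentation is detected.
import os
import sys

def benchmark_iterations(default: int = 1000) -> int:
    """Honour BENCHMARK_ITERATIONS, else cut the count under coverage."""
    override = os.environ.get("BENCHMARK_ITERATIONS")
    if override is not None:
        return int(override)
    if "coverage" in sys.modules or sys.gettrace() is not None:
        return default // 10  # fewer iterations when instrumentation is active
    return default
```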

The transform.py performance fix (c57617b) also addresses the root cause: PIL-based height estimation was slowing the benchmarks.

david closed this issue 2026-03-19 20:13:19 +00:00
Reference: klubhaus/sideline#34