Symphonic-Joules

Test Performance Optimization Guide

This guide provides best practices for writing performant tests in the Symphonic-Joules project, ensuring fast and efficient test execution as the codebase grows.

🎯 Overview

Test performance is crucial for developer productivity. Slow tests discourage running them frequently, which can lead to bugs slipping through. This guide demonstrates techniques to keep tests fast while maintaining reliability.

🚀 Key Principles

1. Use Appropriate Fixture Scopes

Pytest fixtures support different scopes that determine how often they’re executed:

Best Practice: Use the widest scope possible for expensive operations without compromising test independence.

Example: Optimizing File I/O

# ❌ BAD: File read on every test (function scope)
@pytest.fixture
def config_data():
    with open('config.yml', 'r') as f:
        return yaml.safe_load(f)

# ✅ GOOD: File read once per module
@pytest.fixture(scope='module')
def config_data():
    with open('config.yml', 'r') as f:
        return yaml.safe_load(f)

Impact: If you have 50 tests reading the same file, module scope reduces I/O from 50 operations to 1.

2. Cache Expensive Computations

For operations that are expensive but produce immutable results, use module or session-scoped fixtures.

Example: YAML Parsing

# ❌ BAD: Parse YAML in each test class
class TestWorkflowConfig:
    @pytest.fixture
    def workflow(self):
        with open('workflow.yml', 'r') as f:
            return yaml.safe_load(f)  # Parsed every time
    
    def test_name(self, workflow):
        assert workflow['name'] == 'CI'

# ✅ GOOD: Parse YAML once per module
@pytest.fixture(scope='module')
def workflow():
    with open('workflow.yml', 'r') as f:
        return yaml.safe_load(f)  # Parsed once

class TestWorkflowConfig:
    def test_name(self, workflow):
        assert workflow['name'] == 'CI'

3. Avoid Repeated Path Construction

Computing file paths repeatedly wastes CPU cycles.

# ❌ BAD: Reconstruct path in every fixture
class TestFiles:
    @pytest.fixture
    def file_a(self):
        path = Path(__file__).parent.parent / 'data' / 'file_a.txt'
        return path.read_text()
    
    @pytest.fixture
    def file_b(self):
        path = Path(__file__).parent.parent / 'data' / 'file_b.txt'
        return path.read_text()

# ✅ GOOD: Compute base path once
@pytest.fixture(scope='module')
def data_dir():
    return Path(__file__).parent.parent / 'data'

class TestFiles:
    @pytest.fixture
    def file_a(self, data_dir):
        return (data_dir / 'file_a.txt').read_text()
    
    @pytest.fixture
    def file_b(self, data_dir):
        return (data_dir / 'file_b.txt').read_text()

4. Optimize String Operations

String operations can be surprisingly expensive, especially on large strings.

Avoid Unnecessary String Transformations

# ❌ BAD: Convert entire file to lowercase
def test_contains_keyword(file_content):
    assert 'main' in file_content.lower()  # Processes entire string

# ✅ GOOD: Check multiple variants
def test_contains_keyword(file_content):
    assert 'main' in file_content or 'Main' in file_content or 'MAIN' in file_content

Use Efficient String Methods

# ❌ BAD: Multiple passes through string
def test_no_secrets(content):
    for pattern in ['password', 'token', 'api_key']:
        if pattern in content.lower():
            pytest.fail(f"Found {pattern}")

# ✅ GOOD: Single lowercase conversion
def test_no_secrets(content):
    content_lower = content.lower()
    for pattern in ['password', 'token', 'api_key']:
        if pattern in content_lower:
            pytest.fail(f"Found {pattern}")

5. Share Immutable Data Across Tests

If test data is immutable (read-only), it’s safe to share across tests using module or session scopes.

# ✅ GOOD: Share immutable parsed data
@pytest.fixture(scope='module')
def audio_metadata():
    """Load audio file metadata once for all tests."""
    return load_audio_metadata('test_audio.wav')

class TestAudioMetadata:
    def test_sample_rate(self, audio_metadata):
        assert audio_metadata['sample_rate'] == 44100
    
    def test_channels(self, audio_metadata):
        assert audio_metadata['channels'] == 2
    
    def test_duration(self, audio_metadata):
        assert audio_metadata['duration'] > 0

Warning: Don’t share mutable data without careful consideration, as it can lead to test interdependence.

6. Use Generators for Large Datasets

When testing with large datasets, use generators to avoid loading everything into memory.

# ❌ BAD: Load all test files into memory
@pytest.fixture
def all_audio_files():
    return [load_audio(f) for f in get_audio_files()]

# ✅ GOOD: Yield one file at a time
@pytest.fixture
def audio_file_generator():
    for file_path in get_audio_files():
        yield load_audio(file_path)

📊 Performance Monitoring

Measuring Test Execution Time

Use pytest’s duration reporting to identify slow tests:

# Show slowest 10 tests
pytest --durations=10

# Show all test durations
pytest --durations=0

Profiling Tests

For detailed performance analysis:

# Profile with pytest-profiling
pip install pytest-profiling
pytest --profile

# Profile with cProfile
python -m cProfile -o output.prof -m pytest tests/

🔍 Real-World Example

Here’s a before/after comparison from the Symphonic-Joules test suite:

Before Optimization

class TestWorkflowStructure:
    @pytest.fixture
    def workflow_path(self):
        repo_root = Path(__file__).parent.parent.parent
        return repo_root / '.github' / 'workflows' / 'blank.yml'
    
    @pytest.fixture
    def workflow_content(self, workflow_path):
        with open(workflow_path, 'r') as f:
            return yaml.safe_load(f)
    
    def test_workflow_exists(self, workflow_path):
        assert workflow_path.exists()
    
    def test_workflow_has_name(self, workflow_content):
        assert 'name' in workflow_content

class TestWorkflowMetadata:
    @pytest.fixture
    def workflow_content(self):
        workflow_path = Path(__file__).parent.parent.parent / '.github' / 'workflows' / 'blank.yml'
        with open(workflow_path, 'r') as f:
            return yaml.safe_load(f)
    
    def test_name_is_ci(self, workflow_content):
        assert workflow_content['name'] == 'CI'

Problems:

After Optimization

# Module-level fixtures (computed once)
@pytest.fixture(scope='module')
def workflow_path():
    """Module-scoped fixture for workflow file path."""
    repo_root = Path(__file__).parent.parent.parent
    return repo_root / '.github' / 'workflows' / 'blank.yml'

@pytest.fixture(scope='module')
def workflow_content(workflow_path):
    """Module-scoped fixture for parsed workflow content."""
    with open(workflow_path, 'r') as f:
        return yaml.safe_load(f)

class TestWorkflowStructure:
    def test_workflow_exists(self, workflow_path):
        assert workflow_path.exists()
    
    def test_workflow_has_name(self, workflow_content):
        assert 'name' in workflow_content

class TestWorkflowMetadata:
    def test_name_is_ci(self, workflow_content):
        assert workflow_content['name'] == 'CI'

Results:

⚠️ Important Considerations

When NOT to Use Wide-Scoped Fixtures

Don’t use module or session scope if:

  1. Test modifies the data: Mutable data should be per-test to avoid interdependence
  2. Test has side effects: Database writes, file modifications, etc.
  3. Random data is needed: Each test should get fresh random data
  4. Setup is lightweight: Caching a simple operation may add complexity for no benefit

Maintaining Test Independence

Wide-scoped fixtures are safe when:

# ✅ SAFE: Immutable data
@pytest.fixture(scope='module')
def config():
    return {'timeout': 30, 'retries': 3}  # Never modified

# ❌ UNSAFE: Mutable data
@pytest.fixture(scope='module')
def shared_list():
    return []  # Tests might append to it

📈 Expected Performance Gains

Based on the optimizations implemented in this project:

Optimization Impact Typical Reduction
Module-scoped file I/O High 80-95% fewer file reads
Module-scoped parsing High 80-95% fewer parse operations
Path caching Medium 50-70% fewer path constructions
String optimization Low-Medium 10-30% faster string operations
Generator usage High (memory) 70-95% less memory for large datasets

🎯 Best Practices Summary

Do’s ✅

  1. Use module/session scope for expensive, immutable operations
  2. Cache file I/O operations with appropriate fixture scopes
  3. Share read-only data across tests
  4. Profile tests regularly to identify bottlenecks
  5. Optimize string operations to avoid unnecessary transformations
  6. Use generators for large datasets
  7. Document why you chose specific fixture scopes

Don’ts ❌

  1. Don’t share mutable data across tests
  2. Don’t cache operations with side effects
  3. Don’t optimize without measuring first
  4. Don’t sacrifice test clarity for minor performance gains
  5. Don’t use wide scopes for random or time-dependent data
  6. Don’t forget test independence

📚 Additional Resources


Fast tests = Happy developers