
Testing Guide - Theory and Best Practices

Introduction

This guide explains the theory and best practices of software testing, specifically for the PIF Compiler project. It moves beyond ad-hoc testing scripts to a comprehensive, automated testing approach.


Your Current Approach vs. Test-Driven Development

What You Do Now (Ad-hoc Scripts):

# test_script.py
from cosing_service import cosing_search

result = cosing_search("WATER", mode="name")
print(result)  # Look at output, check if it looks right

Problems:

  • Manual checking (is the output correct?)
  • Not repeatable (you forget what "correct" looks like)
  • Doesn't catch regressions (future changes break old code)
  • No documentation (what should the function do?)
  • Tedious for many functions

The Testing Pyramid

        /\
       /  \  E2E Tests (Few)
      /----\
     /      \ Integration Tests (Some)
    /--------\
   /          \ Unit Tests (Many)
  /____________\

1. Unit Tests (Bottom - Most Important)

Test individual functions in isolation.

Example:

def test_parse_cas_numbers_single():
    """Test parsing a single CAS number."""
    result = parse_cas_numbers(["7732-18-5"])
    assert result == ["7732-18-5"]  # ← Automated check

Benefits:

  • Fast (milliseconds)
  • No external dependencies (no API, no database)
  • Pinpoint exact problem
  • Run hundreds in seconds

When to use:

  • Testing individual functions
  • Testing data parsing/validation
  • Testing business logic calculations

2. Integration Tests (Middle)

Test multiple components working together.

Example:

def test_full_cosing_workflow():
    """Test search + clean workflow."""
    raw = cosing_search("WATER", mode="name")
    clean = clean_cosing(raw)
    assert "cosingUrl" in clean

Benefits:

  • Tests real interactions
  • Catches integration bugs

Drawbacks:

  • ⚠️ Slower (hits real APIs)
  • ⚠️ Requires internet/database

When to use:

  • Testing workflows across multiple services
  • Testing API integrations
  • Testing database interactions

3. End-to-End (E2E) Tests (Top - Fewest)

Test entire application flow (UI → Backend → Database).

Example:

def test_create_pif_from_ui():
    """User creates PIF through Streamlit UI."""
    # Click buttons, fill forms, verify PDF generated

When to use:

  • Testing complete user workflows
  • Smoke tests before deployment
  • Critical business processes

Key Concepts

1. Assertions - Automated Verification

Old way (manual):

result = parse_cas_numbers(["7732-18-5/56-81-5"])
print(result)  # You look at: ['7732-18-5', '56-81-5']
# Is this right? Maybe? You forget in 2 weeks.

Test way (automated):

def test_parse_multiple_cas():
    result = parse_cas_numbers(["7732-18-5/56-81-5"])
    assert result == ["7732-18-5", "56-81-5"]  # ← Computer checks!
    # If wrong, test FAILS immediately

Common Assertions:

import pytest  # needed for pytest.raises / pytest.approx below

# Equality
assert result == expected

# Truthiness
assert result is not None
assert "key" in result

# Exceptions
with pytest.raises(ValueError):
    invalid_function()

# Approximate equality (for floats)
assert result == pytest.approx(3.14159, rel=1e-5)

2. Mocking - Control External Dependencies

Problem: Testing cosing_search() hits the real COSING API:

  • ⚠️ Slow (network request)
  • ⚠️ Unreliable (API might be down)
  • ⚠️ Expensive (rate limits)
  • ⚠️ Hard to test errors (how do you make API return error?)

Solution: Mock it!

from unittest.mock import Mock, patch

@patch('cosing_service.req.post')  # Replace real HTTP request
def test_search_by_name(mock_post):
    # Control what the "API" returns
    mock_response = Mock()
    mock_response.json.return_value = {
        "results": [{"metadata": {"inciName": ["WATER"]}}]
    }
    mock_post.return_value = mock_response

    result = cosing_search("WATER", mode="name")

    assert result["inciName"] == ["WATER"]  # ← Test your logic, not the API
    mock_post.assert_called_once()  # Verify it was called

Benefits:

  • Fast (no real network)
  • Reliable (always works)
  • Can test error cases (mock API failures)
  • Isolate your code from external issues

What to mock:

  • HTTP requests (requests.get, requests.post)
  • Database calls (db.find_one, db.insert)
  • File I/O (open, read, write)
  • External APIs (COSING, ECHA, PubChem)
  • Time-dependent functions (datetime.now())
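
Error cases are where mocking pays off most: a side_effect makes the fake call raise, so the failure path runs without ever touching the network. A self-contained sketch (fetch_substance is a hypothetical helper standing in for a real service function; urllib from the standard library is used so the example needs no extra packages):

```python
import json
import urllib.request
from unittest.mock import patch

# Hypothetical helper: calls an API and converts network failures into None.
def fetch_substance(name: str):
    try:
        with urllib.request.urlopen(
            f"https://example.invalid/search?q={name}", timeout=5
        ) as resp:
            return json.load(resp)
    except OSError:  # urllib's URLError is a subclass of OSError
        return None

@patch("urllib.request.urlopen")
def test_fetch_returns_none_when_api_is_down(mock_urlopen):
    mock_urlopen.side_effect = OSError("API unreachable")  # simulate an outage
    assert fetch_substance("WATER") is None
    mock_urlopen.assert_called_once()
```

side_effect can also be set to a list of values to simulate flaky behavior (raise once, then return normally).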

3. Fixtures - Reusable Test Data

Without fixtures (repetitive):

def test_clean_basic():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}
    result = clean_cosing(data)
    assert ...

def test_clean_adds_url():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}  # Copy-paste!
    result = clean_cosing(data)
    assert ...

With fixtures (DRY - Don't Repeat Yourself):

# conftest.py
@pytest.fixture
def sample_cosing_response():
    """Reusable COSING response data."""
    return {
        "inciName": ["WATER"],
        "casNo": ["7732-18-5"],
        "substanceId": ["12345"]
    }

# test file
def test_clean_basic(sample_cosing_response):  # Auto-injected!
    result = clean_cosing(sample_cosing_response)
    assert result["inciName"] == "WATER"

def test_clean_adds_url(sample_cosing_response):  # Reuse same data!
    result = clean_cosing(sample_cosing_response)
    assert "cosingUrl" in result

Benefits:

  • No code duplication
  • Centralized test data
  • Easy to update (change once, affects all tests)
  • Auto-cleanup (fixtures can tear down resources)

Common fixture patterns:

# Database fixture with cleanup
@pytest.fixture
def test_db():
    db = connect_to_test_db()
    yield db  # Test runs here
    db.drop_all()  # Cleanup after test

# Temporary file fixture
@pytest.fixture
def temp_file(tmp_path):
    file_path = tmp_path / "test.json"
    file_path.write_text('{"test": "data"}')
    return file_path  # Auto-cleaned by pytest

Real-World Testing Workflow

Scenario: You Add a New Feature

Step 1: Write the test FIRST (TDD - Test-Driven Development):

def test_parse_cas_removes_parentheses():
    """CAS numbers with parentheses should be cleaned."""
    result = parse_cas_numbers(["7732-18-5 (hydrate)"])
    assert result == ["7732-18-5"]

Step 2: Run test - it FAILS (expected!):

$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses

FAILED: AssertionError: assert ['7732-18-5 (hydrate)'] == ['7732-18-5']

Step 3: Write code to make it pass:

def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)  # ← Add this
    # ... rest of function

Step 4: Run test again - it PASSES:

$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses

PASSED ✓

Step 5: Refactor if needed - tests ensure you don't break anything!


TDD Cycle (Red-Green-Refactor)

1. RED:    Write failing test
     ↓
2. GREEN:  Write minimal code to pass
     ↓
3. REFACTOR: Improve code without breaking tests
     ↓
   Repeat

Benefits:

  • Forces you to think about requirements first
  • Prevents over-engineering
  • Built-in documentation (tests show intended behavior)
  • Confidence to refactor

Regression Testing - The Killer Feature

Scenario: You change code 6 months later:

# Original (working)
def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)
    cas_parts = re.split(r"[/;,]", cas_string)  # Handles /, ;, ,
    return [cas.strip() for cas in cas_parts]

# You "improve" it
def parse_cas_numbers(cas_string: list) -> list:
    return cas_string[0].split("/")  # Simpler! But...

Run tests:

$ uv run pytest

FAILED: test_multiple_cas_with_semicolon
Expected: ['7732-18-5', '56-81-5']
Got: ['7732-18-5;56-81-5']  # ← Oops, broke semicolon support!

FAILED: test_cas_with_parentheses
Expected: ['7732-18-5']
Got: ['7732-18-5 (hydrate)']  # ← Broke parentheses removal!

Without tests:

  • You deploy
  • Users report bugs
  • You're confused what broke
  • Spend hours debugging

With tests:

  • Instant feedback
  • Fix before deploying
  • Save hours of debugging

Coverage - How Much Is Tested?

Running Coverage

uv run pytest --cov=src/pif_compiler --cov-report=html

Sample Output

Name                           Stmts   Miss  Cover
--------------------------------------------------
cosing_service.py                 89      5    94%
echa_service.py                  156     89    43%
models.py                         45     45     0%
--------------------------------------------------
TOTAL                            290    139    52%

Interpretation

  • cosing_service.py - 94% covered (great!)
  • ⚠️ echa_service.py - 43% covered (needs more tests)
  • models.py - 0% covered (no tests yet)

Coverage Goals

Coverage    Status           Action
---------   --------------   -------------------------
90-100%     Excellent        Maintain
70-90%      ⚠️ Good          Add edge cases
50-70%      ⚠️ Acceptable    Prioritize critical paths
<50%        Poor             Add tests immediately

Target: 80%+ for business-critical code
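
One way to enforce that target, assuming pytest-cov is installed, is to bake the threshold into the pytest configuration so any run that falls below it fails (paths here are a sketch; adjust to your layout):

```toml
# pyproject.toml -- fail the whole test run if coverage drops below 80%
[tool.pytest.ini_options]
addopts = "--cov=src/pif_compiler --cov-fail-under=80"
```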

HTML Coverage Report

uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser

Shows:

  • Which lines are tested (green)
  • Which lines are not tested (red)
  • Which branches are not covered

Best Practices Summary

DO:

  1. Write tests for all business logic

    # YES: Test calculations
    def test_sed_calculation():
        ingredient = Ingredient(quantity=10.0, dap=0.5)
        assert ingredient.calculate_sed() == 5.0
    
  2. Mock external dependencies

    # YES: Mock API calls
    @patch('cosing_service.req.post')
    def test_search(mock_post):
        mock_post.return_value.json.return_value = {...}
    
  3. Test edge cases

    # YES: Test edge cases
    def test_parse_empty_cas():
        assert parse_cas_numbers([""]) == []
    
    def test_parse_invalid_cas():
        with pytest.raises(ValueError):
            parse_cas_numbers(["abc-def-ghi"])
    
  4. Keep tests simple

    # YES: One test = one thing
    def test_cas_removes_whitespace():
        assert parse_cas_numbers(["  123-45-6  "]) == ["123-45-6"]
    
    # NO: Testing multiple things
    def test_cas_everything():
        assert parse_cas_numbers(["  123-45-6  "]) == ["123-45-6"]
        assert parse_cas_numbers(["123-45-6/789-01-2"]) == [...]
        # Too much in one test!
    
  5. Run tests before committing

    git add .
    uv run pytest  # ← Always run first!
    git commit -m "Add feature X"
    
  6. Use descriptive test names

    # YES: Describes what it tests
    def test_parse_cas_removes_parenthetical_info():
        ...
    
    # NO: Vague
    def test_cas_1():
        ...
    

DON'T:

  1. Don't test external libraries

    # NO: Testing if requests.post works
    def test_requests_library():
        response = requests.post("https://example.com")
        assert response.status_code == 200
    
    # YES: Test YOUR code that uses requests
    @patch('requests.post')
    def test_my_search_function(mock_post):
        ...
    
  2. Don't make tests dependent on each other

    # NO: test_b depends on test_a
    def test_a_creates_data():
        db.insert({"id": 1, "name": "test"})
    
    def test_b_uses_data():
        data = db.find_one({"id": 1})  # Breaks if test_a fails!
    
    # YES: Each test is independent
    def test_b_uses_data():
        db.insert({"id": 1, "name": "test"})  # Create own data
        data = db.find_one({"id": 1})
    
  3. Don't test implementation details

    # NO: Testing internal variable names
    def test_internal_state():
        obj = MyClass()
        assert obj._internal_var == "value"  # Breaks with refactoring
    
    # YES: Test public behavior
    def test_public_api():
        obj = MyClass()
        assert obj.get_value() == "value"
    
  4. Don't skip tests

    # NO: Commenting out failing tests
    # def test_broken_feature():
    #     assert broken_function() == "expected"
    
    # YES: Fix the test or mark as TODO
    @pytest.mark.skip(reason="Feature not implemented yet")
    def test_future_feature():
        ...
    

Practical Example: Your Workflow

Before (Manual Script)

# test_water.py
from cosing_service import cosing_search, clean_cosing

result = cosing_search("WATER", "name")
print(result)  # ← You manually check

clean = clean_cosing(result)
print(clean)  # ← You manually check again

# Run 10 times with different inputs... tedious!

Problems:

  • Manual verification
  • Slow (type command, read output, verify)
  • Error-prone (miss things)
  • Not repeatable

After (Automated Tests)

# tests/test_cosing_service.py
def test_search_and_clean_water():
    """Water should be searchable and cleanable."""
    result = cosing_search("WATER", "name")
    assert result is not None
    assert "inciName" in result

    clean = clean_cosing(result)
    assert clean["inciName"] == "WATER"
    assert "cosingUrl" in clean

# Run ONCE: pytest
# It checks everything automatically!

Run all 25 tests:

$ uv run pytest

tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number PASSED
tests/test_cosing_service.py::TestParseCasNumbers::test_multiple_cas_with_slash PASSED
...
======================== 25 passed in 0.5s ========================

Benefits:

  • All pass? Safe to deploy!
  • One fails? Fix before deploying!
  • ⏱️ 25 tests in 0.5 seconds vs. manual testing for 30 minutes
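
When one function must be checked against many inputs (the "run 10 times" problem above), pytest can parametrize a single test body instead of copy-pasting it. A minimal sketch, with a stand-in parse_cas_numbers defined inline so it runs on its own:

```python
import re
import pytest

# Stand-in for the real parse_cas_numbers, so this sketch is self-contained.
def parse_cas_numbers(cas_list: list) -> list:
    text = re.sub(r"\([^)]*\)", "", cas_list[0])  # drop "(hydrate)" etc.
    return [part.strip() for part in re.split(r"[/;,]", text) if part.strip()]

# One test body, three cases -- pytest reports each case separately.
@pytest.mark.parametrize("raw, expected", [
    (["7732-18-5"], ["7732-18-5"]),
    (["7732-18-5/56-81-5"], ["7732-18-5", "56-81-5"]),
    (["7732-18-5 (hydrate)"], ["7732-18-5"]),
])
def test_parse_cas_numbers(raw, expected):
    assert parse_cas_numbers(raw) == expected
```

Each case gets its own pass/fail line in the report, so one bad input doesn't hide the others.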

When Should You Write Tests?

Always Test:

Business logic (calculations, data processing)

# YES
def test_calculate_sed():
    assert calculate_sed(quantity=10, dap=0.5) == 5.0

Data validation (Pydantic models)

# YES
def test_ingredient_validates_cas_format():
    with pytest.raises(ValidationError):
        Ingredient(cas="invalid", quantity=10.0)

API integrations (with mocks)

# YES
@patch('requests.post')
def test_cosing_search(mock_post):
    ...

Bug fixes (write test first, then fix)

# YES
def test_bug_123_empty_cas_crash():
    """Regression test for bug #123."""
    result = parse_cas_numbers([])  # Used to crash
    assert result == []

Sometimes Test:

⚠️ UI code (harder to test, less critical)

# Streamlit UI tests are complex, lower priority

⚠️ Configuration (usually simple)

# Config loading is straightforward, test if complex logic

Don't Test:

Third-party libraries (they have their own tests)

# NO: Testing if pandas works
def test_pandas_dataframe():
    df = pd.DataFrame({"a": [1, 2, 3]})
    assert len(df) == 3  # Pandas team already tested this!

Trivial code

# NO: Testing simple getters/setters
class MyClass:
    def get_name(self):
        return self.name  # Too simple to test

Your Next Steps

1. Install Pytest

cd c:\Users\adish\Projects\pif_compiler
uv add --dev pytest pytest-cov pytest-mock

2. Run the COSING Tests

# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_cosing_service.py

# Run specific test
uv run pytest tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number

3. See Coverage

# Terminal report
uv run pytest --cov=src/pif_compiler/services/cosing_service

# HTML report (more detailed)
uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser

4. Start Writing Tests for New Code

Follow the TDD cycle:

  1. Red: Write failing test
  2. Green: Write minimal code to pass
  3. Refactor: Improve code
  4. Repeat!

Additional Resources

  • Pytest Documentation
  • Testing Philosophy
  • PIF Compiler Specific


Summary

Testing transforms your development workflow:

Without Tests              With Tests
------------------------   ---------------------------
Manual verification        Automated checks
Slow feedback              Instant feedback
Fear of breaking things    Confidence to refactor
Undocumented behavior      Tests as documentation
Debug for hours            Pinpoint issues immediately

Start small:

  1. Write tests for one service (COSING done!)
  2. Add tests for new features
  3. Fix bugs with tests first
  4. Gradually increase coverage

The investment pays off:

  • Fewer bugs in production
  • Faster development (less debugging)
  • Better code design
  • Easier collaboration
  • Peace of mind 😌

Last updated: 2025-01-04