
Testing Guide - Theory and Best Practices

Introduction

This guide explains the theory and best practices of software testing, specifically for the PIF Compiler project. It moves beyond ad-hoc testing scripts to a comprehensive, automated testing approach.


Your Current Approach vs. Test-Driven Development

What You Do Now (Ad-hoc Scripts):

# test_script.py
from cosing_service import cosing_search

result = cosing_search("WATER", mode="name")
print(result)  # Look at output, check if it looks right

Problems:

  • Manual checking (is the output correct?)
  • Not repeatable (you forget what "correct" looks like)
  • Doesn't catch regressions (future changes break old code)
  • No documentation (what should the function do?)
  • Tedious for many functions

The Testing Pyramid

        /\
       /  \  E2E Tests (Few)
      /----\
     /      \ Integration Tests (Some)
    /--------\
   /          \ Unit Tests (Many)
  /____________\

1. Unit Tests (Bottom - Most Important)

Test individual functions in isolation.

Example:

def test_parse_cas_numbers_single():
    """Test parsing a single CAS number."""
    result = parse_cas_numbers(["7732-18-5"])
    assert result == ["7732-18-5"]  # ← Automated check

Benefits:

  • Fast (milliseconds)
  • No external dependencies (no API, no database)
  • Pinpoint exact problem
  • Run hundreds in seconds

When to use:

  • Testing individual functions
  • Testing data parsing/validation
  • Testing business logic calculations

2. Integration Tests (Middle)

Test multiple components working together.

Example:

def test_full_cosing_workflow():
    """Test search + clean workflow."""
    raw = cosing_search("WATER", mode="name")
    clean = clean_cosing(raw)
    assert "cosingUrl" in clean

Benefits:

  • Tests real interactions
  • Catches integration bugs

Drawbacks:

  • ⚠️ Slower (hits real APIs)
  • ⚠️ Requires internet/database

When to use:

  • Testing workflows across multiple services
  • Testing API integrations
  • Testing database interactions

3. End-to-End (E2E) Tests (Top - Fewest)

Test entire application flow (UI → Backend → Database).

Example:

def test_create_pif_from_ui():
    """User creates PIF through Streamlit UI."""
    # Click buttons, fill forms, verify PDF generated

When to use:

  • Testing complete user workflows
  • Smoke tests before deployment
  • Critical business processes

Key Concepts

1. Assertions - Automated Verification

Old way (manual):

result = parse_cas_numbers(["7732-18-5/56-81-5"])
print(result)  # You look at: ['7732-18-5', '56-81-5']
# Is this right? Maybe? You forget in 2 weeks.

Test way (automated):

def test_parse_multiple_cas():
    result = parse_cas_numbers(["7732-18-5/56-81-5"])
    assert result == ["7732-18-5", "56-81-5"]  # ← Computer checks!
    # If wrong, test FAILS immediately

Common Assertions:

import pytest  # needed for pytest.raises / pytest.approx below

# Equality
assert result == expected

# Truthiness
assert result is not None
assert "key" in result

# Exceptions
with pytest.raises(ValueError):
    invalid_function()

# Approximate equality (for floats)
assert result == pytest.approx(3.14159, rel=1e-5)

2. Mocking - Control External Dependencies

Problem: Testing cosing_search() hits the real COSING API:

  • ⚠️ Slow (network request)
  • ⚠️ Unreliable (API might be down)
  • ⚠️ Expensive (rate limits)
  • ⚠️ Hard to test errors (how do you make API return error?)

Solution: Mock it!

from unittest.mock import Mock, patch

@patch('cosing_service.req.post')  # Replace real HTTP request
def test_search_by_name(mock_post):
    # Control what the "API" returns
    mock_response = Mock()
    mock_response.json.return_value = {
        "results": [{"metadata": {"inciName": ["WATER"]}}]
    }
    mock_post.return_value = mock_response

    result = cosing_search("WATER", mode="name")

    assert result["inciName"] == ["WATER"]  # ← Test your logic, not the API
    mock_post.assert_called_once()  # Verify it was called

Benefits:

  • Fast (no real network)
  • Reliable (always works)
  • Can test error cases (mock API failures)
  • Isolate your code from external issues

What to mock:

  • HTTP requests (requests.get, requests.post)
  • Database calls (db.find_one, db.insert)
  • File I/O (open, read, write)
  • External APIs (COSING, ECHA, PubChem)
  • Time-dependent functions (datetime.now())
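
Error cases are where mocking pays off most: a side_effect makes the fake call raise, so the failure path runs without ever touching the network. A self-contained sketch (fetch_substance is a hypothetical helper standing in for a real service function; urllib from the standard library is used so the example needs no extra packages):

```python
import json
import urllib.request
from unittest.mock import patch

# Hypothetical helper: calls an API and converts network failures into None.
def fetch_substance(name: str):
    try:
        with urllib.request.urlopen(
            f"https://example.invalid/search?q={name}", timeout=5
        ) as resp:
            return json.load(resp)
    except OSError:  # urllib's URLError is a subclass of OSError
        return None

@patch("urllib.request.urlopen")
def test_fetch_returns_none_when_api_is_down(mock_urlopen):
    mock_urlopen.side_effect = OSError("API unreachable")  # simulate an outage
    assert fetch_substance("WATER") is None
    mock_urlopen.assert_called_once()
```

side_effect can also be set to a list of values to simulate flaky behavior (raise once, then return normally).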

3. Fixtures - Reusable Test Data

Without fixtures (repetitive):

def test_clean_basic():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}
    result = clean_cosing(data)
    assert ...

def test_clean_adds_url():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}  # Copy-paste!
    result = clean_cosing(data)
    assert ...

With fixtures (DRY - Don't Repeat Yourself):

# conftest.py
@pytest.fixture
def sample_cosing_response():
    """Reusable COSING response data."""
    return {
        "inciName": ["WATER"],
        "casNo": ["7732-18-5"],
        "substanceId": ["12345"]
    }

# test file
def test_clean_basic(sample_cosing_response):  # Auto-injected!
    result = clean_cosing(sample_cosing_response)
    assert result["inciName"] == "WATER"

def test_clean_adds_url(sample_cosing_response):  # Reuse same data!
    result = clean_cosing(sample_cosing_response)
    assert "cosingUrl" in result

Benefits:

  • No code duplication
  • Centralized test data
  • Easy to update (change once, affects all tests)
  • Auto-cleanup (fixtures can tear down resources)

Common fixture patterns:

# Database fixture with cleanup
@pytest.fixture
def test_db():
    db = connect_to_test_db()
    yield db  # Test runs here
    db.drop_all()  # Cleanup after test

# Temporary file fixture
@pytest.fixture
def temp_file(tmp_path):
    file_path = tmp_path / "test.json"
    file_path.write_text('{"test": "data"}')
    return file_path  # Auto-cleaned by pytest

Real-World Testing Workflow

Scenario: You Add a New Feature

Step 1: Write the test FIRST (TDD - Test-Driven Development):

def test_parse_cas_removes_parentheses():
    """CAS numbers with parentheses should be cleaned."""
    result = parse_cas_numbers(["7732-18-5 (hydrate)"])
    assert result == ["7732-18-5"]

Step 2: Run test - it FAILS (expected!):

$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses

FAILED: AssertionError: assert ['7732-18-5 (hydrate)'] == ['7732-18-5']

Step 3: Write code to make it pass:

def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)  # ← Add this
    # ... rest of function

Step 4: Run test again - it PASSES:

$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses

PASSED ✓

Step 5: Refactor if needed - tests ensure you don't break anything!


TDD Cycle (Red-Green-Refactor)

1. RED:    Write failing test
     ↓
2. GREEN:  Write minimal code to pass
     ↓
3. REFACTOR: Improve code without breaking tests
     ↓
   Repeat

Benefits:

  • Forces you to think about requirements first
  • Prevents over-engineering
  • Built-in documentation (tests show intended behavior)
  • Confidence to refactor

Regression Testing - The Killer Feature

Scenario: You change code 6 months later:

# Original (working)
def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)
    cas_parts = re.split(r"[/;,]", cas_string)  # Handles /, ;, ,
    return [cas.strip() for cas in cas_parts]

# You "improve" it
def parse_cas_numbers(cas_string: list) -> list:
    return cas_string[0].split("/")  # Simpler! But...

Run tests:

$ uv run pytest

FAILED: test_multiple_cas_with_semicolon
Expected: ['7732-18-5', '56-81-5']
Got: ['7732-18-5;56-81-5']  # ← Oops, broke semicolon support!

FAILED: test_cas_with_parentheses
Expected: ['7732-18-5']
Got: ['7732-18-5 (hydrate)']  # ← Broke parentheses removal!

Without tests:

  • You deploy
  • Users report bugs
  • You're confused what broke
  • Spend hours debugging

With tests:

  • Instant feedback
  • Fix before deploying
  • Save hours of debugging

Coverage - How Much Is Tested?

Running Coverage

uv run pytest --cov=src/pif_compiler --cov-report=html

Sample Output

Name                           Stmts   Miss  Cover
--------------------------------------------------
cosing_service.py                 89      5    94%
echa_service.py                  156     89    43%
models.py                         45     45     0%
--------------------------------------------------
TOTAL                            290    139    52%

Interpretation

  • cosing_service.py - 94% covered (great!)
  • ⚠️ echa_service.py - 43% covered (needs more tests)
  • models.py - 0% covered (no tests yet)

Coverage Goals

Coverage    Status           Action
---------   --------------   -------------------------
90-100%     Excellent        Maintain
70-90%      ⚠️ Good          Add edge cases
50-70%      ⚠️ Acceptable    Prioritize critical paths
<50%        Poor             Add tests immediately

Target: 80%+ for business-critical code
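
One way to enforce that target, assuming pytest-cov is installed, is to bake the threshold into the pytest configuration so any run that falls below it fails (paths here are a sketch; adjust to your layout):

```toml
# pyproject.toml -- fail the whole test run if coverage drops below 80%
[tool.pytest.ini_options]
addopts = "--cov=src/pif_compiler --cov-fail-under=80"
```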

HTML Coverage Report

uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser

Shows:

  • Which lines are tested (green)
  • Which lines are not tested (red)
  • Which branches are not covered

Best Practices Summary

DO:

  1. Write tests for all business logic

    # YES: Test calculations
    def test_sed_calculation():
        ingredient = Ingredient(quantity=10.0, dap=0.5)
        assert ingredient.calculate_sed() == 5.0
    
  2. Mock external dependencies

    # YES: Mock API calls
    @patch('cosing_service.req.post')
    def test_search(mock_post):
        mock_post.return_value.json.return_value = {...}
    
  3. Test edge cases

    # YES: Test edge cases
    def test_parse_empty_cas():
        assert parse_cas_numbers([""]) == []
    
    def test_parse_invalid_cas():
        with pytest.raises(ValueError):
            parse_cas_numbers(["abc-def-ghi"])
    
  4. Keep tests simple

    # YES: One test = one thing
    def test_cas_removes_whitespace():
        assert parse_cas_numbers(["  123-45-6  "]) == ["123-45-6"]
    
    # NO: Testing multiple things
    def test_cas_everything():
        assert parse_cas_numbers(["  123-45-6  "]) == ["123-45-6"]
        assert parse_cas_numbers(["123-45-6/789-01-2"]) == [...]
        # Too much in one test!
    
  5. Run tests before committing

    git add .
    uv run pytest  # ← Always run first!
    git commit -m "Add feature X"
    
  6. Use descriptive test names

    # YES: Describes what it tests
    def test_parse_cas_removes_parenthetical_info():
        ...
    
    # NO: Vague
    def test_cas_1():
        ...
    

DON'T:

  1. Don't test external libraries

    # NO: Testing if requests.post works
    def test_requests_library():
        response = requests.post("https://example.com")
        assert response.status_code == 200
    
    # YES: Test YOUR code that uses requests
    @patch('requests.post')
    def test_my_search_function(mock_post):
        ...
    
  2. Don't make tests dependent on each other

    # NO: test_b depends on test_a
    def test_a_creates_data():
        db.insert({"id": 1, "name": "test"})
    
    def test_b_uses_data():
        data = db.find_one({"id": 1})  # Breaks if test_a fails!
    
    # YES: Each test is independent
    def test_b_uses_data():
        db.insert({"id": 1, "name": "test"})  # Create own data
        data = db.find_one({"id": 1})
    
  3. Don't test implementation details

    # NO: Testing internal variable names
    def test_internal_state():
        obj = MyClass()
        assert obj._internal_var == "value"  # Breaks with refactoring
    
    # YES: Test public behavior
    def test_public_api():
        obj = MyClass()
        assert obj.get_value() == "value"
    
  4. Don't skip tests

    # NO: Commenting out failing tests
    # def test_broken_feature():
    #     assert broken_function() == "expected"
    
    # YES: Fix the test or mark as TODO
    @pytest.mark.skip(reason="Feature not implemented yet")
    def test_future_feature():
        ...
    

Practical Example: Your Workflow

Before (Manual Script)

# test_water.py
from cosing_service import cosing_search, clean_cosing

result = cosing_search("WATER", "name")
print(result)  # ← You manually check

clean = clean_cosing(result)
print(clean)  # ← You manually check again

# Run 10 times with different inputs... tedious!

Problems:

  • Manual verification
  • Slow (type command, read output, verify)
  • Error-prone (miss things)
  • Not repeatable

After (Automated Tests)

# tests/test_cosing_service.py
def test_search_and_clean_water():
    """Water should be searchable and cleanable."""
    result = cosing_search("WATER", "name")
    assert result is not None
    assert "inciName" in result

    clean = clean_cosing(result)
    assert clean["inciName"] == "WATER"
    assert "cosingUrl" in clean

# Run ONCE: pytest
# It checks everything automatically!

Run all 25 tests:

$ uv run pytest

tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number PASSED
tests/test_cosing_service.py::TestParseCasNumbers::test_multiple_cas_with_slash PASSED
...
======================== 25 passed in 0.5s ========================

Benefits:

  • All pass? Safe to deploy!
  • One fails? Fix before deploying!
  • ⏱️ 25 tests in 0.5 seconds vs. manual testing for 30 minutes
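
When one function must be checked against many inputs (the "run 10 times" problem above), pytest can parametrize a single test body instead of copy-pasting it. A minimal sketch, with a stand-in parse_cas_numbers defined inline so it runs on its own:

```python
import re
import pytest

# Stand-in for the real parse_cas_numbers, so this sketch is self-contained.
def parse_cas_numbers(cas_list: list) -> list:
    text = re.sub(r"\([^)]*\)", "", cas_list[0])  # drop "(hydrate)" etc.
    return [part.strip() for part in re.split(r"[/;,]", text) if part.strip()]

# One test body, three cases -- pytest reports each case separately.
@pytest.mark.parametrize("raw, expected", [
    (["7732-18-5"], ["7732-18-5"]),
    (["7732-18-5/56-81-5"], ["7732-18-5", "56-81-5"]),
    (["7732-18-5 (hydrate)"], ["7732-18-5"]),
])
def test_parse_cas_numbers(raw, expected):
    assert parse_cas_numbers(raw) == expected
```

Each case gets its own pass/fail line in the report, so one bad input doesn't hide the others.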

When Should You Write Tests?

Always Test:

Business logic (calculations, data processing)

# YES
def test_calculate_sed():
    assert calculate_sed(quantity=10, dap=0.5) == 5.0

Data validation (Pydantic models)

# YES
def test_ingredient_validates_cas_format():
    with pytest.raises(ValidationError):
        Ingredient(cas="invalid", quantity=10.0)

API integrations (with mocks)

# YES
@patch('requests.post')
def test_cosing_search(mock_post):
    ...

Bug fixes (write test first, then fix)

# YES
def test_bug_123_empty_cas_crash():
    """Regression test for bug #123."""
    result = parse_cas_numbers([])  # Used to crash
    assert result == []

Sometimes Test:

⚠️ UI code (harder to test, less critical)

# Streamlit UI tests are complex, lower priority

⚠️ Configuration (usually simple)

# Config loading is straightforward, test if complex logic

Don't Test:

Third-party libraries (they have their own tests)

# NO: Testing if pandas works
def test_pandas_dataframe():
    df = pd.DataFrame({"a": [1, 2, 3]})
    assert len(df) == 3  # Pandas team already tested this!

Trivial code

# NO: Testing simple getters/setters
class MyClass:
    def get_name(self):
        return self.name  # Too simple to test

Your Next Steps

1. Install Pytest

cd c:\Users\adish\Projects\pif_compiler
uv add --dev pytest pytest-cov pytest-mock

2. Run the COSING Tests

# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_cosing_service.py

# Run specific test
uv run pytest tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number

3. See Coverage

# Terminal report
uv run pytest --cov=src/pif_compiler/services/cosing_service

# HTML report (more detailed)
uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser

4. Start Writing Tests for New Code

Follow the TDD cycle:

  1. Red: Write failing test
  2. Green: Write minimal code to pass
  3. Refactor: Improve code
  4. Repeat!

Additional Resources

  • Pytest Documentation
  • Testing Philosophy
  • PIF Compiler Specific


Summary

Testing transforms your development workflow:

Without Tests              With Tests
------------------------   ---------------------------
Manual verification        Automated checks
Slow feedback              Instant feedback
Fear of breaking things    Confidence to refactor
Undocumented behavior      Tests as documentation
Debug for hours            Pinpoint issues immediately

Start small:

  1. Write tests for one service (COSING done!)
  2. Add tests for new features
  3. Fix bugs with tests first
  4. Gradually increase coverage

The investment pays off:

  • Fewer bugs in production
  • Faster development (less debugging)
  • Better code design
  • Easier collaboration
  • Peace of mind 😌

Last updated: 2025-01-04