# Testing Guide - Theory and Best Practices
## Table of Contents
- Introduction
- Your Current Approach vs. Test-Driven Development
- The Testing Pyramid
- Key Concepts
- Real-World Testing Workflow
- Regression Testing
- Code Coverage
- Best Practices
- Practical Examples
- When Should You Write Tests?
- Your Next Steps
## Introduction
This guide explains the theory and best practices of software testing, specifically for the PIF Compiler project. It moves beyond ad-hoc testing scripts to a comprehensive, automated testing approach.
## Your Current Approach vs. Test-Driven Development

**What You Do Now (Ad-hoc Scripts):**

```python
# test_script.py
from cosing_service import cosing_search

result = cosing_search("WATER", mode="name")
print(result)  # Look at output, check if it looks right
```
Problems:
- ❌ Manual checking (is the output correct?)
- ❌ Not repeatable (you forget what "correct" looks like)
- ❌ Doesn't catch regressions (future changes break old code)
- ❌ No documentation (what should the function do?)
- ❌ Tedious for many functions
## The Testing Pyramid

```
        /\
       /  \        E2E Tests (Few)
      /----\
     /      \      Integration Tests (Some)
    /--------\
   /          \    Unit Tests (Many)
  /____________\
```
### 1. Unit Tests (Bottom - Most Important)

Test individual functions in isolation.

Example:

```python
def test_parse_cas_numbers_single():
    """Test parsing a single CAS number."""
    result = parse_cas_numbers(["7732-18-5"])
    assert result == ["7732-18-5"]  # ← Automated check
```
Benefits:
- ✅ Fast (milliseconds)
- ✅ No external dependencies (no API, no database)
- ✅ Pinpoint exact problem
- ✅ Run hundreds in seconds
When to use:
- Testing individual functions
- Testing data parsing/validation
- Testing business logic calculations
### 2. Integration Tests (Middle)

Test multiple components working together.

Example:

```python
def test_full_cosing_workflow():
    """Test search + clean workflow."""
    raw = cosing_search("WATER", mode="name")
    clean = clean_cosing(raw)
    assert "cosingUrl" in clean
```
Benefits:
- ✅ Tests real interactions
- ✅ Catches integration bugs
Drawbacks:
- ⚠️ Slower (hits real APIs)
- ⚠️ Requires internet/database
When to use:
- Testing workflows across multiple services
- Testing API integrations
- Testing database interactions
### 3. E2E Tests (End-to-End - Top - Fewest)

Test the entire application flow (UI → Backend → Database).

Example:

```python
def test_create_pif_from_ui():
    """User creates PIF through Streamlit UI."""
    # Click buttons, fill forms, verify PDF generated
```
When to use:
- Testing complete user workflows
- Smoke tests before deployment
- Critical business processes
## Key Concepts

### 1. Assertions - Automated Verification
**Old way (manual):**

```python
result = parse_cas_numbers(["7732-18-5/56-81-5"])
print(result)  # You look at: ['7732-18-5', '56-81-5']
# Is this right? Maybe? You forget in 2 weeks.
```

**Test way (automated):**

```python
def test_parse_multiple_cas():
    result = parse_cas_numbers(["7732-18-5/56-81-5"])
    assert result == ["7732-18-5", "56-81-5"]  # ← Computer checks!
    # If wrong, test FAILS immediately
```
**Common Assertions:**

```python
# Equality
assert result == expected

# Truthiness
assert result is not None
assert "key" in result

# Exceptions
with pytest.raises(ValueError):
    invalid_function()

# Approximate equality (for floats)
assert result == pytest.approx(3.14159, rel=1e-5)
```
### 2. Mocking - Control External Dependencies
Problem: Testing cosing_search() hits the real COSING API:
- ⚠️ Slow (network request)
- ⚠️ Unreliable (API might be down)
- ⚠️ Expensive (rate limits)
- ⚠️ Hard to test errors (how do you make API return error?)
**Solution: Mock it!**

```python
from unittest.mock import Mock, patch

@patch('cosing_service.req.post')  # Replace real HTTP request
def test_search_by_name(mock_post):
    # Control what the "API" returns
    mock_response = Mock()
    mock_response.json.return_value = {
        "results": [{"metadata": {"inciName": ["WATER"]}}]
    }
    mock_post.return_value = mock_response

    result = cosing_search("WATER", mode="name")
    assert result["inciName"] == ["WATER"]  # ← Test your logic, not the API
    mock_post.assert_called_once()  # Verify it was called
Benefits:
- ✅ Fast (no real network)
- ✅ Reliable (always works)
- ✅ Can test error cases (mock API failures)
- ✅ Isolate your code from external issues
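The error-case benefit deserves a concrete illustration: `side_effect` makes a mocked call raise instead of return, so you can test failure paths without breaking the network yourself. This is a minimal sketch; `fetch_substance` is a made-up wrapper, not the project's real code.

```python
import pytest
import requests
from unittest.mock import patch

def fetch_substance(name: str) -> dict:
    """Hypothetical wrapper around an HTTP search endpoint (illustration only)."""
    resp = requests.post("https://example.invalid/search", json={"q": name})
    resp.raise_for_status()
    return resp.json()

def test_fetch_substance_propagates_connection_error():
    # side_effect makes the mocked call raise instead of returning,
    # simulating an API outage without touching the real network
    with patch("requests.post",
               side_effect=requests.ConnectionError("API down")):
        with pytest.raises(requests.ConnectionError):
            fetch_substance("WATER")
```

The same pattern works for timeouts (`requests.Timeout`) or HTTP error responses (a mock whose `raise_for_status` raises).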
**What to mock:**

- HTTP requests (`requests.get`, `requests.post`)
- Database calls (`db.find_one`, `db.insert`)
- File I/O (`open`, `read`, `write`)
- External APIs (COSING, ECHA, PubChem)
- Time-dependent functions (`datetime.now()`)
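For the last item, the usual trick is to patch the `datetime` name in the module that uses it, so `now()` returns a fixed instant. A sketch, assuming a made-up helper `report_timestamp`:

```python
import sys
from datetime import datetime, timezone
from unittest.mock import patch

def report_timestamp() -> str:
    """Hypothetical helper: stamp a report with the current UTC date."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d")

def test_report_timestamp_is_deterministic():
    fixed = datetime(2025, 1, 4, tzinfo=timezone.utc)
    # Patch the `datetime` name in this module, not the stdlib itself:
    # mocks must target where a name is looked up, not where it is defined
    with patch.object(sys.modules[__name__], "datetime") as mock_dt:
        mock_dt.now.return_value = fixed
        assert report_timestamp() == "2025-01-04"
```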
### 3. Fixtures - Reusable Test Data
**Without fixtures (repetitive):**

```python
def test_clean_basic():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}
    result = clean_cosing(data)
    assert ...

def test_clean_empty():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}  # Copy-paste!
    result = clean_cosing(data)
    assert ...
```
**With fixtures (DRY - Don't Repeat Yourself):**

```python
# conftest.py
@pytest.fixture
def sample_cosing_response():
    """Reusable COSING response data."""
    return {
        "inciName": ["WATER"],
        "casNo": ["7732-18-5"],
        "substanceId": ["12345"]
    }

# test file
def test_clean_basic(sample_cosing_response):  # Auto-injected!
    result = clean_cosing(sample_cosing_response)
    assert result["inciName"] == "WATER"

def test_clean_empty(sample_cosing_response):  # Reuse same data!
    result = clean_cosing(sample_cosing_response)
    assert "cosingUrl" in result
```
Benefits:
- ✅ No code duplication
- ✅ Centralized test data
- ✅ Easy to update (change once, affects all tests)
- ✅ Auto-cleanup (fixtures can tear down resources)
**Common fixture patterns:**

```python
# Database fixture with cleanup
@pytest.fixture
def test_db():
    db = connect_to_test_db()
    yield db         # Test runs here
    db.drop_all()    # Cleanup after test

# Temporary file fixture
@pytest.fixture
def temp_file(tmp_path):
    file_path = tmp_path / "test.json"
    file_path.write_text('{"test": "data"}')
    return file_path  # Auto-cleaned by pytest
```
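Besides `tmp_path`, pytest ships other built-in fixtures. `monkeypatch`, for instance, overrides environment variables or attributes and undoes every change after the test. A sketch, assuming a made-up `get_api_key` config helper:

```python
import os

def get_api_key() -> str:
    """Hypothetical config helper (illustration only)."""
    return os.environ.get("COSING_API_KEY", "")

def test_api_key_from_env(monkeypatch):
    # monkeypatch is a built-in pytest fixture; the env var change
    # is automatically rolled back when the test finishes
    monkeypatch.setenv("COSING_API_KEY", "test-key")
    assert get_api_key() == "test-key"
```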
## Real-World Testing Workflow

### Scenario: You Add a New Feature
**Step 1: Write the test FIRST (TDD - Test-Driven Development):**

```python
def test_parse_cas_removes_parentheses():
    """CAS numbers with parentheses should be cleaned."""
    result = parse_cas_numbers(["7732-18-5 (hydrate)"])
    assert result == ["7732-18-5"]
```

**Step 2: Run test - it FAILS (expected!):**

```
$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses
FAILED: AssertionError: assert ['7732-18-5 (hydrate)'] == ['7732-18-5']
```

**Step 3: Write code to make it pass:**

```python
def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)  # ← Add this
    # ... rest of function
```

**Step 4: Run test again - it PASSES:**

```
$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses
PASSED ✓
```

**Step 5: Refactor if needed - tests ensure you don't break anything!**
### TDD Cycle (Red-Green-Refactor)

```
1. RED:      Write failing test
                ↓
2. GREEN:    Write minimal code to pass
                ↓
3. REFACTOR: Improve code without breaking tests
                ↓
   Repeat
```
Benefits:
- ✅ Forces you to think about requirements first
- ✅ Prevents over-engineering
- ✅ Built-in documentation (tests show intended behavior)
- ✅ Confidence to refactor
## Regression Testing - The Killer Feature

**Scenario: You change code 6 months later:**

```python
# Original (working)
def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)
    cas_parts = re.split(r"[/;,]", cas_string)  # Handles /, ;, ,
    return [cas.strip() for cas in cas_parts]

# You "improve" it
def parse_cas_numbers(cas_string: list) -> list:
    return cas_string[0].split("/")  # Simpler! But...
```

Run tests:

```
$ uv run pytest
FAILED: test_multiple_cas_with_semicolon
  Expected: ['7732-18-5', '56-81-5']
  Got:      ['7732-18-5;56-81-5']    # ← Oops, broke semicolon support!
FAILED: test_cas_with_parentheses
  Expected: ['7732-18-5']
  Got:      ['7732-18-5 (hydrate)']  # ← Broke parentheses removal!
```
Without tests:
- You deploy
- Users report bugs
- You're confused what broke
- Spend hours debugging
With tests:
- Instant feedback
- Fix before deploying
- Save hours of debugging
## Coverage - How Much Is Tested?

### Running Coverage

```
uv run pytest --cov=src/pif_compiler --cov-report=html
```

### Sample Output

```
Name                 Stmts   Miss  Cover
--------------------------------------------------
cosing_service.py       89      5    94%
echa_service.py        156     89    43%
models.py               45     45     0%
--------------------------------------------------
TOTAL                  290    139    52%
```
### Interpretation

- ✅ `cosing_service.py` - 94% covered (great!)
- ⚠️ `echa_service.py` - 43% covered (needs more tests)
- ❌ `models.py` - 0% covered (no tests yet)
### Coverage Goals
| Coverage | Status | Action |
|---|---|---|
| 90-100% | ✅ Excellent | Maintain |
| 70-90% | ⚠️ Good | Add edge cases |
| 50-70% | ⚠️ Acceptable | Prioritize critical paths |
| <50% | ❌ Poor | Add tests immediately |
Target: 80%+ for business-critical code
### HTML Coverage Report

```
uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser
```
Shows:
- Which lines are tested (green)
- Which lines are not tested (red)
- Which branches are not covered
## Best Practices Summary

### ✅ DO:

- **Write tests for all business logic**

  ```python
  # YES: Test calculations
  def test_sed_calculation():
      ingredient = Ingredient(quantity=10.0, dap=0.5)
      assert ingredient.calculate_sed() == 5.0
  ```

- **Mock external dependencies**

  ```python
  # YES: Mock API calls
  @patch('cosing_service.req.post')
  def test_search(mock_post):
      mock_post.return_value.json.return_value = {...}
  ```

- **Test edge cases**

  ```python
  # YES: Test edge cases
  def test_parse_empty_cas():
      assert parse_cas_numbers([""]) == []

  def test_parse_invalid_cas():
      with pytest.raises(ValueError):
          parse_cas_numbers(["abc-def-ghi"])
  ```

- **Keep tests simple**

  ```python
  # YES: One test = one thing
  def test_cas_removes_whitespace():
      assert parse_cas_numbers([" 123-45-6 "]) == ["123-45-6"]

  # NO: Testing multiple things
  def test_cas_everything():
      assert parse_cas_numbers([" 123-45-6 "]) == ["123-45-6"]
      assert parse_cas_numbers(["123-45-6/789-01-2"]) == [...]
      # Too much in one test!
  ```

- **Run tests before committing**

  ```
  git add .
  uv run pytest        # ← Always run first!
  git commit -m "Add feature X"
  ```

- **Use descriptive test names**

  ```python
  # YES: Describes what it tests
  def test_parse_cas_removes_parenthetical_info():
      ...

  # NO: Vague
  def test_cas_1():
      ...
  ```
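One more DO worth noting: parametrize repetitive edge-case tests instead of copy-pasting near-identical functions. The sketch below uses a simplified stand-in for `parse_cas_numbers` that mirrors the behavior described in this guide, not the project's actual implementation.

```python
import re
import pytest

def parse_cas_numbers(cas_list: list) -> list:
    """Simplified stand-in mirroring the behavior described in this guide."""
    cleaned = re.sub(r"\([^)]*\)", "", cas_list[0])
    return [p.strip() for p in re.split(r"[/;,]", cleaned) if p.strip()]

@pytest.mark.parametrize("raw, expected", [
    (["7732-18-5"], ["7732-18-5"]),
    (["7732-18-5/56-81-5"], ["7732-18-5", "56-81-5"]),
    (["7732-18-5;56-81-5"], ["7732-18-5", "56-81-5"]),
    (["7732-18-5 (hydrate)"], ["7732-18-5"]),
])
def test_parse_cas_numbers(raw, expected):
    # One logical check, many inputs -- pytest reports each case separately
    assert parse_cas_numbers(raw) == expected
```

Each tuple becomes its own test in the report, so a failure pinpoints exactly which input broke.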
### ❌ DON'T:

- **Don't test external libraries**

  ```python
  # NO: Testing if requests.post works
  def test_requests_library():
      response = requests.post("https://example.com")
      assert response.status_code == 200

  # YES: Test YOUR code that uses requests
  @patch('requests.post')
  def test_my_search_function(mock_post):
      ...
  ```

- **Don't make tests dependent on each other**

  ```python
  # NO: test_b depends on test_a
  def test_a_creates_data():
      db.insert({"id": 1, "name": "test"})

  def test_b_uses_data():
      data = db.find_one({"id": 1})  # Breaks if test_a fails!

  # YES: Each test is independent
  def test_b_uses_data():
      db.insert({"id": 1, "name": "test"})  # Create own data
      data = db.find_one({"id": 1})
  ```

- **Don't test implementation details**

  ```python
  # NO: Testing internal variable names
  def test_internal_state():
      obj = MyClass()
      assert obj._internal_var == "value"  # Breaks with refactoring

  # YES: Test public behavior
  def test_public_api():
      obj = MyClass()
      assert obj.get_value() == "value"
  ```

- **Don't skip tests**

  ```python
  # NO: Commenting out failing tests
  # def test_broken_feature():
  #     assert broken_function() == "expected"

  # YES: Fix the test or mark as TODO
  @pytest.mark.skip(reason="Feature not implemented yet")
  def test_future_feature():
      ...
  ```
## Practical Example: Your Workflow

### Before (Manual Script)

```python
# test_water.py
from cosing_service import cosing_search, clean_cosing

result = cosing_search("WATER", "name")
print(result)  # ← You manually check

clean = clean_cosing(result)
print(clean)  # ← You manually check again

# Run 10 times with different inputs... tedious!
```
Problems:
- Manual verification
- Slow (type command, read output, verify)
- Error-prone (miss things)
- Not repeatable
### After (Automated Tests)

```python
# tests/test_cosing_service.py
def test_search_and_clean_water():
    """Water should be searchable and cleanable."""
    result = cosing_search("WATER", "name")
    assert result is not None
    assert "inciName" in result

    clean = clean_cosing(result)
    assert clean["inciName"] == "WATER"
    assert "cosingUrl" in clean

# Run ONCE: pytest
# It checks everything automatically!
```

Run all 25 tests:

```
$ uv run pytest
tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number PASSED
tests/test_cosing_service.py::TestParseCasNumbers::test_multiple_cas_with_slash PASSED
...
======================== 25 passed in 0.5s ========================
```
Benefits:
- ✅ All pass? Safe to deploy!
- ❌ One fails? Fix before deploying!
- ⏱️ 25 tests in 0.5 seconds vs. manual testing for 30 minutes
## When Should You Write Tests?

### Always Test:

✅ **Business logic** (calculations, data processing)

```python
# YES
def test_calculate_sed():
    assert calculate_sed(quantity=10, dap=0.5) == 5.0
```

✅ **Data validation** (Pydantic models)

```python
# YES
def test_ingredient_validates_cas_format():
    with pytest.raises(ValidationError):
        Ingredient(cas="invalid", quantity=10.0)
```

✅ **API integrations** (with mocks)

```python
# YES
@patch('requests.post')
def test_cosing_search(mock_post):
    ...
```

✅ **Bug fixes** (write test first, then fix)

```python
# YES
def test_bug_123_empty_cas_crash():
    """Regression test for bug #123."""
    result = parse_cas_numbers([])  # Used to crash
    assert result == []
```
### Sometimes Test:

⚠️ **UI code** (harder to test, less critical) - Streamlit UI tests are complex, so they are lower priority.

⚠️ **Configuration** (usually simple) - config loading is straightforward; test it only if it contains complex logic.
### Don't Test:

❌ **Third-party libraries** (they have their own tests)

```python
# NO: Testing if pandas works
def test_pandas_dataframe():
    df = pd.DataFrame({"a": [1, 2, 3]})
    assert len(df) == 3  # Pandas team already tested this!
```

❌ **Trivial code**

```python
# NO: Testing simple getters/setters
class MyClass:
    def get_name(self):
        return self.name  # Too simple to test
```
## Your Next Steps

### 1. Install Pytest

```
cd c:\Users\adish\Projects\pif_compiler
uv add --dev pytest pytest-cov pytest-mock
```

### 2. Run the COSING Tests

```
# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_cosing_service.py

# Run specific test
uv run pytest tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number
```

### 3. See Coverage

```
# Terminal report
uv run pytest --cov=src/pif_compiler/services/cosing_service

# HTML report (more detailed)
uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser
```

### 4. Start Writing Tests for New Code
Follow the TDD cycle:
- Red: Write failing test
- Green: Write minimal code to pass
- Refactor: Improve code
- Repeat!
## Additional Resources

- Pytest Documentation
- Testing Philosophy

**PIF Compiler Specific:**

- `tests/README.md` - Test suite documentation
- `tests/RUN_TESTS.md` - Quick start guide
- `REFACTORING.md` - Code organization changes
## Summary
Testing transforms your development workflow:
| Without Tests | With Tests |
|---|---|
| Manual verification | Automated checks |
| Slow feedback | Instant feedback |
| Fear of breaking things | Confidence to refactor |
| Undocumented behavior | Tests as documentation |
| Debug for hours | Pinpoint issues immediately |
Start small:
- Write tests for one service (✅ COSING done!)
- Add tests for new features
- Fix bugs with tests first
- Gradually increase coverage
The investment pays off:
- Fewer bugs in production
- Faster development (less debugging)
- Better code design
- Easier collaboration
- Peace of mind 😌
Last updated: 2025-01-04