# Testing Guide - Theory and Best Practices
## Table of Contents
- [Introduction](#introduction)
- [Your Current Approach vs. Test-Driven Development](#your-current-approach-vs-test-driven-development)
- [The Testing Pyramid](#the-testing-pyramid)
- [Key Concepts](#key-concepts)
- [Real-World Testing Workflow](#real-world-testing-workflow)
- [Regression Testing](#regression-testing---the-killer-feature)
- [Code Coverage](#coverage---how-much-is-tested)
- [Best Practices](#best-practices-summary)
- [Practical Examples](#practical-example-your-workflow)
- [When Should You Write Tests](#when-should-you-write-tests)
- [Getting Started](#your-next-steps)
---
## Introduction
This guide explains the theory and best practices of software testing, specifically for the PIF Compiler project. It moves beyond ad-hoc testing scripts to a comprehensive, automated testing approach.
---
## Your Current Approach vs. Test-Driven Development
### What You Do Now (Ad-hoc Scripts):
```python
# test_script.py
from cosing_service import cosing_search
result = cosing_search("WATER", mode="name")
print(result) # Look at output, check if it looks right
```
**Problems:**
- ❌ Manual checking (is the output correct?)
- ❌ Not repeatable (you forget what "correct" looks like)
- ❌ Doesn't catch regressions (future changes break old code)
- ❌ No documentation (what should the function do?)
- ❌ Tedious for many functions
---
## The Testing Pyramid
```
        /\
       /  \      E2E Tests (Few)
      /----\
     /      \    Integration Tests (Some)
    /--------\
   /          \  Unit Tests (Many)
  /____________\
```
### 1. **Unit Tests** (Bottom - Most Important)
Test individual functions in isolation.
**Example:**
```python
def test_parse_cas_numbers_single():
    """Test parsing a single CAS number."""
    result = parse_cas_numbers(["7732-18-5"])
    assert result == ["7732-18-5"]  # ← Automated check
```
**Benefits:**
- ✅ Fast (milliseconds)
- ✅ No external dependencies (no API, no database)
- ✅ Pinpoint exact problem
- ✅ Run hundreds in seconds
**When to use:**
- Testing individual functions
- Testing data parsing/validation
- Testing business logic calculations
---
### 2. **Integration Tests** (Middle)
Test multiple components working together.
**Example:**
```python
def test_full_cosing_workflow():
    """Test search + clean workflow."""
    raw = cosing_search("WATER", mode="name")
    clean = clean_cosing(raw)
    assert "cosingUrl" in clean
```
**Benefits:**
- ✅ Tests real interactions
- ✅ Catches integration bugs
**Drawbacks:**
- ⚠️ Slower (hits real APIs)
- ⚠️ Requires internet/database
**When to use:**
- Testing workflows across multiple services
- Testing API integrations
- Testing database interactions
---
### 3. **E2E Tests** (End-to-End - Top - Fewest)
Test entire application flow (UI → Backend → Database).
**Example:**
```python
def test_create_pif_from_ui():
    """User creates PIF through Streamlit UI."""
    # Click buttons, fill forms, verify PDF generated
```
**When to use:**
- Testing complete user workflows
- Smoke tests before deployment
- Critical business processes
---
## Key Concepts
### 1. **Assertions - Automated Verification**
**Old way (manual):**
```python
result = parse_cas_numbers(["7732-18-5/56-81-5"])
print(result) # You look at: ['7732-18-5', '56-81-5']
# Is this right? Maybe? You forget in 2 weeks.
```
**Test way (automated):**
```python
def test_parse_multiple_cas():
    result = parse_cas_numbers(["7732-18-5/56-81-5"])
    assert result == ["7732-18-5", "56-81-5"]  # ← Computer checks!
    # If wrong, test FAILS immediately
```
**Common Assertions:**
```python
# Equality
assert result == expected

# Truthiness
assert result is not None
assert "key" in result

# Exceptions
with pytest.raises(ValueError):
    invalid_function()

# Approximate equality (for floats)
assert result == pytest.approx(3.14159, rel=1e-5)
```
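When many inputs exercise the same assertion, pytest's `parametrize` marker keeps tests DRY without lumping everything into one test. A minimal sketch (the `strip_cas` helper is hypothetical, used only for illustration, not part of the project):

```python
import pytest

def strip_cas(raw: str) -> str:
    """Hypothetical helper: trim surrounding whitespace from a CAS number."""
    return raw.strip()

# One parametrized test replaces several near-identical test functions;
# pytest runs (and reports) each case independently.
@pytest.mark.parametrize("raw, expected", [
    ("7732-18-5", "7732-18-5"),
    ("  56-81-5 ", "56-81-5"),
    ("\t50-00-0\n", "50-00-0"),
])
def test_strip_cas(raw, expected):
    assert strip_cas(raw) == expected
```

Each tuple becomes its own test case in the report, so a single failing input is pinpointed immediately.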
---
### 2. **Mocking - Control External Dependencies**
**Problem:** Testing `cosing_search()` hits the real COSING API:
- ⚠️ Slow (network request)
- ⚠️ Unreliable (API might be down)
- ⚠️ Expensive (rate limits)
- ⚠️ Hard to test errors (how do you make API return error?)
**Solution: Mock it!**
```python
from unittest.mock import Mock, patch

@patch('cosing_service.req.post')  # Replace real HTTP request
def test_search_by_name(mock_post):
    # Control what the "API" returns
    mock_response = Mock()
    mock_response.json.return_value = {
        "results": [{"metadata": {"inciName": ["WATER"]}}]
    }
    mock_post.return_value = mock_response

    result = cosing_search("WATER", mode="name")
    assert result["inciName"] == ["WATER"]  # ← Test your logic, not the API
    mock_post.assert_called_once()  # Verify it was called
```
**Benefits:**
- ✅ Fast (no real network)
- ✅ Reliable (always works)
- ✅ Can test error cases (mock API failures)
- ✅ Isolate your code from external issues
**What to mock:**
- HTTP requests (`requests.get`, `requests.post`)
- Database calls (`db.find_one`, `db.insert`)
- File I/O (`open`, `read`, `write`)
- External APIs (COSING, ECHA, PubChem)
- Time-dependent functions (`datetime.now()`)
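For the last item, one low-ceremony option is to make the clock injectable and pass a `Mock` in tests. A sketch with a hypothetical `report_timestamp` function (not part of the project):

```python
from datetime import datetime
from unittest.mock import Mock

def report_timestamp(now=datetime.now) -> str:
    """Hypothetical function under test: 'now' is injectable so tests can freeze time."""
    return now().strftime("%Y-%m-%d")

def test_report_timestamp_is_deterministic():
    # Freeze "now" to a fixed date instead of reading the real clock
    fake_now = Mock(return_value=datetime(2025, 1, 4))
    assert report_timestamp(now=fake_now) == "2025-01-04"
    fake_now.assert_called_once()

test_report_timestamp_is_deterministic()
```

The same effect can be achieved with `@patch` on the module's `datetime`, but an injectable clock keeps the test free of patch targets.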
---
### 3. **Fixtures - Reusable Test Data**
**Without fixtures (repetitive):**
```python
def test_clean_basic():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}
    result = clean_cosing(data)
    assert ...

def test_clean_empty():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}  # Copy-paste!
    result = clean_cosing(data)
    assert ...
```
**With fixtures (DRY - Don't Repeat Yourself):**
```python
# conftest.py
@pytest.fixture
def sample_cosing_response():
    """Reusable COSING response data."""
    return {
        "inciName": ["WATER"],
        "casNo": ["7732-18-5"],
        "substanceId": ["12345"]
    }

# test file
def test_clean_basic(sample_cosing_response):  # Auto-injected!
    result = clean_cosing(sample_cosing_response)
    assert result["inciName"] == "WATER"

def test_clean_empty(sample_cosing_response):  # Reuse same data!
    result = clean_cosing(sample_cosing_response)
    assert "cosingUrl" in result
```
**Benefits:**
- ✅ No code duplication
- ✅ Centralized test data
- ✅ Easy to update (change once, affects all tests)
- ✅ Auto-cleanup (fixtures can tear down resources)
**Common fixture patterns:**
```python
# Database fixture with cleanup
@pytest.fixture
def test_db():
    db = connect_to_test_db()
    yield db       # Test runs here
    db.drop_all()  # Cleanup after test

# Temporary file fixture
@pytest.fixture
def temp_file(tmp_path):
    file_path = tmp_path / "test.json"
    file_path.write_text('{"test": "data"}')
    return file_path  # Auto-cleaned by pytest
```
---
## Real-World Testing Workflow
### Scenario: You Add a New Feature
**Step 1: Write the test FIRST (TDD - Test-Driven Development):**
```python
def test_parse_cas_removes_parentheses():
    """CAS numbers with parentheses should be cleaned."""
    result = parse_cas_numbers(["7732-18-5 (hydrate)"])
    assert result == ["7732-18-5"]
```
**Step 2: Run test - it FAILS (expected!):**
```bash
$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses
FAILED: AssertionError: assert ['7732-18-5 (hydrate)'] == ['7732-18-5']
```
**Step 3: Write code to make it pass:**
```python
def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)  # ← Add this
    # ... rest of function
```
**Step 4: Run test again - it PASSES:**
```bash
$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses
PASSED ✓
```
**Step 5: Refactor if needed - tests ensure you don't break anything!**
---
### TDD Cycle (Red-Green-Refactor)
```
1. RED: Write failing test
2. GREEN: Write minimal code to pass
3. REFACTOR: Improve code without breaking tests
Repeat
```
**Benefits:**
- ✅ Forces you to think about requirements first
- ✅ Prevents over-engineering
- ✅ Built-in documentation (tests show intended behavior)
- ✅ Confidence to refactor
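Putting the whole cycle together: the finished `parse_cas_numbers` (the version shown in the regression section below) passes the tests from the steps above. A self-contained sketch you can run directly:

```python
import re

def parse_cas_numbers(cas_string: list) -> list:
    """Split a raw CAS field into clean CAS numbers (the guide's final version)."""
    cas = cas_string[0]
    cas = re.sub(r"\([^)]*\)", "", cas)  # drop parenthetical notes like "(hydrate)"
    parts = re.split(r"[/;,]", cas)      # split on /, ; and ,
    return [p.strip() for p in parts]

# The TDD tests from the cycle above all pass:
assert parse_cas_numbers(["7732-18-5"]) == ["7732-18-5"]
assert parse_cas_numbers(["7732-18-5/56-81-5"]) == ["7732-18-5", "56-81-5"]
assert parse_cas_numbers(["7732-18-5 (hydrate)"]) == ["7732-18-5"]
```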
---
## Regression Testing - The Killer Feature
**Scenario: You change code 6 months later:**
```python
# Original (working)
def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)
    cas_parts = re.split(r"[/;,]", cas_string)  # Handles /, ;, ,
    return [cas.strip() for cas in cas_parts]

# You "improve" it
def parse_cas_numbers(cas_string: list) -> list:
    return cas_string[0].split("/")  # Simpler! But...
```
**Run tests:**
```bash
$ uv run pytest
FAILED: test_multiple_cas_with_semicolon
  Expected: ['7732-18-5', '56-81-5']
  Got:      ['7732-18-5;56-81-5']      # ← Oops, broke semicolon support!
FAILED: test_cas_with_parentheses
  Expected: ['7732-18-5']
  Got:      ['7732-18-5 (hydrate)']    # ← Broke parentheses removal!
```
**Without tests:**
- You deploy
- Users report bugs
- You're confused what broke
- Spend hours debugging
**With tests:**
- Instant feedback
- Fix before deploying
- Save hours of debugging
---
## Coverage - How Much Is Tested?
### Running Coverage
```bash
uv run pytest --cov=src/pif_compiler --cov-report=html
```
### Sample Output
```
Name                   Stmts   Miss  Cover
------------------------------------------
cosing_service.py         89      5    94%
echa_service.py          156     89    43%
models.py                 45     45     0%
------------------------------------------
TOTAL                    290    139    52%
```
### Interpretation
- ✅ `cosing_service.py` - **94% covered** (great!)
- ⚠️ `echa_service.py` - **43% covered** (needs more tests)
- ❌ `models.py` - **0% covered** (no tests yet)
### Coverage Goals
| Coverage | Status | Action |
|----------|--------|--------|
| 90-100% | ✅ Excellent | Maintain |
| 70-90% | ⚠️ Good | Add edge cases |
| 50-70% | ⚠️ Acceptable | Prioritize critical paths |
| <50% | ❌ Poor | Add tests immediately |
**Target:** 80%+ for business-critical code
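To enforce that target automatically, pytest-cov's `--cov-fail-under` option makes the run fail when coverage drops below a threshold. One way to wire it up (assuming the project keeps its pytest options in `pyproject.toml`):

```toml
# pyproject.toml — fail the test run if coverage drops below the target
[tool.pytest.ini_options]
addopts = "--cov=src/pif_compiler --cov-fail-under=80"
```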
### HTML Coverage Report
```bash
uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser
```
Shows:
- Which lines are tested (green)
- Which lines are not tested (red)
- Which branches are not covered
---
## Best Practices Summary
### ✅ DO:
1. **Write tests for all business logic**
```python
# YES: Test calculations
def test_sed_calculation():
    ingredient = Ingredient(quantity=10.0, dap=0.5)
    assert ingredient.calculate_sed() == 5.0
```
2. **Mock external dependencies**
```python
# YES: Mock API calls
@patch('cosing_service.req.post')
def test_search(mock_post):
    mock_post.return_value.json.return_value = {...}
```
3. **Test edge cases**
```python
# YES: Test edge cases
def test_parse_empty_cas():
    assert parse_cas_numbers([""]) == []

def test_parse_invalid_cas():
    with pytest.raises(ValueError):
        parse_cas_numbers(["abc-def-ghi"])
```
4. **Keep tests simple**
```python
# YES: One test = one thing
def test_cas_removes_whitespace():
    assert parse_cas_numbers([" 123-45-6 "]) == ["123-45-6"]

# NO: Testing multiple things
def test_cas_everything():
    assert parse_cas_numbers([" 123-45-6 "]) == ["123-45-6"]
    assert parse_cas_numbers(["123-45-6/789-01-2"]) == [...]
    # Too much in one test!
```
5. **Run tests before committing**
```bash
uv run pytest                  # Always run first!
git add .
git commit -m "Add feature X"
```
6. **Use descriptive test names**
```python
# YES: Describes what it tests
def test_parse_cas_removes_parenthetical_info():
    ...

# NO: Vague
def test_cas_1():
    ...
```
---
### ❌ DON'T:
1. **Don't test external libraries**
```python
# NO: Testing if requests.post works
def test_requests_library():
    response = requests.post("https://example.com")
    assert response.status_code == 200

# YES: Test YOUR code that uses requests
@patch('requests.post')
def test_my_search_function(mock_post):
    ...
```
2. **Don't make tests dependent on each other**
```python
# NO: test_b depends on test_a
def test_a_creates_data():
    db.insert({"id": 1, "name": "test"})

def test_b_uses_data():
    data = db.find_one({"id": 1})  # Breaks if test_a fails!

# YES: Each test is independent
def test_b_uses_data():
    db.insert({"id": 1, "name": "test"})  # Create own data
    data = db.find_one({"id": 1})
```
3. **Don't test implementation details**
```python
# NO: Testing internal variable names
def test_internal_state():
    obj = MyClass()
    assert obj._internal_var == "value"  # Breaks with refactoring

# YES: Test public behavior
def test_public_api():
    obj = MyClass()
    assert obj.get_value() == "value"
```
4. **Don't skip tests**
```python
# NO: Commenting out failing tests
# def test_broken_feature():
#     assert broken_function() == "expected"

# YES: Fix the test or mark as TODO
@pytest.mark.skip(reason="Feature not implemented yet")
def test_future_feature():
    ...
```
---
## Practical Example: Your Workflow
### Before (Manual Script)
```python
# test_water.py
from cosing_service import cosing_search, clean_cosing
result = cosing_search("WATER", "name")
print(result) # ← You manually check
clean = clean_cosing(result)
print(clean) # ← You manually check again
# Run 10 times with different inputs... tedious!
```
**Problems:**
- Manual verification
- Slow (type command, read output, verify)
- Error-prone (miss things)
- Not repeatable
---
### After (Automated Tests)
```python
# tests/test_cosing_service.py
def test_search_and_clean_water():
    """Water should be searchable and cleanable."""
    result = cosing_search("WATER", "name")
    assert result is not None
    assert "inciName" in result

    clean = clean_cosing(result)
    assert clean["inciName"] == "WATER"
    assert "cosingUrl" in clean

# Run ONCE: pytest
# It checks everything automatically!
```
**Run all 25 tests:**
```bash
$ uv run pytest
tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number PASSED
tests/test_cosing_service.py::TestParseCasNumbers::test_multiple_cas_with_slash PASSED
...
======================== 25 passed in 0.5s ========================
```
**Benefits:**
- All pass? Safe to deploy!
- One fails? Fix before deploying!
- 25 tests in 0.5 seconds vs. manual testing for 30 minutes
---
## When Should You Write Tests?
### Always Test:
✅ **Business logic** (calculations, data processing)
```python
# YES
def test_calculate_sed():
    assert calculate_sed(quantity=10, dap=0.5) == 5.0
```
✅ **Data validation** (Pydantic models)
```python
# YES
def test_ingredient_validates_cas_format():
    with pytest.raises(ValidationError):
        Ingredient(cas="invalid", quantity=10.0)
```
✅ **API integrations** (with mocks)
```python
# YES
@patch('requests.post')
def test_cosing_search(mock_post):
    ...
```
✅ **Bug fixes** (write test first, then fix)
```python
# YES
def test_bug_123_empty_cas_crash():
    """Regression test for bug #123."""
    result = parse_cas_numbers([])  # Used to crash
    assert result == []
```
---
### Sometimes Test:
⚠️ **UI code** (harder to test, less critical)
```python
# Streamlit UI tests are complex, lower priority
```
⚠️ **Configuration** (usually simple)
```python
# Config loading is straightforward, test if complex logic
```
---
### Don't Test:
❌ **Third-party libraries** (they have their own tests)
```python
# NO: Testing if pandas works
def test_pandas_dataframe():
    df = pd.DataFrame({"a": [1, 2, 3]})
    assert len(df) == 3  # Pandas team already tested this!
```
❌ **Trivial code**
```python
# NO: Testing simple getters/setters
class MyClass:
    def get_name(self):
        return self.name  # Too simple to test
```
---
## Your Next Steps
### 1. Install Pytest
```bash
cd c:\Users\adish\Projects\pif_compiler
uv add --dev pytest pytest-cov pytest-mock
```
### 2. Run the COSING Tests
```bash
# Run all tests
uv run pytest
# Run with verbose output
uv run pytest -v
# Run specific test file
uv run pytest tests/test_cosing_service.py
# Run specific test
uv run pytest tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number
```
### 3. See Coverage
```bash
# Terminal report
uv run pytest --cov=src/pif_compiler/services/cosing_service
# HTML report (more detailed)
uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser
```
### 4. Start Writing Tests for New Code
Follow the TDD cycle:
1. **Red**: Write failing test
2. **Green**: Write minimal code to pass
3. **Refactor**: Improve code
4. Repeat!
---
## Additional Resources
### Pytest Documentation
- [Official Pytest Docs](https://docs.pytest.org/)
- [Pytest Fixtures](https://docs.pytest.org/en/stable/fixture.html)
- [Pytest Mocking](https://docs.pytest.org/en/stable/monkeypatch.html)
### Testing Philosophy
- [Test-Driven Development (TDD)](https://www.freecodecamp.org/news/test-driven-development-what-it-is-and-what-it-is-not-41fa6bca02a2/)
- [Testing Best Practices](https://testautomationuniversity.com/)
- [The Testing Pyramid](https://martinfowler.com/articles/practical-test-pyramid.html)
### PIF Compiler Specific
- [tests/README.md](../tests/README.md) - Test suite documentation
- [tests/RUN_TESTS.md](../tests/RUN_TESTS.md) - Quick start guide
- [REFACTORING.md](../REFACTORING.md) - Code organization changes
---
## Summary
**Testing transforms your development workflow:**
| Without Tests | With Tests |
|---------------|------------|
| Manual verification | Automated checks |
| Slow feedback | Instant feedback |
| Fear of breaking things | Confidence to refactor |
| Undocumented behavior | Tests as documentation |
| Debug for hours | Pinpoint issues immediately |
**Start small:**
1. Write tests for one service (✅ COSING done!)
2. Add tests for new features
3. Fix bugs with tests first
4. Gradually increase coverage
**The investment pays off:**
- Fewer bugs in production
- Faster development (less debugging)
- Better code design
- Easier collaboration
- Peace of mind 😌
---
*Last updated: 2025-01-04*