# Testing Guide - Theory and Best Practices
## Table of Contents
- [Introduction](#introduction)
- [Your Current Approach vs. Test-Driven Development](#your-current-approach-vs-test-driven-development)
- [The Testing Pyramid](#the-testing-pyramid)
- [Key Concepts](#key-concepts)
- [Real-World Testing Workflow](#real-world-testing-workflow)
- [Regression Testing](#regression-testing---the-killer-feature)
- [Code Coverage](#coverage---how-much-is-tested)
- [Best Practices](#best-practices-summary)
- [Practical Examples](#practical-example-your-workflow)
- [When Should You Write Tests](#when-should-you-write-tests)
- [Getting Started](#your-next-steps)
---
## Introduction
This guide explains the theory and best practices of software testing, specifically for the PIF Compiler project. It moves beyond ad-hoc testing scripts to a comprehensive, automated testing approach.
---
## Your Current Approach vs. Test-Driven Development
### What You Do Now (Ad-hoc Scripts):
```python
# test_script.py
from cosing_service import cosing_search
result = cosing_search("WATER", mode="name")
print(result) # Look at output, check if it looks right
```
**Problems:**
- ❌ Manual checking (is the output correct?)
- ❌ Not repeatable (you forget what "correct" looks like)
- ❌ Doesn't catch regressions (future changes break old code)
- ❌ No documentation (what should the function do?)
- ❌ Tedious for many functions
---
## The Testing Pyramid
```
        /\
       /  \      E2E Tests (Few)
      /----\
     /      \    Integration Tests (Some)
    /--------\
   /          \  Unit Tests (Many)
  /____________\
```
### 1. **Unit Tests** (Bottom - Most Important)
Test individual functions in isolation.
**Example:**
```python
def test_parse_cas_numbers_single():
    """Test parsing a single CAS number."""
    result = parse_cas_numbers(["7732-18-5"])
    assert result == ["7732-18-5"]  # ← Automated check
```
**Benefits:**
- ✅ Fast (milliseconds)
- ✅ No external dependencies (no API, no database)
- ✅ Pinpoint exact problem
- ✅ Run hundreds in seconds
**When to use:**
- Testing individual functions
- Testing data parsing/validation
- Testing business logic calculations
---
### 2. **Integration Tests** (Middle)
Test multiple components working together.
**Example:**
```python
def test_full_cosing_workflow():
    """Test search + clean workflow."""
    raw = cosing_search("WATER", mode="name")
    clean = clean_cosing(raw)
    assert "cosingUrl" in clean
```
**Benefits:**
- ✅ Tests real interactions
- ✅ Catches integration bugs
**Drawbacks:**
- ⚠️ Slower (hits real APIs)
- ⚠️ Requires internet/database
**When to use:**
- Testing workflows across multiple services
- Testing API integrations
- Testing database interactions
---
### 3. **E2E Tests** (End-to-End - Top - Fewest)
Test entire application flow (UI → Backend → Database).
**Example:**
```python
def test_create_pif_from_ui():
    """User creates PIF through Streamlit UI."""
    # Click buttons, fill forms, verify PDF generated
```
**When to use:**
- Testing complete user workflows
- Smoke tests before deployment
- Critical business processes
---
## Key Concepts
### 1. **Assertions - Automated Verification**
**Old way (manual):**
```python
result = parse_cas_numbers(["7732-18-5/56-81-5"])
print(result) # You look at: ['7732-18-5', '56-81-5']
# Is this right? Maybe? You forget in 2 weeks.
```
**Test way (automated):**
```python
def test_parse_multiple_cas():
    result = parse_cas_numbers(["7732-18-5/56-81-5"])
    assert result == ["7732-18-5", "56-81-5"]  # ← Computer checks!
    # If wrong, test FAILS immediately
```
**Common Assertions:**
```python
# Equality
assert result == expected

# Truthiness
assert result is not None
assert "key" in result

# Exceptions
with pytest.raises(ValueError):
    invalid_function()

# Approximate equality (for floats)
assert result == pytest.approx(3.14159, rel=1e-5)
```
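When many inputs exercise the same assertion, pytest's `parametrize` marker keeps tests DRY without lumping everything into one test. A minimal sketch (the `strip_cas` helper is hypothetical, used only for illustration, not part of the project):

```python
import pytest

def strip_cas(raw: str) -> str:
    """Hypothetical helper: trim surrounding whitespace from a CAS number."""
    return raw.strip()

# One parametrized test replaces several near-identical test functions;
# pytest runs (and reports) each case independently.
@pytest.mark.parametrize("raw, expected", [
    ("7732-18-5", "7732-18-5"),
    ("  56-81-5 ", "56-81-5"),
    ("\t50-00-0\n", "50-00-0"),
])
def test_strip_cas(raw, expected):
    assert strip_cas(raw) == expected
```

Each tuple becomes its own test case in the report, so a single failing input is pinpointed immediately.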
---
### 2. **Mocking - Control External Dependencies**
**Problem:** Testing `cosing_search()` hits the real COSING API:
- ⚠️ Slow (network request)
- ⚠️ Unreliable (API might be down)
- ⚠️ Expensive (rate limits)
- ⚠️ Hard to test errors (how do you make API return error?)
**Solution: Mock it!**
```python
from unittest.mock import Mock, patch

@patch('cosing_service.req.post')  # Replace real HTTP request
def test_search_by_name(mock_post):
    # Control what the "API" returns
    mock_response = Mock()
    mock_response.json.return_value = {
        "results": [{"metadata": {"inciName": ["WATER"]}}]
    }
    mock_post.return_value = mock_response

    result = cosing_search("WATER", mode="name")
    assert result["inciName"] == ["WATER"]  # ← Test your logic, not the API
    mock_post.assert_called_once()  # Verify it was called
```
**Benefits:**
- ✅ Fast (no real network)
- ✅ Reliable (always works)
- ✅ Can test error cases (mock API failures)
- ✅ Isolate your code from external issues
**What to mock:**
- HTTP requests (`requests.get`, `requests.post`)
- Database calls (`db.find_one`, `db.insert`)
- File I/O (`open`, `read`, `write`)
- External APIs (COSING, ECHA, PubChem)
- Time-dependent functions (`datetime.now()`)
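For the last item, one low-ceremony option is to make the clock injectable and pass a `Mock` in tests. A sketch with a hypothetical `report_timestamp` function (not part of the project):

```python
from datetime import datetime
from unittest.mock import Mock

def report_timestamp(now=datetime.now) -> str:
    """Hypothetical function under test: 'now' is injectable so tests can freeze time."""
    return now().strftime("%Y-%m-%d")

def test_report_timestamp_is_deterministic():
    # Freeze "now" to a fixed date instead of reading the real clock
    fake_now = Mock(return_value=datetime(2025, 1, 4))
    assert report_timestamp(now=fake_now) == "2025-01-04"
    fake_now.assert_called_once()

test_report_timestamp_is_deterministic()
```

The same effect can be achieved with `@patch` on the module's `datetime`, but an injectable clock keeps the test free of patch targets.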
---
### 3. **Fixtures - Reusable Test Data**
**Without fixtures (repetitive):**
```python
def test_clean_basic():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}
    result = clean_cosing(data)
    assert ...

def test_clean_empty():
    data = {"inciName": ["WATER"], "casNo": ["7732-18-5"], ...}  # Copy-paste!
    result = clean_cosing(data)
    assert ...
```
**With fixtures (DRY - Don't Repeat Yourself):**
```python
# conftest.py
@pytest.fixture
def sample_cosing_response():
    """Reusable COSING response data."""
    return {
        "inciName": ["WATER"],
        "casNo": ["7732-18-5"],
        "substanceId": ["12345"]
    }

# test file
def test_clean_basic(sample_cosing_response):  # Auto-injected!
    result = clean_cosing(sample_cosing_response)
    assert result["inciName"] == "WATER"

def test_clean_empty(sample_cosing_response):  # Reuse same data!
    result = clean_cosing(sample_cosing_response)
    assert "cosingUrl" in result
```
**Benefits:**
- ✅ No code duplication
- ✅ Centralized test data
- ✅ Easy to update (change once, affects all tests)
- ✅ Auto-cleanup (fixtures can tear down resources)
**Common fixture patterns:**
```python
# Database fixture with cleanup
@pytest.fixture
def test_db():
    db = connect_to_test_db()
    yield db       # Test runs here
    db.drop_all()  # Cleanup after test

# Temporary file fixture
@pytest.fixture
def temp_file(tmp_path):
    file_path = tmp_path / "test.json"
    file_path.write_text('{"test": "data"}')
    return file_path  # Auto-cleaned by pytest
```
---
## Real-World Testing Workflow
### Scenario: You Add a New Feature
**Step 1: Write the test FIRST (TDD - Test-Driven Development):**
```python
def test_parse_cas_removes_parentheses():
    """CAS numbers with parentheses should be cleaned."""
    result = parse_cas_numbers(["7732-18-5 (hydrate)"])
    assert result == ["7732-18-5"]
```
**Step 2: Run test - it FAILS (expected!):**
```bash
$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses
FAILED: AssertionError: assert ['7732-18-5 (hydrate)'] == ['7732-18-5']
```
**Step 3: Write code to make it pass:**
```python
def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)  # ← Add this
    # ... rest of function
```
**Step 4: Run test again - it PASSES:**
```bash
$ uv run pytest tests/test_cosing_service.py::test_parse_cas_removes_parentheses
PASSED ✓
```
**Step 5: Refactor if needed - tests ensure you don't break anything!**
---
### TDD Cycle (Red-Green-Refactor)
```
1. RED: Write failing test
2. GREEN: Write minimal code to pass
3. REFACTOR: Improve code without breaking tests
Repeat
```
**Benefits:**
- ✅ Forces you to think about requirements first
- ✅ Prevents over-engineering
- ✅ Built-in documentation (tests show intended behavior)
- ✅ Confidence to refactor
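Putting the whole cycle together: the finished `parse_cas_numbers` (the version shown in the regression section below) passes the tests from the steps above. A self-contained sketch you can run directly:

```python
import re

def parse_cas_numbers(cas_string: list) -> list:
    """Split a raw CAS field into clean CAS numbers (the guide's final version)."""
    cas = cas_string[0]
    cas = re.sub(r"\([^)]*\)", "", cas)  # drop parenthetical notes like "(hydrate)"
    parts = re.split(r"[/;,]", cas)      # split on /, ; and ,
    return [p.strip() for p in parts]

# The TDD tests from the cycle above all pass:
assert parse_cas_numbers(["7732-18-5"]) == ["7732-18-5"]
assert parse_cas_numbers(["7732-18-5/56-81-5"]) == ["7732-18-5", "56-81-5"]
assert parse_cas_numbers(["7732-18-5 (hydrate)"]) == ["7732-18-5"]
```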
---
## Regression Testing - The Killer Feature
**Scenario: You change code 6 months later:**
```python
# Original (working)
def parse_cas_numbers(cas_string: list) -> list:
    cas_string = cas_string[0]
    cas_string = re.sub(r"\([^)]*\)", "", cas_string)
    cas_parts = re.split(r"[/;,]", cas_string)  # Handles /, ;, ,
    return [cas.strip() for cas in cas_parts]

# You "improve" it
def parse_cas_numbers(cas_string: list) -> list:
    return cas_string[0].split("/")  # Simpler! But...
```
**Run tests:**
```bash
$ uv run pytest
FAILED: test_multiple_cas_with_semicolon
  Expected: ['7732-18-5', '56-81-5']
  Got:      ['7732-18-5;56-81-5']      # ← Oops, broke semicolon support!
FAILED: test_cas_with_parentheses
  Expected: ['7732-18-5']
  Got:      ['7732-18-5 (hydrate)']    # ← Broke parentheses removal!
```
**Without tests:**
- You deploy
- Users report bugs
- You're confused what broke
- Spend hours debugging
**With tests:**
- Instant feedback
- Fix before deploying
- Save hours of debugging
---
## Coverage - How Much Is Tested?
### Running Coverage
```bash
uv run pytest --cov=src/pif_compiler --cov-report=html
```
### Sample Output
```
Name                   Stmts   Miss  Cover
------------------------------------------
cosing_service.py         89      5    94%
echa_service.py          156     89    43%
models.py                 45     45     0%
------------------------------------------
TOTAL                    290    139    52%
```
### Interpretation
- ✅ `cosing_service.py` - **94% covered** (great!)
- ⚠️ `echa_service.py` - **43% covered** (needs more tests)
- ❌ `models.py` - **0% covered** (no tests yet)
### Coverage Goals
| Coverage | Status | Action |
|----------|--------|--------|
| 90-100% | ✅ Excellent | Maintain |
| 70-90% | ⚠️ Good | Add edge cases |
| 50-70% | ⚠️ Acceptable | Prioritize critical paths |
| <50% | ❌ Poor | Add tests immediately |
**Target:** 80%+ for business-critical code
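To enforce that target automatically, pytest-cov's `--cov-fail-under` option makes the run fail when coverage drops below a threshold. One way to wire it up (assuming the project keeps its pytest options in `pyproject.toml`):

```toml
# pyproject.toml — fail the test run if coverage drops below the target
[tool.pytest.ini_options]
addopts = "--cov=src/pif_compiler --cov-fail-under=80"
```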
### HTML Coverage Report
```bash
uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser
```
Shows:
- Which lines are tested (green)
- Which lines are not tested (red)
- Which branches are not covered
---
## Best Practices Summary
### ✅ DO:
1. **Write tests for all business logic**
```python
# YES: Test calculations
def test_sed_calculation():
    ingredient = Ingredient(quantity=10.0, dap=0.5)
    assert ingredient.calculate_sed() == 5.0
```
2. **Mock external dependencies**
```python
# YES: Mock API calls
@patch('cosing_service.req.post')
def test_search(mock_post):
    mock_post.return_value.json.return_value = {...}
```
3. **Test edge cases**
```python
# YES: Test edge cases
def test_parse_empty_cas():
    assert parse_cas_numbers([""]) == []

def test_parse_invalid_cas():
    with pytest.raises(ValueError):
        parse_cas_numbers(["abc-def-ghi"])
```
4. **Keep tests simple**
```python
# YES: One test = one thing
def test_cas_removes_whitespace():
    assert parse_cas_numbers([" 123-45-6 "]) == ["123-45-6"]

# NO: Testing multiple things
def test_cas_everything():
    assert parse_cas_numbers([" 123-45-6 "]) == ["123-45-6"]
    assert parse_cas_numbers(["123-45-6/789-01-2"]) == [...]
    # Too much in one test!
```
5. **Run tests before committing**
```bash
uv run pytest                  # Always run first!
git add .
git commit -m "Add feature X"
```
6. **Use descriptive test names**
```python
# YES: Describes what it tests
def test_parse_cas_removes_parenthetical_info():
    ...

# NO: Vague
def test_cas_1():
    ...
```
---
### ❌ DON'T:
1. **Don't test external libraries**
```python
# NO: Testing if requests.post works
def test_requests_library():
    response = requests.post("https://example.com")
    assert response.status_code == 200

# YES: Test YOUR code that uses requests
@patch('requests.post')
def test_my_search_function(mock_post):
    ...
```
2. **Don't make tests dependent on each other**
```python
# NO: test_b depends on test_a
def test_a_creates_data():
    db.insert({"id": 1, "name": "test"})

def test_b_uses_data():
    data = db.find_one({"id": 1})  # Breaks if test_a fails!

# YES: Each test is independent
def test_b_uses_data():
    db.insert({"id": 1, "name": "test"})  # Create own data
    data = db.find_one({"id": 1})
```
3. **Don't test implementation details**
```python
# NO: Testing internal variable names
def test_internal_state():
    obj = MyClass()
    assert obj._internal_var == "value"  # Breaks with refactoring

# YES: Test public behavior
def test_public_api():
    obj = MyClass()
    assert obj.get_value() == "value"
```
4. **Don't skip tests**
```python
# NO: Commenting out failing tests
# def test_broken_feature():
#     assert broken_function() == "expected"

# YES: Fix the test or mark as TODO
@pytest.mark.skip(reason="Feature not implemented yet")
def test_future_feature():
    ...
```
---
## Practical Example: Your Workflow
### Before (Manual Script)
```python
# test_water.py
from cosing_service import cosing_search, clean_cosing
result = cosing_search("WATER", "name")
print(result) # ← You manually check
clean = clean_cosing(result)
print(clean) # ← You manually check again
# Run 10 times with different inputs... tedious!
```
**Problems:**
- Manual verification
- Slow (type command, read output, verify)
- Error-prone (miss things)
- Not repeatable
---
### After (Automated Tests)
```python
# tests/test_cosing_service.py
def test_search_and_clean_water():
    """Water should be searchable and cleanable."""
    result = cosing_search("WATER", "name")
    assert result is not None
    assert "inciName" in result

    clean = clean_cosing(result)
    assert clean["inciName"] == "WATER"
    assert "cosingUrl" in clean

# Run ONCE: pytest
# It checks everything automatically!
```
**Run all 25 tests:**
```bash
$ uv run pytest
tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number PASSED
tests/test_cosing_service.py::TestParseCasNumbers::test_multiple_cas_with_slash PASSED
...
======================== 25 passed in 0.5s ========================
```
**Benefits:**
- All pass? Safe to deploy!
- One fails? Fix before deploying!
- 25 tests in 0.5 seconds vs. manual testing for 30 minutes
---
## When Should You Write Tests?
### Always Test:
✅ **Business logic** (calculations, data processing)
```python
# YES
def test_calculate_sed():
    assert calculate_sed(quantity=10, dap=0.5) == 5.0
```
✅ **Data validation** (Pydantic models)
```python
# YES
def test_ingredient_validates_cas_format():
    with pytest.raises(ValidationError):
        Ingredient(cas="invalid", quantity=10.0)
```
✅ **API integrations** (with mocks)
```python
# YES
@patch('requests.post')
def test_cosing_search(mock_post):
    ...
```
✅ **Bug fixes** (write test first, then fix)
```python
# YES
def test_bug_123_empty_cas_crash():
    """Regression test for bug #123."""
    result = parse_cas_numbers([])  # Used to crash
    assert result == []
```
---
### Sometimes Test:
⚠️ **UI code** (harder to test, less critical)
```python
# Streamlit UI tests are complex, lower priority
```
⚠️ **Configuration** (usually simple)
```python
# Config loading is straightforward, test if complex logic
```
---
### Don't Test:
❌ **Third-party libraries** (they have their own tests)
```python
# NO: Testing if pandas works
def test_pandas_dataframe():
    df = pd.DataFrame({"a": [1, 2, 3]})
    assert len(df) == 3  # Pandas team already tested this!
```
❌ **Trivial code**
```python
# NO: Testing simple getters/setters
class MyClass:
    def get_name(self):
        return self.name  # Too simple to test
```
---
## Your Next Steps
### 1. Install Pytest
```bash
cd c:\Users\adish\Projects\pif_compiler
uv add --dev pytest pytest-cov pytest-mock
```
### 2. Run the COSING Tests
```bash
# Run all tests
uv run pytest
# Run with verbose output
uv run pytest -v
# Run specific test file
uv run pytest tests/test_cosing_service.py
# Run specific test
uv run pytest tests/test_cosing_service.py::TestParseCasNumbers::test_single_cas_number
```
### 3. See Coverage
```bash
# Terminal report
uv run pytest --cov=src/pif_compiler/services/cosing_service
# HTML report (more detailed)
uv run pytest --cov=src/pif_compiler --cov-report=html
# Open htmlcov/index.html in browser
```
### 4. Start Writing Tests for New Code
Follow the TDD cycle:
1. **Red**: Write failing test
2. **Green**: Write minimal code to pass
3. **Refactor**: Improve code
4. Repeat!
---
## Additional Resources
### Pytest Documentation
- [Official Pytest Docs](https://docs.pytest.org/)
- [Pytest Fixtures](https://docs.pytest.org/en/stable/fixture.html)
- [Pytest Mocking](https://docs.pytest.org/en/stable/monkeypatch.html)
### Testing Philosophy
- [Test-Driven Development (TDD)](https://www.freecodecamp.org/news/test-driven-development-what-it-is-and-what-it-is-not-41fa6bca02a2/)
- [Testing Best Practices](https://testautomationuniversity.com/)
- [The Testing Pyramid](https://martinfowler.com/articles/practical-test-pyramid.html)
### PIF Compiler Specific
- [tests/README.md](../tests/README.md) - Test suite documentation
- [tests/RUN_TESTS.md](../tests/RUN_TESTS.md) - Quick start guide
- [REFACTORING.md](../REFACTORING.md) - Code organization changes
---
## Summary
**Testing transforms your development workflow:**
| Without Tests | With Tests |
|---------------|------------|
| Manual verification | Automated checks |
| Slow feedback | Instant feedback |
| Fear of breaking things | Confidence to refactor |
| Undocumented behavior | Tests as documentation |
| Debug for hours | Pinpoint issues immediately |
**Start small:**
1. Write tests for one service (✅ COSING done!)
2. Add tests for new features
3. Fix bugs with tests first
4. Gradually increase coverage
**The investment pays off:**
- Fewer bugs in production
- Faster development (less debugging)
- Better code design
- Easier collaboration
- Peace of mind 😌
---
*Last updated: 2025-01-04*