Phase 6: Triple Inspection (Optional)

Status: OPTIONAL (strongly recommended) · Agile: Quality Gates / DoD · Roles: Team + LLM · Human: 40% / LLM: 60%

In brief: 3 automated LLM inspections (Fagan / Tests / Security) detect latent defects before production. The philosophy: investing 4-6h avoids 40-80h of refactoring plus incidents. The Fagan resurrection: 30h per component (1976) → 40min (2026). OPTIONAL, but ROI of 10-100x on critical systems.


Why This Phase Is Critical (If Applied)

The problem without Phase 6: Code enters production with invisible latent defects. Technical debt accumulates silently over 6-12 months. Weak tests (95% coverage but empty assertions) create a false sense of security. Vulnerabilities detected in production are catastrophic (30-100x the cost of catching them in development). The result: unexpected major refactoring plus expensive incidents.

The solution provided: 3 specialized LLM inspections detect at-risk patterns BEFORE merge. Fagan reveals architectural debt, Tests distinguishes code coverage from semantic coverage, and Security covers 6 OWASP attack vectors (injection, authentication, data, infrastructure, logic, monitoring). This is preventive quality assurance that transforms unpredictable future costs into a controlled investment.

LLM advantages over human inspection:

  • Exhaustiveness without fatigue: The LLM verifies 100% of cases with identical rigor; humans fatigue, skim, and forget items
  • No cognitive bias: The LLM evaluates objectively; humans may suffer confirmation bias or hesitate to criticize seniors
  • Perfect memory of standards: The LLM applies 100% of the rules, even obscure ones; humans tend to forget the obscure but critical ones
  • Expertise scalability: LLMs can inspect 10 components in parallel; a single human cannot
  • Continuous improvement: A new rule, once added, applies automatically to all future inspections; training the team on it can take days

The Resurrection of Fagan

Historical context: Michael Fagan (IBM, 1976) developed a revolutionary code inspection methodology.

```mermaid
graph TD
    F76[Fagan 1976] --> RES[Proven Results]
    RES --> D1[60-90% defects detected]
    RES --> D2[ROI 10-100x]
    RES --> D3[Exceptional quality]

    F76 --> PROB[But Impractical]
    PROB --> C1[30h per component]
    PROB --> C2[2-4h meeting]
    PROB --> C3[2h preparation × 6-8 people]
    PROB --> ABD[Abandoned 1990s]

    ABD --> LLM[LLM 2024 Change Everything]
    LLM --> NEW[10min generation + 30min validation]
    NEW --> RES2[Same Results]
    NEW --> PRAT[Finally Practical!]

    style F76 fill:#fbbf24
    style ABD fill:#ef4444
    style LLM fill:#10b981
    style PRAT fill:#10b981
```

The best practices from the 1970s were CORRECT - just impossible to apply without AI.

When to Do or Skip Phase 6

Do Phase 6 if:

  • Code is critical (finance, healthcare, infrastructure)
  • Long lifespan (>2 years evolution)
  • High cost of production bugs (reputation, money)
  • Strict regulatory compliance
  • Large number of affected users

Possibly skip if:

  • Prototypes, POC, throwaway code
  • Simple components, low criticality
  • Highly senior team, already excellent standards
  • Time constraints that are genuinely JUSTIFIED

Decision rule:

Probability of production bug × Cost of incident > Cost of Phase 6
→ DO Phase 6

Example:
10% probability × $50,000 incident = $5,000 expected value
vs $67 Phase 6
→ 75x ROI, do Phase 6
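
A minimal Python sketch of this decision rule, using the illustrative figures from the example above (the function names are ours, not part of the methodology's tooling):

```python
def expected_incident_cost(bug_probability: float, incident_cost: float) -> float:
    """Expected value of a production incident: probability × cost."""
    return bug_probability * incident_cost


def should_do_phase_6(bug_probability: float, incident_cost: float, phase_6_cost: float) -> bool:
    """Decision rule: do Phase 6 if the expected incident cost exceeds its cost."""
    return expected_incident_cost(bug_probability, incident_cost) > phase_6_cost


# Illustrative numbers from the example above
expected = expected_incident_cost(0.10, 50_000)     # $5,000 expected value
print(expected / 67)                                # ≈ 75x the $67 Phase 6 cost
print(should_do_phase_6(0.10, 50_000, 67))          # True → do Phase 6
```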

Process

Inputs:

  • Code in REFACTOR state (Phase 5)
  • Complete test suite with results
  • Requirements documents (architecture, tactical plan)
  • Quality standards and benchmarks

The 3 Specialized Inspections

1. Fagan Inspection - Code & Maintainability

Objective: Detect future technical debt and maintainability risk patterns

LLM 70%, Human 30%

  • LLM generates inspection report
  • Human reviews report
  • Team prioritizes corrections
  • Retained corrections are applied or added to backlog

5 Dimensions evaluated (/20 each, target ≥80/100):

  1. Simplicity: No over-engineering, complexity <10, appropriate abstractions
  2. Business Logic: Domain names, clear rules, contextual error handling
  3. Robustness: Edge cases, input validation, graceful recovery
  4. Maintainability: Ease of evolution, minimal coupling, documentation
  5. Performance: Scalability, appropriate optimizations, no bottlenecks

Output: Detailed report containing strengths, weaknesses, and items marked CRITICAL or IMPROVEMENT.
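
As an illustration of how the /20 scores roll up into the global score and recommendation, here is a hedged sketch; the data structure and function are assumptions, only the ≥80/100 target and the CRITICAL/IMPROVEMENT split come from the methodology:

```python
from dataclasses import dataclass, field

FAGAN_TARGET = 80  # global target score (≥80/100)


@dataclass
class FaganDimension:
    name: str
    score: int                                                   # 0-20
    critical_items: list[str] = field(default_factory=list)      # block the merge
    improvement_items: list[str] = field(default_factory=list)   # backlog


def fagan_recommendation(dimensions: list[FaganDimension]) -> str:
    """Aggregate the 5 dimension scores into a merge recommendation (illustrative)."""
    global_score = sum(d.score for d in dimensions)               # out of 100
    if any(d.critical_items for d in dimensions):
        return f"{global_score}/100 - FIX CRITICAL before merge"
    if global_score >= FAGAN_TARGET:
        return f"{global_score}/100 - APPROVE"
    return f"{global_score}/100 - below target: justify acceptance or REDO"
```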

2. Tests Inspection - Test Suite Quality

Objective: Distinguish “code coverage” vs “semantic coverage”

LLM 70%, Human 30%

  • LLM analyzes test quality
  • Human reviews
  • Team strengthens weak tests
  • Re-validation

Detects:

  • Empty assertions (assert result is not None vs actual values)
  • Fragile tests (coupled to implementation)
  • Missing critical edge cases
  • False sense of security (95% coverage but weak tests)

Output: Strengthened tests, confidence in test quality
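
To make the "code coverage vs semantic coverage" distinction concrete, here is a hedged sketch of a weak assertion versus a meaningful one, reusing the calculate_confidence signature from the running example (the zero-contributors rule is the one documented later in this phase):

```python
# Weak: passes as long as the function returns anything at all
def test_confidence_weak():
    result = calculate_confidence(weighted_presence=0.0, total_similarity=1.0,
                                  n_contributors=0, top_k_similar=5)
    assert result is not None  # empty assertion: says nothing about the business rule


# Strong: pins the business behavior (zero contributors → confidence 0.0)
def test_confidence_zero_contributors_returns_zero():
    result = calculate_confidence(weighted_presence=0.0, total_similarity=1.0,
                                  n_contributors=0, top_k_similar=5)
    assert result == 0.0, "no contributors means no evidence, confidence must be 0.0"
```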

3. Security Inspection - Vulnerabilities

Objective: Detect 6 attack vectors before production (OWASP)

LLM 80%, Human 20%

  • LLM conducts multi-vector audit
  • Human reviews vulnerabilities
  • Team prioritizes corrections
  • Retained corrections are applied or added to backlog
  • Re-validation following corrections

6 OWASP Attack Vectors:

  1. Injection: SQL, commands, XSS, LDAP
  2. Authentication: Sessions, tokens, MFA, passwords
  3. Sensitive Data: Encryption, logs, exposure
  4. Infrastructure: HTTPS, CORS, security headers
  5. Business Logic: Authorization, validation, transactions
  6. Monitoring: Logs, alerts, audit trail

Output: CRITICAL vulnerabilities corrected, IMPROVEMENT items documented
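
For vector 1 (injection), the contrast between a vulnerable and a safe query is worth spelling out. A minimal sketch using Python's standard sqlite3 module; the table and input are purely illustrative, and the component inspected later in this phase has no database access at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

user_input = "alice'; DROP TABLE users; --"

# Vulnerable: string concatenation lets the input rewrite the query
# conn.execute(f"SELECT email FROM users WHERE name = '{user_input}'")

# Safe: parameterized query, the driver treats the input strictly as data
rows = conn.execute("SELECT email FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [] : no matching user, and the table is untouched
```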

Integrated Workflow

1. LLM executes 3 inspections in parallel
   ├─ Fagan
   ├─ Tests
   └─ Security

2. Team reviews 3 reports
   ├─ Prioritizes findings (CRITICAL vs IMPROVEMENT)
   └─ Decides action plan

3. Corrections applied
   ├─ CRITICAL: Mandatory before merge
   ├─ IMPROVEMENT: Backlog or accept
   └─ Tests still pass

4. Senior validates
   └─ Approves final production quality
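
A hedged sketch of step 1, running the three inspections concurrently; run_llm_inspection is a hypothetical placeholder for whatever LLM client the team actually uses, and the truncated prompts stand in for the complete ones given later in this phase:

```python
from concurrent.futures import ThreadPoolExecutor

PROMPTS = {
    "fagan": "Perform a systematic Fagan inspection of this code across 5 perspectives. ...",
    "tests": "Analyze QUALITY of this test suite. ...",
    "security": "Perform exhaustive OWASP security audit across 6 attack vectors. ...",
}


def run_llm_inspection(name: str, prompt: str) -> str:
    """Hypothetical helper: send the prompt to the team's LLM and return the report text."""
    raise NotImplementedError("plug in your LLM client here")


def run_triple_inspection(code: str) -> dict[str, str]:
    """Step 1 of the workflow: launch the 3 inspections in parallel and collect the reports."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(run_llm_inspection, name, f"{prompt}\n\nCODE:\n{code}")
                   for name, prompt in PROMPTS.items()}
        return {name: future.result() for name, future in futures.items()}
```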

Definition of Done

This phase is considered complete when:

  1. All 3 inspections executed (Fagan, Tests, Security)
  2. Fagan score ≥ 80/100 (or justified acceptance < 80)
  3. All CRITICAL items corrected
  4. Tests strengthened if weaknesses detected
  5. CRITICAL security vulnerabilities eliminated
  6. IMPROVEMENT items documented (backlog or accept)
  7. Senior approves final inspection report

Going Further

See detailed inspection reports + complete prompts

Inspection 1: Fagan - Complete Example

Code Inspected: confidence_calculator (Post-REFACTOR)

# [Code REFACTOR Phase 5 - 180 lines with extracted functions]
# See Phase 5 for complete code

Fagan Inspection Prompt

Perform a systematic Fagan inspection of this code across 5 perspectives.

CODE TO INSPECT:
[paste complete code in REFACTOR state]

SPECIFICATIONS:
[paste tactical spec Phase 2]

EVALUATE EACH DIMENSION ON 20 POINTS:

1. SIMPLICITY (/20)
   Questions:
   - Code clear and straightforward? No over-engineering?
   - Abstractions appropriate to the problem?
   - Cyclomatic complexity acceptable (<10/function)?
   - Logic easy to follow without debugger?

   For each question:
   - YES: +5 points
   - PARTIALLY: +2-3 points
   - NO: 0 points

   Identify ALL overly complex patterns.

2. BUSINESS LOGIC (/20)
   Questions:
   - Variable/function names reflect business domain?
   - Business rules clear and documented?
   - Contextual error handling (explicit messages)?
   - Code readable by Product Owner?

   Identify ALL obscure logic.

3. ROBUSTNESS (/20)
   Questions:
   - All edge cases handled (null, empty, limits)?
   - Exhaustive input validation?
   - Graceful error recovery (no crashes)?
   - Defined behavior for ALL scenarios?

   Identify ALL missing edge cases.

4. MAINTAINABILITY (/20)
   Questions:
   - Code easy to modify without breaking?
   - Minimal coupling between components?
   - Documentation helps new developers?
   - Tests enable safe refactoring?

   Identify ALL maintainability-at-risk patterns.

5. PERFORMANCE (/20)
   Questions:
   - Performant algorithms (no avoidable O(n²))?
   - Scalability validated (large volumes)?
   - No obvious bottlenecks (I/O, nested loops)?
   - Optimizations appropriate (not premature)?

   Identify ALL potential bottlenecks.

REPORT FORMAT:

## Global Score: XX/100

### 1. Simplicity: XX/20
**Strengths**:
- [List]

**Weaknesses**:
- [List with precise code line]

**CRITICAL Items**:
- [Complexity >10, severe over-engineering]

**IMPROVEMENT Items**:
- [Possible improvements non-blocking]

[Repeat for 5 dimensions]

## Decision Summary

**CRITICAL Items (block merge)**:
1. [Item with impact justification]

**IMPROVEMENT Items (backlog)**:
2. [Item nice-to-have]

**Recommendation**: APPROVE / FIX CRITICAL / REDO

Generated Report

## Global Fagan Score: 88/100

### 1. Simplicity: 18/20

**Strengths**:
- Short functions (10-30 lines each)
- Single Responsibility applied (4 focused functions)
- Average cyclomatic complexity 3 (excellent, <10)
- Direct logic, no excessive abstractions

**Weaknesses**:
- Input validation scattered (_validate_inputs + checks in calculate_confidence)
  Lines 145-155: Duplicate validation logic

**CRITICAL Items**: None

**IMPROVEMENT Items**:
- Consolidate validation into _validate_inputs only (eliminate duplication)

---

### 2. Business Logic: 20/20

**Strengths**:
- Named constants (MIN_SAMPLE_FOR_STATISTICS = 3)
- Descriptive function names (_calculate_sample_coverage_penalty)
- Docstrings explain business formulas
- Contextual error messages ("top_k_similar must be > 0")

**Weaknesses**: None

**CRITICAL Items**: None
**IMPROVEMENT Items**: None

---

### 3. Robustness: 16/20

**Strengths**:
- Input validation (total_similarity ≤ 0, n_contributors < 0)
- ValueError exception if top_k ≤ 0
- Zero contributor case handled (return 0.0)
- Warning logging for invalid inputs

**Weaknesses**:
- No handling of weighted_presence > total_similarity (mathematically invalid)
  Line 133: Should validate weighted_presence ≤ total_similarity
- No handling of extreme values (overflow if weighted_presence = 1e308)

**CRITICAL Items**:
- Add validation weighted_presence ≤ total_similarity (mathematical consistency)

**IMPROVEMENT Items**:
- Add sanity check for overflow (extreme values)

---

### 4. Maintainability: 18/20

**Strengths**:
- Extracted functions testable in isolation
- Zero coupling (no external dependencies)
- Exhaustive documentation (formulas, examples, notes)
- Tunable constants centralized

**Weaknesses**:
- Statistical penalty formula (0.5 + n/6.0) undocumented origin
  Line 234: Where does formula come from? Missing scientific reference

**CRITICAL Items**: None

**IMPROVEMENT Items**:
- Add scientific reference for penalty formula (decision traceability)

---

### 5. Performance: 16/20

**Strengths**:
- Constant O(1) complexity
- No I/O, no side effects (pure function)
- Performance test <1ms passes
- No unnecessary memory allocations

**Weaknesses**:
- Debug logging can impact production performance if enabled
  Lines 159-163: logger.debug() with string formatting
  → Use lazy evaluation: logger.debug("...", value) not f-strings

**CRITICAL Items**: None

**IMPROVEMENT Items**:
- Lazy logging evaluation (marginal performance but best practice)

---

## Decision Summary

**CRITICAL Items (block merge until corrected)**:
1. **Missing validation weighted_presence ≤ total_similarity** (Line 133)
   - Impact: Allows mathematically invalid states (confidence >1.0 possible)
   - Effort: 5 min (add if weighted_presence > total_similarity: return 0.0)
   - Priority: HIGH

**IMPROVEMENT Items (backlog, non-blocking)**:
2. Consolidate input validation (eliminate duplication)
3. Sanity check for overflow extreme values
4. Scientific reference for penalty formula
5. Lazy logging evaluation

**Final Recommendation**: **FIX CRITICAL #1 then APPROVE**

Score 88/100 excellent. Single blocking item (mathematical validation).
After correction: production quality validated.

Team Actions

1. Dev applies CRITICAL fix #1
2. Tests re-executed → still 100% pass
3. Senior validates correction
4. IMPROVEMENT items added to backlog
5. Merge approved ✓
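
For reference, CRITICAL fix #1 might take the following form; this is a sketch only, since the real calculate_confidence lives in the Phase 5 code, which is not reproduced here (the guard also applies the lazy-logging improvement noted under Performance):

```python
import logging

logger = logging.getLogger(__name__)


def _is_mathematically_consistent(weighted_presence: float, total_similarity: float) -> bool:
    """Guard for CRITICAL fix #1: weighted_presence can never exceed total_similarity."""
    if weighted_presence > total_similarity:
        # Lazy formatting (no f-string), per the Performance IMPROVEMENT item
        logger.warning("weighted_presence %s exceeds total_similarity %s, treating as invalid",
                       weighted_presence, total_similarity)
        return False
    return True


# Inside calculate_confidence (Phase 5 code), the guard would short-circuit to 0.0:
#     if not _is_mathematically_consistent(weighted_presence, total_similarity):
#         return 0.0
```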

Inspection 2: Tests - Example

Test Suite Analyzed

# 19-test suite from Phase 3 (see Phase 3 for complete code)

Tests Inspection Prompt

Analyze QUALITY of this test suite. Distinguish "code coverage" vs "semantic coverage".

TEST SUITE:
[paste 19 complete tests]

CODE TESTED:
[paste implementation]

EVALUATE QUALITY ON 5 CRITERIA:

1. MEANINGFUL ASSERTIONS
   - Assertions verify PRECISE VALUES (not just `is not None`)?
   - Each assertion has EXPLICIT MESSAGE and context?
   - Assertions cover business behavior (not just technical)?

   Identify ALL tests with weak assertions.

2. TEST ROBUSTNESS
   - Tests not fragile (coupled to implementation details)?
   - Tests survive refactoring without modifications?
   - Tests isolated (no order-of-execution dependencies)?

   Identify ALL fragile tests.

3. CRITICAL EDGE CASES
   - All business edge cases tested?
   - Error scenarios covered?
   - Boundary values (min/max/zero) tested?

   Identify ALL missing edge cases.

4. SEMANTIC COVERAGE
   - Tests validate BUSINESS BEHAVIORS (not just executed code)?
   - Each business rule has dedicated test?
   - Tests are executable documentation of business logic?

   Identify semantic coverage gaps.

5. TEST MAINTAINABILITY
   - Descriptive test names (behavior + expected result)?
   - Fixtures eliminate duplication?
   - Tests readable without debugger?

   Identify maintainability problems.

REPORT FORMAT:

## Test Suite Quality: SCORE/5

### Analysis by Criterion

[5 sections with Strengths / Weaknesses / CRITICAL / IMPROVEMENT]

### Weak Tests Identified

**Test #X: [name]**
- Problem: [empty assertion, fragile, etc.]
- Impact: [false positive, maintenance, etc.]
- Suggested correction: [improved code]

### Recommendation

APPROVE / STRENGTHEN TESTS

Generated Report (Key Findings)

## Test Suite Quality: 4.5/5 (Excellent)

### 1. Meaningful Assertions: 5/5
All assertions verify precise values
Explicit context messages
No vague assertions (`is not None` alone)

### 2. Test Robustness: 5/5
Tests not coupled to implementation
Test behavior, not internal structure
Isolated, order-independent

### 3. Critical Edge Cases: 4/5
Zero contributors: ✓
n < 3 statistics penalty: ✓
top_k = 0 exception: ✓
**MISSING**: weighted_presence > total_similarity (mathematically invalid)

### 4. Semantic Coverage: 5/5
Each business rule tested
Tests = behavior documentation
Concrete examples in docstrings

### 5. Test Maintainability: 4/5
Descriptive naming
Reusable fixtures
Some tests > 20 lines (extract helpers)

---

## Weak Tests Identified

**No "weak" tests detected**

But 1 missing edge case:

### Missing Edge Case: Weighted > Total (Invalid)

**Test to add**:
```python
def test_calculate_confidence_weighted_exceeds_total_returns_zero():
    """
    Invalid edge case: weighted_presence > total_similarity.
    Mathematically inconsistent, should return 0.0 defensively.
    """
    result = calculate_confidence(
        weighted_presence=2.0,  # > total!
        total_similarity=1.0,
        n_contributors=5,
        top_k_similar=5
    )
    assert result == 0.0, \
        "weighted > total mathematically invalid, return 0"
```

Justification:

  • Mathematical consistency: weighted cannot exceed total
  • Defense: Protects against corrupted data
  • Aligns with Fagan Inspection finding #1

Recommendation

STRENGTHEN: Add test for weighted > total edge case, then APPROVE.

Test suite already excellent (4.5/5). Single missing edge case identified (consistent with Fagan finding). After adding test: quality 5/5.


Inspection 3: Security - Example

Security Inspection Prompt

Perform exhaustive OWASP security audit across 6 attack vectors.

CODE TO AUDIT:
[paste complete code]

APPLICATION CONTEXT:
[description: web API, nutritional data, authenticated users]

AUDIT 6 VECTORS:

1. INJECTION (SQL, Commands, XSS, LDAP)
   - External inputs sanitized?
   - Parameterized queries (not string concat)?
   - Strict type/format validation?

2. AUTHENTICATION & SESSIONS
   - Secure tokens (JWT, OAuth)?
   - MFA supported?
   - Passwords hashed (bcrypt, argon2)?
   - Session timeout?

3. SENSITIVE DATA
   - Encryption at rest (DB)?
   - Encryption in transit (HTTPS)?
   - No logs/debug of sensitive data?
   - PII anonymized?

4. INFRASTRUCTURE
   - HTTPS mandatory?
   - Security headers (CSP, HSTS, X-Frame)?
   - CORS strictly configured?
   - Dependencies current (no CVEs)?

5. BUSINESS LOGIC
   - Authorization verified (not just auth)?
   - Business rules validated server-side?
   - Atomic transactions (no race conditions)?
   - API rate limiting?

6. MONITORING & LOGGING
   - Security events logged?
   - Intrusion attempt alerts?
   - Audit trail of sensitive activities?
   - No logs of secrets/tokens?

For EACH vector, identify:
- CRITICAL vulnerabilities (easily exploitable)
- IMPROVEMENT vulnerabilities (defense in depth)

REPORT FORMAT:

## Security Audit: CRITICAL / IMPROVEMENT / PASS

[6 sections by vector]

### CRITICAL Vulnerabilities (block merge)
[List with CVSS score, exploit scenario, fix]

### IMPROVEMENT Vulnerabilities (backlog)
[List to improve defense in depth]

Recommendation: BLOCK / FIX / APPROVE

Generated Report (Critical System Example)

## Security Audit: PASS (with 2 IMPROVEMENT items)

### 1. Injection: PASS
**Analysis**:
- confidence_calculator module = pure mathematical function
- No DB access, no external requests
- Inputs validated (types, limits)
- No injection vector applicable

**Vulnerabilities**: None

---

### 2. Authentication & Sessions: N/A
**Analysis**:
- Low-level module (calculation), no auth handling
- Auth managed at API layer (out of scope for this module)

**Vulnerabilities**: N/A for this module

---

### 3. Sensitive Data: IMPROVEMENT
**Analysis**:
- Function logs intermediate values (lines 159-163)
- Nutritional data may qualify as PII in some jurisdictions

**IMPROVEMENT Vulnerabilities**:
1. **Logging Potentially Sensitive Data**
   - Line 159: `logger.debug(f"... n_contributors={n}")`
   - Risk: Logs may contain user data
   - CVSS: 3.1 (Low) - Information Disclosure
   - Fix: Add LOG_SENSITIVE_DATA flag (default False)
   - Effort: 15 min

---

### 4. Infrastructure: IMPROVEMENT
**Analysis**:
- Standard Python module, no external dependencies
- No known CVEs

**IMPROVEMENT Vulnerabilities**:
2. **Input Bounds Validation (DoS prevention)**
   - No upper limit on weighted_presence/total_similarity
   - Risk: Extreme values (1e308) → float overflow
   - CVSS: 2.5 (Low) - Potential DoS
   - Fix: Add MAX_VALID_VALUE = 1e6, reject if exceeded
   - Effort: 10 min

---

### 5. Business Logic: PASS
**Analysis**:
- Business rule validation (n >= 0, top_k > 0)
- No race conditions (pure function)
- Deterministic logic

**Vulnerabilities**: None

---

### 6. Monitoring & Logging: PASS
**Analysis**:
- Appropriate logging (warnings for invalid inputs)
- No logs of secrets/tokens (not applicable here)
- Debug logs disableable in production

**Vulnerabilities**: None

---

## Security Audit Summary

### CRITICAL Vulnerabilities: None

Pure mathematical module, minimal attack surface.

### IMPROVEMENT Vulnerabilities (Defense in Depth):
1. LOG_SENSITIVE_DATA flag to control PII logging
2. MAX_VALID_VALUE validation to prevent DoS overflow

**Total Correction Effort**: 25 minutes

**Recommendation**: **APPROVE**

Module secure for production deployment. The 2 suggested improvements
are defense in depth (nice-to-have), not security blockers.

If system VERY critical (healthcare, finance) → Apply 2 improvements.
Otherwise → Backlog acceptable.
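
If the two improvements are retained, they might look like the following sketch; the flag name, the bound, and the helper are taken from the report's suggestions rather than from the actual module:

```python
import logging

logger = logging.getLogger(__name__)

LOG_SENSITIVE_DATA = False   # IMPROVEMENT #1: opt-in logging of raw input values
MAX_VALID_VALUE = 1e6        # IMPROVEMENT #2: reject absurd magnitudes (float overflow / DoS)


def inputs_within_bounds(weighted_presence: float, total_similarity: float) -> bool:
    """Return False when inputs fall outside the sane range, logging defensively."""
    valid = 0.0 <= weighted_presence <= MAX_VALID_VALUE and 0.0 < total_similarity <= MAX_VALID_VALUE
    if not valid:
        if LOG_SENSITIVE_DATA:
            # Raw values only when explicitly enabled, lazily formatted
            logger.warning("rejected inputs: weighted=%s total=%s", weighted_presence, total_similarity)
        else:
            logger.warning("rejected inputs: values outside the accepted range")
    return valid
```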

Phase 6 ROI - Calculation Examples

Scenario 1: Standard Module

Phase 6 Investment: 5h × $100/h = $500

Expected Benefits:
- Probability of production bug: 15%
- Average bug cost: $5,000 (debugging, patch, tests)
- Expected value: 0.15 × $5,000 = $750

ROI = ($750 - $500) / $500 = 50%

Conclusion: Positive but marginal ROI. Phase 6 optional.


Scenario 2: Critical Finance Module

Phase 6 Investment: 5h × $100/h = $500

Expected Benefits:
- Probability of vulnerability: 10%
- Cost of security incident:
  - Incident response: $20,000
  - Full audit: $30,000
  - Customer notification: $10,000
  - Reputation: $50,000
  - Total: $110,000
- Expected value: 0.10 × $110,000 = $11,000

ROI = ($11,000 - $500) / $500 = 2,100%

Conclusion: 21x ROI. Phase 6 MANDATORY.


Scenario 3: Infrastructure Module (1M users)

Phase 6 Investment: 6h × $100/h = $600

Expected Benefits:
- Probability of major defect: 20%
- Cost of production defect:
  - 2h downtime: $50,000
  - Incident response: $10,000
  - Emergency refactoring: $40,000
  - Total: $100,000
- Expected value: 0.20 × $100,000 = $20,000

ROI = ($20,000 - $600) / $600 = 3,233%

Conclusion: 32x ROI. Phase 6 CRITICAL.


Phase 6 Decision Checklist

Do Phase 6 IF at least 2 criteria met:

  • Code critical (finance, healthcare, infrastructure)
  • >10,000 affected users
  • Lifespan >2 years
  • Production bug cost >$10,000
  • Regulatory compliance (HIPAA, PCI-DSS, SOC2)
  • Junior/intermediate team (not all seniors)

Skip Phase 6 IF all criteria met:

  • Prototype/POC (< 3 months lifespan)
  • < 100 users
  • Simple isolated component
  • 100% experienced senior team
  • Bug cost < $1,000

If gray area: Do Phase 6 at least once to learn. Decide later based on observed ROI.
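
The checklist can be encoded as a quick triage helper; the function is our own shorthand, while the thresholds ("at least 2" to do, "all" to skip) are the rule exactly as stated above:

```python
def phase_6_decision(do_criteria_met: int, skip_criteria_met: int, skip_criteria_total: int = 5) -> str:
    """Apply the Phase 6 decision checklist (illustrative shorthand)."""
    if do_criteria_met >= 2:
        return "DO Phase 6"
    if skip_criteria_met == skip_criteria_total:
        return "Skip Phase 6"
    return "Gray area: do Phase 6 at least once, then decide based on observed ROI"
```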


Final phase completed!

Congratulations: You now have the complete DC² methodology in 6 phases for production-quality software development with AI!