Phase 5: TDD REFACTOR - Collaborative Improvement
In brief: The team transforms GREEN code (functional) into production code (elegant) under senior dev guidance. Team EXECUTES refactorings, senior GUIDES. This learning investment turns the senior into a force multiplier rather than a bottleneck.
Why This Phase Is Critical
The problem without structured Phase 5: GREEN code remains functional but basic (duplication, O(n²) algorithms, magic numbers). Technical debt accumulates silently. Discovery 6-12 months later → major rewrite. OR senior does all refactoring solo → bottleneck, team stagnates, burnout.
The solution provided: Systematic refactoring BEFORE merge removes the technical debt identified in GREEN code while it is still cheap to fix. Production code = maintainable, performant, documented. CRITICAL: Team executes under senior guidance (not senior solo). Senior becomes multiplier: guides 3-4 devs simultaneously who grow through practice.
LLM limitations addressed:
- No judgment on architectural elegance: Senior identifies refactoring opportunities (code smells, applicable patterns), LLM assists mechanical transformations
- No deep performance optimization: Senior profiles code, identifies bottlenecks, selects algorithms. LLM generates optimized code under direction
Team That Learns vs Senior That Burns Out
Anti-Pattern (Senior Solo):
graph TD
GREEN[GREEN Code] --> SEN[Senior does all refactoring solo]
SEN --> R1[Refactor component 1]
R1 --> R2[Refactor component 2]
R2 --> R3[Refactor component 3]
R3 --> R4[Refactor component 4]
EQ[Team waits] -.->|Blocked| SEN
EQ -.->|No learning| STAG[Skills stagnate]
SEN -.->|Overload| BURN[Senior burnout]
style SEN fill:#ef4444
style BURN fill:#ef4444
style STAG fill:#fbbf24
Problems:
- Senior = bottleneck (1 component at a time)
- Team passive (waits, doesn’t learn)
- Not scalable (senior exhausted, team dependent)
- Velocity plateaus (senior capacity limited)
DC² Pattern (Guided Team):
graph TD
GREEN[GREEN Code] --> SEN[Senior identifies opportunities]
SEN --> REV[Team review: senior explains]
REV --> DEV1[Dev 1: Refactor component 1]
REV --> DEV2[Dev 2: Refactor component 2]
REV --> DEV3[Dev 3: Refactor component 3]
DEV1 --> VAL1[Senior reviews + feedback]
DEV2 --> VAL2[Senior reviews + feedback]
DEV3 --> VAL3[Senior reviews + feedback]
VAL1 --> LEARN[Team learns through practice]
VAL2 --> LEARN
VAL3 --> LEARN
LEARN --> AUTO[Team more autonomous next sprint]
style SEN fill:#10b981
style LEARN fill:#3b82f6
style AUTO fill:#10b981
Advantages:
- Refactorings PARALLEL (3-4 simultaneous)
- Team ACTIVE (executes, learns)
- Scalable (senior guides, not executes)
- Growing velocity (team autonomous)
Workflow
Inputs:
- Implementation in GREEN state (Phase 4)
- Passing test suite (100%)
- Quality standards (complexity < 10, coupling metrics)
- Performance requirements (latency, throughput, memory)
1. Identify Refactoring Opportunities ⏱️⏱️
Senior Dev 90%, LLM 10%
Senior reviews GREEN code and identifies:
- Code smells (duplication, long functions, deep nesting)
- Potential performance bottlenecks (inefficient algorithms)
- Architectural improvements (applicable patterns)
- Maintainability issues (magic numbers, vague names)
LLM assists:
- Suggests additional opportunities via static analysis
- Generates quality metrics report (cyclomatic complexity)
Output: Prioritized refactoring list (critical → nice-to-have)
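The quality metrics report mentioned above can be approximated with the standard library alone. The sketch below is an illustration, not the tooling the process prescribes: `approximate_complexity`, `over_threshold` and `SAMPLE` are hypothetical names, and the counting rule (one point per branch node, plus one per extra boolean operand) is a simplification of what dedicated analyzers compute. Nested functions are also counted toward their enclosing function.

```python
import ast

# Node types counted as one extra decision point each (an approximation).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def approximate_complexity(source: str) -> dict[str, int]:
    """Return {function_name: approximate cyclomatic complexity}."""
    report: dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1  # a function without branches has a single path
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # "a or b or c" adds two short-circuit paths
                    score += len(child.values) - 1
            report[node.name] = score
    return report


def over_threshold(report: dict[str, int], limit: int = 10) -> list[str]:
    """Names of functions that do not satisfy 'complexity < limit'."""
    return sorted(name for name, score in report.items() if score >= limit)


# Hypothetical input used only to exercise the sketch.
SAMPLE = '''
def penalty(n, top_k):
    if top_k <= 0 or n < 0:
        return 0.0
    if n < 3:
        return 0.5 + n / 6.0
    return 1.0
'''

if __name__ == "__main__":
    report = approximate_complexity(SAMPLE)
    print(report, over_threshold(report))
```

A report like this only ranks candidates; deciding which ones are worth refactoring remains the senior's call.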
2. Team Review Session ⏱️
Senior Dev 80%, Team 20%
- Senior presents identified opportunities
- Explains WHY refactoring necessary (not just WHAT)
- Demonstrates 1-2 refactorings live (pedagogy)
- Team asks questions, clarifies understanding
- Work assignment: Each dev takes 1-2 refactorings
Objective: Team understands quality vision before executing
3. Execute Parallel Refactorings ⏱️⏱️⏱️
Team 60%, Senior Dev 30%, LLM 10%
Team executes assigned refactorings:
- Extract functions (Single Responsibility)
- Apply design patterns (Strategy, Factory if relevant)
- Optimize algorithms (O(n²) → O(n log n))
- Improve variable/function names
- Add logging, robust error handling
LLM assists:
- Generates refactored code under dev direction
- Mechanical transformations (renaming, extractions)
Senior available:
- Team questions (clarifications, decisions)
- Intermediate review (catch a wrong direction early)
CRITICAL: Refactorings in PARALLEL (3-4 devs simultaneous)
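Step 3 lists algorithm optimization (O(n²) → O(n log n)) among the team's refactorings. As a minimal sketch of what such a change can look like, the example below uses a hypothetical task that is not taken from the project: finding the smallest gap between any two values. Both function names are illustrative.

```python
from itertools import combinations


def smallest_gap_quadratic(values: list[float]) -> float:
    """GREEN-style version: compare every pair of values, O(n²)."""
    return min(abs(a - b) for a, b in combinations(values, 2))


def smallest_gap_sorted(values: list[float]) -> float:
    """Refactored version: sort once, then compare neighbours, O(n log n)."""
    ordered = sorted(values)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    data = [7.0, 1.5, 9.25, 4.0, 3.0]
    # Same answer from both versions is the behavior-preservation check.
    print(smallest_gap_quadratic(data), smallest_gap_sorted(data))
```

The refactored version relies on the fact that the closest pair must be adjacent once the values are sorted. Both versions raise `ValueError` for fewer than two values, so the observable behavior is unchanged.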
4. Documentation Improvement ⏱️⏱️
Team 40%, LLM 60%
- Senior defines what needs documentation
- LLM generates detailed docstrings
- Team adds inline comments for complex logic
- LLM updates README, architecture docs
Focus: Why code works this way (not just what)
5. Senior Review + Validation ⏱️⏱️
Senior Dev 70%, Team 30%
For each refactoring:
- Senior reviews refactored code
- Constructive feedback (not “redo”, but “improve X because Y”)
- Team adjusts per feedback
- Tests executed (must ALWAYS pass 100%)
Quality metrics validated:
- Cyclomatic complexity < 10
- Code duplication eliminated
- Performance benchmarks met
Output: Production code approved by senior
Definition of Done
This phase is considered complete when:
- All tests still pass (100% - no regressions)
- Cyclomatic complexity reduced (< 10 per function)
- Code duplication eliminated (DRY principle applied)
- Performance benchmarks met or exceeded
- SOLID principles applied where appropriate
- Complete and accurate documentation (docstrings + comments)
- Senior dev approves production quality
- Team has learned: Can explain refactorings performed
Go Further
See complete GREEN → REFACTOR transformation + patterns
Complete Example: confidence_calculator Transformation
GREEN Code (Phase 4) - Functional But Basic
def calculate_confidence(
    weighted_presence: float,
    total_similarity: float,
    n_contributors: int,
    top_k_similar: int
) -> float:
    """
    Calculate prediction confidence score with sample size penalties.
    """
    # Validate top_k (avoid division by zero)
    if top_k_similar <= 0:
        raise ValueError("top_k_similar must be > 0")
    # Degenerate cases: return 0.0 immediately
    if total_similarity <= 0 or n_contributors < 0 or weighted_presence < 0:
        return 0.0
    if n_contributors == 0:
        return 0.0
    # Calculate raw confidence
    confidence_raw = weighted_presence / total_similarity
    # Sample size penalty
    sample_size_penalty = min(n_contributors / top_k_similar, 1.0)
    # Statistical penalty for very small samples
    if n_contributors < 3:
        statistical_penalty = 0.5 + (n_contributors / 6.0)
    else:
        statistical_penalty = 1.0
    # Final result
    return confidence_raw * sample_size_penalty * statistical_penalty
Problems identified by Senior:
- Monolithic: Everything in one function (35 lines)
- Scattered validations: 4 separate if statements
- No logging: Difficult to debug if 0.0 returned
- Magic numbers: 3, 6.0, 0.5 unexplained
- Minimal documentation: Formulas not explained
- Improvable naming: confidence_raw → base_confidence
Team Review Session (Senior Explains)
Senior: “We’ll refactor this module together. Here’s what we’ll do and WHY:”
Refactoring 1: Extract Penalty Calculation Functions
WHY? Single Responsibility Principle (SOLID).
Each function = one clear responsibility.
calculate_confidence() becomes coordinator
_calculate_sample_coverage_penalty() = sample coverage responsibility
_calculate_statistical_penalty() = statistical reliability responsibility
BENEFIT:
- Testable independently (unit tests per function)
- Reusable (penalties usable elsewhere)
- Understandable (function name = documentation)
Refactoring 2: Add Debug Logging
WHY? Support production debugging.
If confidence = 0.0, we want to know WHY:
- total_similarity = 0?
- n_contributors = 0?
- Intermediate calculation values?
BENEFIT:
- Faster debugging (logs show what's happening)
- Production monitoring (alerts if anomalies)
Refactoring 3: Named Constants
WHY? Magic numbers = obscure code.
Reader doesn't understand where 0.5, 6.0, 3 come from.
MIN_SAMPLE_FOR_STATISTICS = 3 # Statistical reliability threshold
STAT_PENALTY_BASE = 0.5 # Factor at n=0 (maximum penalty)
STAT_PENALTY_SCALE = 6.0 # Linear scale
BENEFIT:
- Inline documentation (name = explanation)
- Easily adjustable (change constant, not hunt through code)
Assignment:
- Dev 1: Extract penalty functions
- Dev 2: Add logging + constants
- Dev 3: Improve docstrings + formula comments
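Refactoring 2 in the session above asks for logs that explain WHY a 0.0 came back. The sketch below shows one way Dev 2 could verify that behavior with the standard `logging` module. The logger name, `captured_logs` and `safe_ratio` are hypothetical stand-ins, not part of the module under refactoring.

```python
import io
import logging
from contextlib import contextmanager

# Hypothetical logger name; the real module would use logging.getLogger(__name__).
logger = logging.getLogger("confidence_calculator")


@contextmanager
def captured_logs(level: int):
    """Temporarily route the module logger to an in-memory buffer."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        yield buffer
    finally:
        logger.removeHandler(handler)
        logger.setLevel(previous_level)


def safe_ratio(weighted_presence: float, total_similarity: float) -> float:
    """Toy stand-in for the degenerate-case branch of calculate_confidence."""
    if total_similarity <= 0:
        # WARNING carries the offending value, so the 0.0 is explainable.
        logger.warning("Returning 0.0: total_similarity=%.3f", total_similarity)
        return 0.0
    ratio = weighted_presence / total_similarity
    logger.debug("ratio=%.3f", ratio)  # intermediate value, DEBUG only
    return ratio


if __name__ == "__main__":
    with captured_logs(logging.DEBUG) as logs:
        safe_ratio(1.0, 0.0)
        safe_ratio(0.5, 1.0)
    print(logs.getvalue())
```

Passing the values as arguments instead of pre-formatting the string means the message is only built when the level is enabled, which keeps DEBUG logging cheap in production.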
REFACTOR Code (Production) - Elegant and Maintainable
"""
Module confidence_calculator - Calculate nutrition prediction confidence scores.
Implements multi-penalty algorithm to avoid overconfidence
when data is limited or sample is statistically weak.
"""
import logging
from typing import Final
logger = logging.getLogger(__name__)
# Algorithm configuration constants
MIN_SAMPLE_FOR_STATISTICS: Final[int] = 3
"""Minimum contributors threshold for acceptable statistical reliability."""
STAT_PENALTY_BASE: Final[float] = 0.5
"""Base penalty for zero sample (n=0)."""
STAT_PENALTY_SCALE: Final[float] = 6.0
"""Linear scale for statistical penalty [0.5, 0.833] for n ∈ [0, 2]."""
def calculate_confidence(
    weighted_presence: float,
    total_similarity: float,
    n_contributors: int,
    top_k_similar: int
) -> float:
    """
    Calculate prediction confidence score with sample size penalties.

    Applies two multiplicative penalties to reduce overconfidence:
    1. **Sample coverage penalty**: Reduces confidence proportionally
       to ratio of effective contributors / target contributors.
       - If n_contributors >= top_k: No penalty (factor 1.0)
       - If n_contributors < top_k: Linear penalty (n/top_k)
    2. **Statistical reliability penalty**: Reduces confidence additionally
       for very small samples (n < 3) judged statistically unstable.
       - If n >= 3: No penalty (factor 1.0)
       - If n < 3: Penalty 0.5 + (n / 6.0) ∈ [0.5, 0.833]

    Final formula:
        confidence = (weighted_presence / total_similarity)
                     × min(n/top_k, 1.0)
                     × [1.0 if n>=3 else 0.5 + n/6.0]

    Args:
        weighted_presence: Sum of (similarity × ingredient_presence) for all contributors.
            Expected value ∈ [0, total_similarity].
        total_similarity: Sum of similarity scores for all contributors.
            Must be > 0 (otherwise returns 0.0).
        n_contributors: Number of effective food item contributors to prediction.
            Expected value ≥ 0.
        top_k_similar: Target number of similar foods (typically 5).
            Must be > 0 (otherwise ValueError).

    Returns:
        Final confidence score ∈ [0.0, 1.0], penalized by sample size/quality.

    Raises:
        ValueError: If top_k_similar <= 0 (it would cause a division by zero).

    Examples:
        >>> # Complete sample (5/5): no penalty
        >>> calculate_confidence(0.8, 1.0, 5, 5)
        0.8
        >>> # Small sample (2/5): coverage factor 0.4 × reliability factor 0.833
        >>> round(calculate_confidence(1.0, 1.0, 2, 5), 2)
        0.33
        >>> # Zero sample: degenerate case
        >>> calculate_confidence(1.0, 1.0, 0, 5)
        0.0

    Notes:
        - No side effects other than logging
        - O(1) constant performance
        - Thread-safe
    """
    # Validate critical parameter (avoid division by zero)
    if top_k_similar <= 0:
        raise ValueError(
            f"top_k_similar must be > 0 (received: {top_k_similar})"
        )
    # Validate inputs and handle degenerate cases
    if not _validate_inputs(weighted_presence, total_similarity, n_contributors):
        logger.warning(
            "Invalid inputs detected: weighted_presence=%.3f, "
            "total_similarity=%.3f, n_contributors=%d. Returning confidence 0.0",
            weighted_presence, total_similarity, n_contributors
        )
        return 0.0
    # Calculate base confidence (before penalties)
    base_confidence = weighted_presence / total_similarity
    # Apply penalties
    coverage_factor = _calculate_sample_coverage_penalty(n_contributors, top_k_similar)
    reliability_factor = _calculate_statistical_penalty(n_contributors)
    # Final confidence
    final_confidence = base_confidence * coverage_factor * reliability_factor
    # Debug logging for traceability
    logger.debug(
        "Confidence calculated: base=%.3f, coverage_factor=%.3f, "
        "reliability_factor=%.3f → final=%.3f (n=%d/%d)",
        base_confidence, coverage_factor, reliability_factor,
        final_confidence, n_contributors, top_k_similar
    )
    return final_confidence
def _validate_inputs(
    weighted_presence: float,
    total_similarity: float,
    n_contributors: int
) -> bool:
    """
    Validate confidence calculation inputs.

    Returns:
        True if inputs valid, False if degenerate case (caller returns 0.0).
    """
    # Total similarity must be positive (avoid division by zero)
    if total_similarity <= 0:
        return False
    # Number of contributors must be non-negative
    if n_contributors < 0:
        return False
    # Weighted presence cannot be negative
    if weighted_presence < 0:
        return False
    # Zero contributors = no prediction
    if n_contributors == 0:
        return False
    return True
def _calculate_sample_coverage_penalty(
    n_contributors: int,
    top_k_similar: int
) -> float:
    """
    Calculate penalty based on sample coverage ratio.

    Penalizes confidence proportionally to contributor deficit
    relative to optimal target (top_k_similar).

    Args:
        n_contributors: Number of effective contributors.
        top_k_similar: Target number of contributors.

    Returns:
        Penalty factor ∈ [0.0, 1.0]:
        - 1.0 if n_contributors >= top_k_similar (complete sample)
        - n_contributors / top_k_similar otherwise (linear penalty)

    Examples:
        >>> _calculate_sample_coverage_penalty(5, 5)  # No penalty
        1.0
        >>> _calculate_sample_coverage_penalty(3, 5)  # 40% penalty
        0.6
        >>> _calculate_sample_coverage_penalty(10, 5)  # Capped at 1.0 (no bonus)
        1.0
    """
    return min(n_contributors / top_k_similar, 1.0)
def _calculate_statistical_penalty(n_contributors: int) -> float:
    """
    Calculate additional penalty for statistically weak samples.

    Applies extra penalty when number of contributors < statistical
    reliability threshold (MIN_SAMPLE_FOR_STATISTICS = 3).
    Justification: With < 3 data points, the variance estimate is
    unreliable, so the prediction is statistically unstable.

    Args:
        n_contributors: Number of effective contributors.

    Returns:
        Penalty factor ∈ [0.5, 1.0]:
        - 1.0 if n_contributors >= 3 (acceptable reliability)
        - 0.5 + (n_contributors / 6.0) otherwise (progressive penalty)
            - n=0 → 0.5 (maximum penalty)
            - n=1 → 0.67
            - n=2 → 0.83

    Examples:
        >>> _calculate_statistical_penalty(5)  # No penalty
        1.0
        >>> round(_calculate_statistical_penalty(2), 2)  # Moderate penalty
        0.83
        >>> _calculate_statistical_penalty(0)  # Maximum penalty
        0.5
    """
    if n_contributors < MIN_SAMPLE_FOR_STATISTICS:
        return STAT_PENALTY_BASE + (n_contributors / STAT_PENALTY_SCALE)
    return 1.0
REFACTOR Improvements Analysis
Transformations Applied:
- Function Extraction (SRP - Single Responsibility Principle):
  - _validate_inputs(): Validation responsibility
  - _calculate_sample_coverage_penalty(): Coverage responsibility
  - _calculate_statistical_penalty(): Reliability responsibility
  - Impact: Independently testable, reusable, understandable
- Named Constants:
  - MIN_SAMPLE_FOR_STATISTICS = 3
  - STAT_PENALTY_BASE = 0.5
  - STAT_PENALTY_SCALE = 6.0
  - Impact: Inline documentation, centrally adjustable
- Debug Logging:
  - Warning if invalid inputs (with values)
  - Debug intermediate calculations (traceability)
  - Impact: Production debugging easier, monitoring enabled
- Comprehensive Documentation:
  - Main docstring: 50 lines (formulas, examples, notes)
  - Private function docstrings: Mathematical justifications
  - Inline comments: Complex logic explained
  - Impact: Rapid onboarding for new devs
- Robust Error Handling:
  - Centralized _validate_inputs() function
  - Explicit error messages (received values displayed)
  - Impact: Understandable errors, rapid debugging
- Improved Naming:
  - confidence_raw → base_confidence (clearer)
  - sample_size_penalty → coverage_factor (semantic)
  - statistical_penalty → reliability_factor (intent)
  - Impact: Self-documenting code
Quality Metrics:
| Metric | GREEN | REFACTOR | Improvement |
|---|---|---|---|
| Lines of code | 35 | 180 | +414% |
| Functions | 1 | 4 | +300% |
| Cyclomatic complexity | 8 | 3 (avg) | -63% |
| Comments | 10 lines | 80 lines | +700% |
| Testability | Medium | High | ↑↑ |
| Maintainability | Low | High | ↑↑↑ |
Apparent Paradox: REFACTOR code = 5x more lines, BUT 3x more maintainable.
Why?
- Documentation ≠ complexity
- Short functions > long function
- Explicit > implicit
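The Improvement column of the metrics table is plain relative change, which a few lines of arithmetic can reproduce. This is a sketch for checking the figures; `percent_change` and `METRICS` are illustrative names, and the values are copied from the table.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (one decimal)."""
    return round((after - before) / before * 100, 1)


# (GREEN, REFACTOR) pairs copied from the quality metrics table.
METRICS = {
    "Lines of code": (35, 180),
    "Functions": (1, 4),
    "Cyclomatic complexity": (8, 3),
    "Comments": (10, 80),
}

if __name__ == "__main__":
    for name, (green, refactor) in METRICS.items():
        print(f"{name}: {percent_change(green, refactor):+.1f}%")
    # Size ratio behind the "5x more lines" remark
    print(f"Lines ratio: {180 / 35:.2f}x")
```

The complexity change is exactly -62.5%, which the table shows rounded to -63%.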
Common Refactoring Patterns
Pattern 1: Extract Function
When to use:
- Function > 50 lines
- Code blocks with comments “# Step 1: …”
- Reusable logic elsewhere
- Multiple abstraction levels in same function
Example:
# BEFORE (GREEN)
# payment_service and email_service are assumed to be module-level collaborators.
def process_order(order):
    # Validate order
    if not order.items:
        raise ValueError("Order empty")
    if order.total < 0:
        raise ValueError("Negative total")
    # Calculate taxes
    subtotal = sum(item.price for item in order.items)
    tax_rate = 0.15 if order.province == "QC" else 0.13
    tax = subtotal * tax_rate
    # Apply discount
    discount = 0
    if order.customer.is_premium:
        discount = subtotal * 0.1
    # Charge payment
    final_total = subtotal + tax - discount
    payment_service.charge(order.customer.card, final_total)
    # Send confirmation (same message as the refactored version: behavior preserved)
    email_service.send(order.customer.email, f"Order #{order.id} confirmed")

# AFTER (REFACTOR)
def process_order(order):
    """Entry point for order processing."""
    _validate_order(order)
    subtotal = _calculate_subtotal(order)
    tax = _calculate_tax(subtotal, order.province)
    discount = _calculate_discount(subtotal, order.customer)
    final_total = subtotal + tax - discount
    _charge_payment(order.customer, final_total)
    _send_confirmation(order)

def _validate_order(order):
    """Validate order before processing."""
    if not order.items:
        raise ValueError("Order empty")
    if order.total < 0:
        raise ValueError("Negative total")

def _calculate_subtotal(order):
    """Calculate order subtotal."""
    return sum(item.price for item in order.items)

def _calculate_tax(subtotal, province):
    """Calculate tax by province."""
    tax_rate = 0.15 if province == "QC" else 0.13
    return subtotal * tax_rate

def _calculate_discount(subtotal, customer):
    """Calculate premium customer discount."""
    return subtotal * 0.1 if customer.is_premium else 0

def _charge_payment(customer, amount):
    """Charge customer card payment."""
    payment_service.charge(customer.card, amount)

def _send_confirmation(order):
    """Send confirmation email."""
    email_service.send(
        order.customer.email,
        f"Order #{order.id} confirmed"
    )
Benefits:
- Main function = table of contents (reads like prose)
- Each step = independently testable function
- Cyclomatic complexity reduced (8 → 2)
Pattern 2: Replace Magic Number with Constant
When to use:
- “Magic” numbers without explanation
- Same value repeated in multiple places
- Configurable thresholds/limits
Example:
# BEFORE (GREEN)
def is_valid_password(password):
    if len(password) < 8:
        return False
    if len(password) > 128:
        return False
    # ... other validations
    return True

def check_username(username):
    if len(username) < 3:
        return False
    if len(username) > 20:
        return False
    return True

# AFTER (REFACTOR)
from typing import Final

# Centralized configuration
MIN_PASSWORD_LENGTH: Final[int] = 8
MAX_PASSWORD_LENGTH: Final[int] = 128
MIN_USERNAME_LENGTH: Final[int] = 3
MAX_USERNAME_LENGTH: Final[int] = 20

def is_valid_password(password: str) -> bool:
    """Validate password length."""
    return MIN_PASSWORD_LENGTH <= len(password) <= MAX_PASSWORD_LENGTH

# Renamed from check_username to match is_valid_password
def is_valid_username(username: str) -> bool:
    """Validate username length."""
    return MIN_USERNAME_LENGTH <= len(username) <= MAX_USERNAME_LENGTH
Benefits:
- Values self-documenting (name = explanation)
- Centrally adjustable (change constant, not 10 places)
- Reduces copy-paste bugs (an 8 vs 80 typo can only happen in one place)
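One practical consequence of named constants is that tests can derive their edge cases from them instead of repeating literals. The sketch below assumes the password rule from the refactored example; `password_boundary_results` is a hypothetical helper added for illustration.

```python
from typing import Final

MIN_PASSWORD_LENGTH: Final[int] = 8
MAX_PASSWORD_LENGTH: Final[int] = 128


def is_valid_password(password: str) -> bool:
    """Validate password length (same rule as the refactored example)."""
    return MIN_PASSWORD_LENGTH <= len(password) <= MAX_PASSWORD_LENGTH


def password_boundary_results() -> dict[str, bool]:
    """Evaluate the rule at each edge, with lengths derived from the constants."""
    return {
        "below_min": is_valid_password("x" * (MIN_PASSWORD_LENGTH - 1)),
        "at_min": is_valid_password("x" * MIN_PASSWORD_LENGTH),
        "at_max": is_valid_password("x" * MAX_PASSWORD_LENGTH),
        "above_max": is_valid_password("x" * (MAX_PASSWORD_LENGTH + 1)),
    }


if __name__ == "__main__":
    print(password_boundary_results())
```

If the minimum length later changes, these boundary checks follow the constant automatically. Note that `Final` is enforced by static type checkers only; Python does not prevent reassignment at runtime.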
Pattern 3: Introduce Explaining Variable
When to use:
- Complex expression difficult to understand
- Same calculation repeated multiple times
- Obscure nested if conditions
Example:
# BEFORE (GREEN)
if (user.age >= 18 and user.country == "CA" and
        user.account_balance > 1000 and not user.is_suspended):
    approve_loan()

# AFTER (REFACTOR)
is_adult = user.age >= 18
is_canadian = user.country == "CA"
has_sufficient_funds = user.account_balance > 1000
is_active_account = not user.is_suspended
is_eligible_for_loan = (
    is_adult and
    is_canadian and
    has_sufficient_funds and
    is_active_account
)
if is_eligible_for_loan:
    approve_loan()
Benefits:
- Clear intent (variable = documentation)
- Debuggable (can inspect each variable)
- Testable (can test criteria separately)
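The "test criteria separately" benefit can be taken one step further by returning the named criteria from a function. This is a sketch under assumptions: the `User` dataclass and the functions `loan_criteria`, `is_eligible_for_loan` and `failed_criteria` are hypothetical, and only the four conditions come from the example above.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical user record holding only the fields the criteria read."""
    age: int
    country: str
    account_balance: float
    is_suspended: bool


def loan_criteria(user: User) -> dict[str, bool]:
    """Evaluate each explaining variable and keep its name."""
    return {
        "is_adult": user.age >= 18,
        "is_canadian": user.country == "CA",
        "has_sufficient_funds": user.account_balance > 1000,
        "is_active_account": not user.is_suspended,
    }


def is_eligible_for_loan(user: User) -> bool:
    """Eligible only when every criterion holds."""
    return all(loan_criteria(user).values())


def failed_criteria(user: User) -> list[str]:
    """Names of the criteria that block eligibility, in declaration order."""
    return [name for name, passed in loan_criteria(user).items() if not passed]


if __name__ == "__main__":
    applicant = User(age=17, country="CA", account_balance=500.0, is_suspended=False)
    print(is_eligible_for_loan(applicant), failed_criteria(applicant))
```

Keeping the names next to the values also gives support staff a readable answer to "why was this loan refused?".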
Pattern 4: Consolidate Conditional Expression
When to use:
- Multiple if statements return same result
- Validation logic scattered
- Condition duplication
Example:
# BEFORE (GREEN)
def calculate_shipping(order):
    if order.total <= 0:
        return 0
    if order.items_count == 0:
        return 0
    if order.customer is None:
        return 0
    if order.shipping_address is None:
        return 0
    # Actual shipping calculation...

# AFTER (REFACTOR)
def calculate_shipping(order):
    """Calculate shipping fees."""
    if not _is_valid_order(order):
        return 0
    # Actual shipping calculation...

def _is_valid_order(order):
    """Validate order for shipping calculation."""
    return all([
        order.total > 0,
        order.items_count > 0,
        order.customer is not None,
        order.shipping_address is not None
    ])
Benefits:
- Centralized validation logic
- Explicit intent (named function)
- Easy to add/modify validations
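One caveat when adding validations to the `all([...])` form: the list is built in full before `all()` runs, so every condition is evaluated even if an earlier one is already false. The sketch below shows why that matters once a check dereferences something that may be `None`. The `is_active` attribute and both function names are hypothetical.

```python
from types import SimpleNamespace


def is_valid_order_eager(order) -> bool:
    """List form: all conditions are evaluated before all() sees them."""
    return all([
        order.total > 0,
        order.customer is not None,
        order.customer.is_active,  # raises AttributeError when customer is None
    ])


def is_valid_order_lazy(order) -> bool:
    """Chained 'and': evaluation stops at the first false condition."""
    return (
        order.total > 0
        and order.customer is not None
        and order.customer.is_active
    )


if __name__ == "__main__":
    # Stand-in for an order without a customer attached.
    anonymous_order = SimpleNamespace(total=10, customer=None)
    print(is_valid_order_lazy(anonymous_order))
    try:
        is_valid_order_eager(anonymous_order)
    except AttributeError as error:
        print(f"eager form failed: {error}")
```

The list form in the example above is safe because none of its conditions depends on another. When a new check does, a chained `and` (or a generator passed to `all()`) preserves the short-circuit behavior of the original early returns.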
Senior Review Checklist
Before approving REFACTOR code, senior verifies:
Architecture & Design:
- Single Responsibility Principle respected (focused functions)
- No code duplication (DRY applied)
- Appropriate design patterns (no over-engineering)
- Relevant abstractions (not premature)
Code Quality:
- Cyclomatic complexity < 10 per function
- Descriptive naming (variables, functions, classes)
- No magic numbers (named constants)
- Robust error handling (no silent failures)
Performance:
- Optimal algorithms (no O(n²) if O(n log n) trivial)
- No obvious memory waste
- Performance benchmarks met
Documentation:
- Complete docstrings (Args, Returns, Raises, Examples)
- Inline comments for complex logic only
- README updated
- No obsolete comments
Tests:
- All tests pass (100%)
- Coverage maintained ≥95%
- Tests refactored if necessary (not fragile)
Logging & Monitoring:
- Appropriate log levels (DEBUG/INFO/WARNING/ERROR)
- Contextual log messages (variable values included)
- No logs with secrets/sensitive data
Security:
- No injection (SQL, commands, XSS)
- Validate external inputs
- No hardcoded secrets
Team Learning:
- Dev can explain refactorings performed
- Dev understands WHY (not just WHAT)
- Constructive feedback given
Phase 5 Pitfalls
Pitfall 1: Over-Refactoring (Over-Engineering)
Problem: Too abstract, loses clarity. Code becomes a maze of design patterns.
Bad example:
# Over-engineered - 200 lines for simple function
from abc import ABC, abstractmethod

class ConfidenceCalculationStrategy(ABC):
    @abstractmethod
    def calculate(self, context: "CalculationContext") -> "ConfidenceScore":
        pass

class SampleSizePenaltyStrategy(ConfidenceCalculationStrategy):
    def __init__(self, penalty_factory: "PenaltyFactoryInterface"):
        self.factory = penalty_factory

    def calculate(self, context):
        penalty = self.factory.create_penalty(context.sample_size)
        return penalty.apply(context.base_confidence)

# 5 classes, 3 interfaces, 200 lines
# Just to calculate a penalty!
Solution: YAGNI rule: “You Aren’t Gonna Need It”
# Just enough - 30 lines, clear
def calculate_confidence(weighted, total, n, top_k):
    base = weighted / total
    penalty = _calculate_penalty(n, top_k)
    return base * penalty

def _calculate_penalty(n, top_k):
    return min(n / top_k, 1.0)
Golden rule: If a reader doesn’t understand after 30 seconds, it’s too abstract.
Pitfall 2: Breaking Tests During Refactor
Problem: Refactoring changes behavior instead of structure. Tests fail.
Why dangerous:
GREEN: Tests 100% pass
REFACTOR: Accidentally changes logic
Tests fail
vs
GREEN: Tests 100% pass
REFACTOR: Changes structure only
Tests pass: Refactor safe ✓
Solution: Execute tests after EACH mini-refactor (not at end)
1. Extract _calculate_penalty() function
→ Run tests → ✓ Pass
2. Add MIN_SAMPLE constant
→ Run tests → ✓ Pass
3. Rename confidence_raw → base_confidence
→ Run tests → ✓ Pass
4. Add logging
→ Run tests → ✓ Pass
If test fails, you KNOW EXACTLY which refactor broke it (the last one).
Rule: Refactor = behavior-preserving transformation. Tests must ALWAYS pass.
Pitfall 3: Premature Abstraction
Problem: “I’ll create this interface now, in case we need it later.”
Example:
# Interface created "just in case"
from abc import ABC, abstractmethod

class PaymentProcessorInterface(ABC):
    @abstractmethod
    def process_payment(self, amount): pass

class StripePaymentProcessor(PaymentProcessorInterface):
    def process_payment(self, amount):
        ...  # Stripe implementation

# But we'll NEVER use another payment processor!
# Useless interface, gratuitous complexity
Solution: Rule of 3: Create abstraction AFTER 3 use cases, not before.
Use case 1: Concrete Stripe code
Use case 2: Concrete PayPal code → Ah, similar pattern
Use case 3: Concrete Square code → OK, abstract now
→ 3 cases = real pattern confirmed, abstraction justified
Rule: “Duplication is far cheaper than the wrong abstraction” (Sandi Metz)
Pitfall 4: Losing Business Logic in Refactor
Problem: Code becomes elegant but obscures WHAT IT DOES business-wise.
Example:
# BEFORE (GREEN) - Clear but verbose
def calculate_price(item, customer):
    base_price = item.price
    if item.category == "food":
        tax = base_price * 0.05  # HST 5%
    else:
        tax = base_price * 0.15  # HST + PST 15%
    if customer.is_premium:
        discount = base_price * 0.1  # 10% discount
    else:
        discount = 0
    return base_price + tax - discount

# AFTER (REFACTOR) - Elegant but obscure
def calculate_price(item, customer):
    return pipe(
        item.price,
        apply_tax_strategy(item.category),
        apply_discount_policy(customer.tier),
        round_to_cents
    )
# Business logic is hidden!
Solution: Keep business logic VISIBLE with descriptive names.
# GOOD REFACTOR - Elegant AND clear
def calculate_price(item, customer):
    """Calculate price with Canadian taxes and premium discount."""
    base_price = item.price
    tax = _calculate_canadian_tax(base_price, item.category)
    discount = _calculate_premium_discount(base_price, customer)
    return base_price + tax - discount

def _calculate_canadian_tax(price, category):
    """HST 5% (food) or HST+PST 15% (other)."""
    return price * (0.05 if category == "food" else 0.15)

def _calculate_premium_discount(price, customer):
    """10% discount for premium customers."""
    return price * 0.1 if customer.is_premium else 0
Rule: Business logic must be readable by the Product Owner, not just devs.
Pitfall 5: Refactoring Paralysis (Infinite Perfection)
Problem: “I could still extract this sub-function… and add this pattern…” → 3 days of refactor, never shipped.
Solution: Strict timeboxing + Definition of Done.
Phase 5 allocation: 12h max
- Identification: 2h
- Execution: 6h
- Review: 2h
- Buffer: 2h
If 12h elapsed and DoD met → STOP, ship
If 12h elapsed and DoD not met → Escalate, don't continue
Rule: “Better done than perfect.”
Production code ≠ perfect code
Production code = maintainable code that works
80% quality in 12h >> 95% quality in 40h
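The timebox rule above is simple enough to write down as a decision function. This is a sketch: `TIMEBOX_HOURS`, `Decision` and `timebox_decision` are hypothetical names, and the behavior before the 12h limit (ship as soon as the DoD is met, otherwise keep going) is an assumption, since the text only specifies what happens once the budget has elapsed.

```python
from enum import Enum

# Allocation copied from the Phase 5 timebox (hours).
TIMEBOX_HOURS = {"identification": 2, "execution": 6, "review": 2, "buffer": 2}


class Decision(Enum):
    CONTINUE = "continue"
    SHIP = "ship"
    ESCALATE = "escalate"


def timebox_decision(elapsed_hours: float, dod_met: bool) -> Decision:
    """Apply the stop rule: never continue past the total budget."""
    budget = sum(TIMEBOX_HOURS.values())  # 2 + 6 + 2 + 2 = 12
    if dod_met:
        # Assumption: meeting the DoD early also means stop and ship.
        return Decision.SHIP
    if elapsed_hours >= budget:
        return Decision.ESCALATE
    return Decision.CONTINUE


if __name__ == "__main__":
    print(timebox_decision(12, dod_met=True))
    print(timebox_decision(12, dod_met=False))
    print(timebox_decision(5, dod_met=False))
```

The useful property is that no input combination yields "keep polishing" once the budget is spent.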
Next step: Phase 6: Triple Inspection (Optional) →
Need help? Consult the Roles and Responsibilities document to clarify who does what in this phase.