tl;dr
Engineering is a creative discipline disguised as a technical one. When you treat it like a factory line by targeting raw output units, you get low-quality parts.
Goodhart’s Law states: “When a measure becomes a target, it ceases to be a good measure.”
In software engineering, this manifests as a divergence between “looking productive” and “being productive.” The transition from using a metric for observation to using it for evaluation is where the system breaks.
This post was inspired by ThePrimeAgen’s breakdown on how metrics-based management often destroys engineering culture.
The Perverse Incentive
Metrics are intended to be proxies for health. Lines of code (LoC), PR velocity, and test coverage are easy to count, but they are not the goal. The goal is delivering value and maintaining a sustainable system.
As soon as a manager announces that PR count is the primary KPI for the quarter, the engineering team will respond rationally. They will split single, cohesive changes into five smaller PRs. The metric goes up; the signal is lost.
Common Failures
- Test Coverage: Requiring 90% coverage results in “assertionless tests”—tests that execute code paths to satisfy the runner but don’t actually verify correctness.
- Story Points/Velocity: Teams begin to inflate estimates. A “3-point” task becomes an “8-point” task to ensure the velocity chart looks “up and to the right.”
- Lines of Code: This rewards verbose, copy-pasted solutions over concise, abstracted logic.
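The coverage-gaming failure above is easy to see in code. Here is a minimal sketch in Python (the function names are hypothetical, invented for illustration): the first test executes the code path, so coverage tools count every line as "covered," yet it verifies nothing; the second actually checks the result.

```python
def apply_discount(price, percent):
    """Hypothetical business logic under test."""
    return price - price * percent / 100

def test_apply_discount_gamed():
    # Runs the function, so line coverage hits 100% -- but no assertion,
    # so any bug in apply_discount would still pass.
    apply_discount(100, 20)

def test_apply_discount_real():
    # Same coverage number, but this one actually verifies correctness.
    assert apply_discount(100, 20) == 80.0
```

Both tests produce identical coverage reports, which is exactly why coverage percentage alone is a weak signal.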
Gaming the System
If you measure engineers by raw activity, you get scripts that generate activity. Below is a simple Bash script that “increases productivity” by padding commits—a literal implementation of gaming a metric.
```bash
#!/bin/bash
# A "Productivity" Script for metric-obsessed managers
for i in {1..10}; do
  echo "// Update: $(date +%s)" >> activity_log.txt
  git add activity_log.txt
  git commit -m "chore: minor refactor and optimization $i"
done
git push origin main
```
This script produces 10 commits and modifies 10 lines. On a dashboard, this engineer looks 10x more active than the developer who spent three days deleting 500 lines of technical debt.
Metrics as Observation, Not Targets
The solution is not to stop measuring. Data is necessary for identifying bottlenecks. The solution is to decouple the metric from the incentive.
1. Observe the Delta
Use metrics to identify outliers. If PR cycle time suddenly spikes, don’t penalize the team. Investigate the cause. Is the CI/CD pipeline slow? Is the requirements-gathering phase broken?
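"Observe the delta" can be as simple as flagging statistical outliers rather than ranking people. A minimal sketch in Python (the function name and the z-score threshold are my own assumptions, not a standard):

```python
from statistics import mean, stdev

def flag_spikes(cycle_times_hours, z_threshold=2.0):
    """Return PR cycle times more than z_threshold standard
    deviations above the mean -- candidates for investigation,
    not punishment."""
    mu = mean(cycle_times_hours)
    sigma = stdev(cycle_times_hours)
    return [t for t in cycle_times_hours
            if sigma > 0 and (t - mu) / sigma > z_threshold]

# A week of cycle times with one obvious spike:
flag_spikes([4.0, 5.5, 3.8, 6.1, 4.9, 5.2, 48.0])  # -> [48.0]
```

The output is a question ("what happened to that 48-hour PR?"), not a verdict. Maybe CI was down; maybe the requirements changed mid-review.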
2. Measure the “Un-gamable”
Focus on outcomes rather than activities.
- Change Failure Rate: How often do deployments break?
- Mean Time to Recovery (MTTR): How fast can we fix a break?
- Lead Time for Changes: How long does it take to go from code-complete to production?
These are DORA metrics. They are harder to game because they require the system to actually function correctly to improve.
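The first two DORA metrics reduce to simple ratios over deployment and incident records. A rough sketch, assuming a hypothetical record shape I invented for illustration (a `failed` flag per deployment, and `(start, resolved)` hour pairs per incident):

```python
def change_failure_rate(deploys):
    """Fraction of deployments that caused a production failure.
    Assumes each deploy is a dict with a boolean "failed" key."""
    if not deploys:
        return 0.0
    return sum(1 for d in deploys if d["failed"]) / len(deploys)

def mttr_hours(incidents):
    """Mean time to recovery, given (start_hour, resolved_hour) pairs."""
    durations = [resolved - start for start, resolved in incidents]
    return sum(durations) / len(durations)

change_failure_rate([{"failed": True}, {"failed": False},
                     {"failed": False}, {"failed": False}])  # -> 0.25
mttr_hours([(0.0, 2.0), (5.0, 9.0)])                         # -> 3.0
```

Note what it takes to move these numbers: deploys must actually stop breaking, and fixes must actually ship faster. There is no commit-padding script for that.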
3. Use Counter-Metrics
If you target velocity, you must also target quality. If velocity goes up but the bug count also rises, you haven’t actually improved.
| Primary Metric | Counter-Metric |
|---|---|
| Velocity (Story Points) | Bug Count / Defect Density |
| Test Coverage | Mutation Testing Score |
| Deployment Frequency | Change Failure Rate |
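The pairing rule in the table can be stated as a single predicate. A toy sketch (the function and its signature are hypothetical, just to make the logic concrete): a change only counts as an improvement if the primary metric rose while its counter-metric did not worsen.

```python
def real_improvement(primary_delta, counter_delta):
    """True only if the primary metric improved (positive delta)
    without the counter-metric degrading (positive delta = worse)."""
    return primary_delta > 0 and counter_delta <= 0

real_improvement(0.15, 0.30)   # velocity +15%, bugs +30% -> False
real_improvement(0.15, -0.05)  # velocity +15%, bugs -5%  -> True
```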
Congrats, you scrolled to the bottom!
Engineering is a creative discipline disguised as a technical one. When you treat it like a factory line by targeting raw output units, you get low-quality parts.
Watch ThePrimeAgen’s video for a deeper dive into why “impact” is almost never found in a Jira dashboard. Use metrics to find the problem, not to judge the person.