tl;dr
Engineering is a creative discipline disguised as a technical one. When you treat it like a factory line by targeting raw output units, you get low-quality parts.
Goodhart’s Law states: “When a measure becomes a target, it ceases to be a good measure.”
In software engineering, this manifests as a divergence between “looking productive” and “being productive.” The transition from using a metric for observation to using it for evaluation is where the system breaks.
This post was inspired by ThePrimeAgen’s breakdown on how metrics-based management often destroys engineering culture.
The Perverse Incentive
Metrics are intended to be proxies for health. Lines of code (LoC), PR velocity, and test coverage are easy to count, but they are not the goal. The goal is delivering value and maintaining a sustainable system.
As soon as a manager announces that PR count is the primary KPI for the quarter, the engineering team will respond rationally. They will split single, cohesive changes into five smaller PRs. The metric goes up; the signal is lost.
Common Failures
- Test Coverage: Requiring 90% coverage results in “assertionless tests”—tests that execute code paths to satisfy the runner but don’t actually verify correctness.
- Story Points/Velocity: Teams begin to inflate estimates. A “3-point” task becomes an “8-point” task to ensure the velocity chart looks “up and to the right.”
- Lines of Code: This rewards verbose, copy-pasted solutions over concise, abstracted logic.
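The coverage-gaming failure above is easy to see in code. Here is a minimal sketch in Python (the function names are hypothetical, invented for illustration): the first test executes the code path, so coverage tools count every line as "covered," yet it verifies nothing; the second actually checks the result.

```python
def apply_discount(price, percent):
    """Hypothetical business logic under test."""
    return price - price * percent / 100

def test_apply_discount_gamed():
    # Runs the function, so line coverage hits 100% -- but no assertion,
    # so any bug in apply_discount would still pass.
    apply_discount(100, 20)

def test_apply_discount_real():
    # Same coverage number, but this one actually verifies correctness.
    assert apply_discount(100, 20) == 80.0
```

Both tests produce identical coverage reports, which is exactly why coverage percentage alone is a weak signal.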
Gaming the System
If you measure engineers by raw activity, you get scripts that generate activity. Below is a simple Bash script that “increases productivity” by padding commits—a literal implementation of gaming a metric.
```bash
#!/bin/bash
# A "Productivity" Script for metric-obsessed managers
for i in {1..10}; do
  echo "// Update: $(date +%s)" >> activity_log.txt
  git add activity_log.txt
  git commit -m "chore: minor refactor and optimization $i"
done
git push origin main
```
This script produces 10 commits and modifies 10 lines. On a dashboard, this engineer looks 10x more active than the developer who spent three days deleting 500 lines of technical debt.
Metrics as Observation, Not Targets
The solution is not to stop measuring. Data is necessary for identifying bottlenecks. The solution is to decouple the metric from the incentive.
1. Observe the Delta
Use metrics to identify outliers. If PR cycle time suddenly spikes, don’t penalize the team. Investigate the cause. Is the CI/CD pipeline slow? Is the requirements-gathering phase broken?
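"Observe the delta" can be as simple as flagging statistical outliers rather than ranking people. A minimal sketch in Python (the function name and the z-score threshold are my own assumptions, not a standard):

```python
from statistics import mean, stdev

def flag_spikes(cycle_times_hours, z_threshold=2.0):
    """Return PR cycle times more than z_threshold standard
    deviations above the mean -- candidates for investigation,
    not punishment."""
    mu = mean(cycle_times_hours)
    sigma = stdev(cycle_times_hours)
    return [t for t in cycle_times_hours
            if sigma > 0 and (t - mu) / sigma > z_threshold]

# A week of cycle times with one obvious spike:
flag_spikes([4.0, 5.5, 3.8, 6.1, 4.9, 5.2, 48.0])  # -> [48.0]
```

The output is a question ("what happened to that 48-hour PR?"), not a verdict. Maybe CI was down; maybe the requirements changed mid-review.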
2. Measure the “Un-gamable”
Focus on outcomes rather than activities.
- Change Failure Rate: How often do deployments break?
- Mean Time to Recovery (MTTR): How fast can we fix a break?
- Lead Time for Changes: How long does it take to go from code-complete to production?
These are DORA metrics. They are harder to game because they require the system to actually function correctly to improve.
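The first two DORA metrics reduce to simple ratios over deployment and incident records. A rough sketch, assuming a hypothetical record shape I invented for illustration (a `failed` flag per deployment, and `(start, resolved)` hour pairs per incident):

```python
def change_failure_rate(deploys):
    """Fraction of deployments that caused a production failure.
    Assumes each deploy is a dict with a boolean "failed" key."""
    if not deploys:
        return 0.0
    return sum(1 for d in deploys if d["failed"]) / len(deploys)

def mttr_hours(incidents):
    """Mean time to recovery, given (start_hour, resolved_hour) pairs."""
    durations = [resolved - start for start, resolved in incidents]
    return sum(durations) / len(durations)

change_failure_rate([{"failed": True}, {"failed": False},
                     {"failed": False}, {"failed": False}])  # -> 0.25
mttr_hours([(0.0, 2.0), (5.0, 9.0)])                         # -> 3.0
```

Note what it takes to move these numbers: deploys must actually stop breaking, and fixes must actually ship faster. There is no commit-padding script for that.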
3. Use Counter-Metrics
If you target velocity, you must also target quality. If velocity goes up but the bug count also rises, you haven’t actually improved.
| Primary Metric | Counter-Metric |
|---|---|
| Velocity (Story Points) | Bug Count / Defect Density |
| Test Coverage | Mutation Testing Score |
| Deployment Frequency | Change Failure Rate |
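The pairing rule in the table can be stated as a single predicate. A toy sketch (the function and its signature are hypothetical, just to make the logic concrete): a change only counts as an improvement if the primary metric rose while its counter-metric did not worsen.

```python
def real_improvement(primary_delta, counter_delta):
    """True only if the primary metric improved (positive delta)
    without the counter-metric degrading (positive delta = worse)."""
    return primary_delta > 0 and counter_delta <= 0

real_improvement(0.15, 0.30)   # velocity +15%, bugs +30% -> False
real_improvement(0.15, -0.05)  # velocity +15%, bugs -5%  -> True
```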
Congrats, you scrolled to the bottom!
Engineering is a creative discipline disguised as a technical one. When you treat it like a factory line by targeting raw output units, you get low-quality parts.
Watch ThePrimeAgen’s video for a deeper dive into why “impact” is almost never found in a Jira dashboard. Use metrics to find the problem, not to judge the person.