Developer Productivity Metrics: What to Measure and How to Improve Them


Story points and PR counts measure activity, not outcomes. Here's how to use DORA and SPACE metrics to understand what's actually slowing your team down and what to fix first.

By Riya Mittal
Published: April 9, 2026 · 7 min read

Most engineering teams are measuring the wrong things. Story points completed, lines of code written, pull requests merged per week. These numbers are easy to collect and easy to present in a dashboard. They are also almost entirely useless for understanding whether your team is building software effectively.

The problem is not that metrics are bad. The problem is that the wrong metrics drive the wrong behavior. When you measure lines of code, developers tend to write more verbose code. When you measure story points, estimates inflate. When you measure PR count, engineers split work into tiny, meaningless chunks. You get the behavior you measure, and most teams are measuring activity instead of outcomes.

This is how to fix that.


The Metrics That Don’t Work

Before covering what to measure, it helps to understand why common metrics fail.

| Metric | What it measures | What it misses | Behavior it drives |
| --- | --- | --- | --- |
| Story points per sprint | Estimation accuracy over time | Actual delivery speed, quality | Point inflation, padding estimates |
| Lines of code | Volume of output | Value, complexity, maintainability | Verbose, unrefactored code |
| PRs merged per week | Activity level | PR size, review quality, rework rate | Splitting work into micro-PRs |
| Tickets closed | Task completion | Customer value delivered | Closing tickets without fixing problems |
| Build success rate | CI stability | Whether CI tests anything meaningful | Green CI with no coverage |

Each of these measures something real. None of them tells you whether your team is shipping value quickly and reliably. For that, you need outcome metrics.


DORA: The Baseline Every Team Should Have

The DORA (DevOps Research and Assessment) program at Google has tracked engineering team performance since 2014. Their research is the most rigorous study of software delivery performance in existence, covering 36,000+ professionals across thousands of teams.

DORA identified four metrics that consistently predict software delivery performance and organizational outcomes:

Deployment Frequency: How often does your team deploy to production? Elite teams deploy multiple times per day. Low performers deploy once per month or less.

Lead Time for Changes: How long from code committed to code running in production? Elite teams: under one hour. Low performers: one to six months.

Change Failure Rate: What percentage of deployments cause a production incident requiring a hotfix or rollback? Elite teams: 0-15%. Low performers: 46-60%.

Time to Restore Service: When an incident occurs, how long to recover? Elite teams: under one hour. Low performers: one week to one month.

Figure: DORA metrics mapped across the delivery pipeline, from code committed to service restored

| Metric | Elite | High | Medium | Low |
| --- | --- | --- | --- | --- |
| Deployment Frequency | Multiple/day | Once/day to once/week | Once/week to once/month | Fewer than once/month |
| Lead Time for Changes | Under 1 hour | 1 day to 1 week | 1 week to 1 month | 1 to 6 months |
| Change Failure Rate | 0-15% | 16-30% | 31-45% | 46-60% |
| Time to Restore | Under 1 hour | Under 1 day | 1 day to 1 week | 1 week to 1 month |

DORA metrics work because they measure outcomes that engineers and business stakeholders both care about. Deployment frequency is a proxy for batch size: teams that deploy frequently ship smaller changes, which are easier to review, easier to test, and easier to roll back. This is why elite performers have lower change failure rates. It is not because they are more careful. It is because smaller changes contain fewer surprises.

Start by baselining these four metrics. You do not need special tooling. You need deployment timestamps from your CI/CD system and incident timestamps from your alerting tool. A spreadsheet works for the first 90 days.
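The baseline computation is simple enough to sketch. The snippet below derives all four DORA metrics from a list of deployment records; the record shape and sample timestamps are hypothetical, not tied to any particular CI/CD tool:

```python
from datetime import datetime

# Hypothetical sample records: (commit_time, deploy_time, caused_incident, restored_time)
deploys = [
    (datetime(2026, 4, 1, 9, 0),  datetime(2026, 4, 1, 15, 0), False, None),
    (datetime(2026, 4, 2, 10, 0), datetime(2026, 4, 3, 11, 0), True,  datetime(2026, 4, 3, 13, 0)),
    (datetime(2026, 4, 5, 8, 0),  datetime(2026, 4, 6, 9, 0),  False, None),
]
days_observed = 7  # length of the observation window

# Deployment frequency: deploys per day over the window
deploy_frequency = len(deploys) / days_observed

# Lead time for changes: median hours from commit to production
lead_times = sorted((dep - com).total_seconds() / 3600
                    for com, dep, _, _ in deploys)
median_lead_time = lead_times[len(lead_times) // 2]

# Change failure rate: share of deploys that caused an incident
change_failure_rate = sum(1 for rec in deploys if rec[2]) / len(deploys)

# Time to restore: mean hours from failed deploy to recovery
restores = [(rest - dep).total_seconds() / 3600
            for _, dep, failed, rest in deploys if failed]
time_to_restore = sum(restores) / len(restores) if restores else 0.0
```

The same four lines of arithmetic work whether the records come from a spreadsheet export or an API; the hard part is collecting honest timestamps, not computing the metrics.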


The SPACE Framework: Beyond Throughput

DORA tells you how fast and reliably your team ships. It does not tell you whether your developers are burned out, blocked, or unhappy. A team can have excellent DORA metrics while being miserable, which is not sustainable.

The SPACE framework, developed by researchers at Microsoft, GitHub, and the University of Victoria, defines five dimensions of productivity:

Figure: The SPACE framework’s five dimensions of developer productivity beyond throughput

Satisfaction: Developer Net Promoter Score, retention rate, satisfaction survey results. Teams where developers score satisfaction below 6/10 see 3x higher attrition within 12 months.

Performance: Feature adoption rates, defect escape rate, system reliability. This measures whether the work actually worked.

Activity: Commits, PRs, deployments. Activity metrics are valid inputs, not outputs. Use them to spot anomalies, not to rank engineers.

Communication: PR review turnaround time, async communication quality, cross-team coordination overhead.

Efficiency: Time in flow (uninterrupted work sessions over 2 hours), context switches per day, environment setup and rebuild times.

The key insight from SPACE research: optimizing a single dimension degrades others. Teams that maximize Activity without measuring Efficiency burn out. Teams that optimize Satisfaction without measuring Performance drift into comfortable stagnation. Measure all five dimensions and look for imbalances.
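The Efficiency dimension is the one teams most often struggle to quantify. Time in flow can be approximated from calendar and notification timestamps; a minimal sketch, treating interruptions as point events (a simplification — real interruptions have duration) and using made-up times:

```python
from datetime import datetime, timedelta

def flow_sessions(day_start, day_end, interruptions, min_hours=2):
    """Count uninterrupted gaps of at least `min_hours` in a workday.

    `interruptions` is a list of datetimes (meetings, pings); any gap
    between consecutive events that clears the threshold counts as one
    flow session.
    """
    points = [day_start] + sorted(interruptions) + [day_end]
    threshold = timedelta(hours=min_hours)
    return sum(1 for a, b in zip(points, points[1:]) if b - a >= threshold)

# Hypothetical day: 9:00-17:00 with interruptions at 11:30 and 12:00.
day = datetime(2026, 4, 9)
sessions = flow_sessions(
    day.replace(hour=9), day.replace(hour=17),
    [day.replace(hour=11, minute=30), day.replace(hour=12)],
)
# Gaps are 2.5h, 0.5h, and 5h, so two sessions qualify.
```

Even this crude count is enough to spot the classic anti-pattern: a calendar with a 30-minute meeting planted in the middle of every potential flow block.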


Where Time Actually Goes

GitHub’s Octoverse 2023 report found that developers spend only 32% of their time writing code. The remaining 68% goes to activities that feel productive but do not directly produce software.

Here is where that time typically disappears:

| Category | Avg hours/week lost | Root cause | Fix |
| --- | --- | --- | --- |
| PR review wait | 4.2 | No SLO, no reviewer assignment | Reviewer rotation + 24-hour SLO |
| Broken dev environments | 3.1 | Shared infra, no isolation | Per-developer ephemeral environments |
| Meetings without decisions | 2.8 | No async culture, poor agendas | Default async, meetings only for decisions |
| CI pipeline slowness | 2.4 | No caching, sequential jobs | Parallelized CI, cache warming |
| Context-switching between tools | 1.9 | Fragmented toolchain | Unified developer portal |
| Onboarding / documentation gaps | 1.6 | Undocumented systems | Service catalog with runbooks |

The most insidious item is not meetings. It is the broken development environment, because its cost is invisible. When an engineer spends 90 minutes diagnosing whether a bug is in their code or the shared staging environment, that time does not show up in any metric; it just looks like slow delivery.

Non-production environment reliability is a direct productivity input. When staging is flaky or unavailable, developers cannot validate their changes. They either ship with lower confidence (raising the change failure rate) or wait (increasing lead time). Both outcomes degrade DORA metrics. Fixing environment reliability is frequently the highest-leverage productivity investment a platform team can make.


How to Actually Move These Numbers

Knowing what to measure is not enough. Here is what actually moves developer productivity metrics, with specific before/after outcomes from teams that implemented these changes.

PR Review SLO. Set a team agreement that all PRs receive a first review within 24 hours. LinkedIn’s engineering team reduced PR review cycle time from 3.2 days to 18 hours using automated reviewer assignment and a visible queue dashboard. Their deployment frequency increased 60% within two months. The review did not get faster because engineers became more diligent. It got faster because the queue was visible and the expectation was explicit.
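The mechanics behind a review SLO are two small pieces: deterministic assignment and a visible overdue list. A minimal sketch, assuming an in-memory round-robin queue and a 24-hour window (the class shape and reviewer names are illustrative, not any real tool's API):

```python
from collections import deque
from datetime import datetime, timedelta

class ReviewQueue:
    """Round-robin reviewer assignment with a first-review SLO."""

    def __init__(self, reviewers, slo_hours=24):
        self.rotation = deque(reviewers)
        self.slo = timedelta(hours=slo_hours)
        self.waiting = {}  # pr_id -> (reviewer, opened_at)

    def assign(self, pr_id, opened_at):
        reviewer = self.rotation[0]
        self.rotation.rotate(-1)  # the next PR goes to the next person
        self.waiting[pr_id] = (reviewer, opened_at)
        return reviewer

    def overdue(self, now):
        # The "visible queue": PRs still unreviewed past the SLO.
        return [(pr, rev) for pr, (rev, opened) in self.waiting.items()
                if now - opened > self.slo]

q = ReviewQueue(["ana", "ben", "chen"])
q.assign(101, datetime(2026, 4, 1, 9, 0))   # assigned to "ana"
q.assign(102, datetime(2026, 4, 1, 20, 0))  # assigned to "ben"
late = q.overdue(datetime(2026, 4, 2, 10, 0))  # PR 101 is 25h old
```

The point of the sketch is the `overdue` method: the SLO only changes behavior when the breach list is public and checked daily, e.g. posted to the team channel each morning.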

CI Pipeline Speed. Every minute your CI pipeline takes is a minute a developer waits, then context-switches. Pipelines over 10 minutes reliably cause engineers to switch tasks and not return with full focus. Audit your pipeline for sequential jobs that can run in parallel, missing cache layers, and test suites that have not been pruned in over 6 months. Most teams can cut CI time by 40% without changing what is being tested.
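One way to quantify the parallelization headroom during that audit is to compare the pipeline's serial runtime against its critical path — the longest dependency chain, which is the floor on wall-clock time once independent jobs run concurrently. The job graph below is a made-up example, not a real pipeline:

```python
# Hypothetical job graph: name -> (duration_minutes, dependencies)
jobs = {
    "lint":         (2, []),
    "unit_tests":   (6, []),
    "build":        (4, ["lint"]),
    "integration":  (8, ["build"]),
    "deploy_check": (1, ["integration", "unit_tests"]),
}

def critical_path(jobs):
    """Longest dependency chain: the best possible wall-clock time
    when every independent job runs in parallel."""
    memo = {}
    def finish(name):
        if name not in memo:
            duration, deps = jobs[name]
            memo[name] = duration + max((finish(d) for d in deps), default=0)
        return memo[name]
    return max(finish(j) for j in jobs)

serial = sum(duration for duration, _ in jobs.values())  # fully sequential today
floor = critical_path(jobs)                              # with full parallelism
print(f"serial: {serial} min, parallel floor: {floor} min")
```

If `serial` and `floor` are far apart, reordering jobs is the cheap win; if they are close, the pipeline is already parallel and the remaining time has to come from caching or pruning tests.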

Environment Provisioning Time. Puppet’s State of DevOps 2023 found that companies with mature internal developer platforms reduced environment setup time from 4.2 days to 2.1 days for new engineers. The mechanism is pre-configured, self-service environments accessible through a developer portal. New engineers should be able to run the full system locally within 2 hours of their first day.

Deployment Frequency. If your team deploys less than once per week, the path to improvement is not process improvement. It is technical: feature flags, trunk-based development, and automated rollback. These remove the fear that slows down deployments. When rollback takes 3 minutes instead of 3 hours, shipping more frequently becomes rational.
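The feature-flag mechanics are small. A sketch of a percentage-based flag where "rollback" flips the flag off in seconds rather than redeploying; the in-memory store and flag name are illustrative — a production system would use a flag service such as LaunchDarkly or Unleash:

```python
import zlib

# Illustrative in-memory flag store; a real system would back this
# with a flag service or config store.
FLAGS = {"new_checkout": {"enabled": True, "rollout_pct": 10}}

def flag_enabled(name, user_id):
    """Deterministic percentage rollout: the same user always lands
    in the same bucket, so their experience is stable across requests."""
    flag = FLAGS.get(name)
    if not flag or not flag["enabled"]:
        return False
    bucket = zlib.crc32(f"{name}:{user_id}".encode()) % 100
    return bucket < flag["rollout_pct"]

def rollback(name):
    # The deploy stays live; only the feature is switched off.
    FLAGS[name]["enabled"] = False

# Roll out to 10% of users, then kill the feature instantly if error
# rates spike -- no redeploy, no revert commit.
in_rollout = flag_enabled("new_checkout", "user-42")
rollback("new_checkout")
```

This is what removes the fear: the blast radius of any release is capped at the rollout percentage, and undoing it is a config write, not a deploy.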

Cognitive Load Reduction. Count the number of tools a developer must open to ship a single feature from local development to production. If that number exceeds five, you have a toolchain consolidation problem. Every additional tool is a context switch waiting to happen. Platform teams that build golden paths that work end-to-end see a 40% reduction in the time from code complete to production deploy.


Start With One Metric

The most common mistake when implementing developer productivity measurement is trying to measure everything at once. You end up with a dashboard nobody trusts, metrics that contradict each other, and engineers who feel surveilled rather than supported.

Start with lead time for changes. It is the most direct measure of how your delivery pipeline is performing, and it is easy to explain to both engineers and leadership. It also has a clear causal chain: long lead times come from slow CI, slow review, infrequent deployments, or manual steps. Each of those causes has a fix.

Measure your current lead time. If it is over two weeks, fix your CI pipeline first. If it is one to two weeks, fix your review process. If it is under a week but you still feel slow, look at deployment automation and batch size.

Once lead time improves, add deployment frequency and change failure rate. These three together give you a complete picture of delivery health. Add SPACE dimensions quarterly to catch burnout and satisfaction problems before they become attrition problems.

Productivity measurement works when engineers trust that the data is being used to remove obstacles, not to evaluate individuals. Be explicit about that intent, share the metrics publicly with the team, and act on what they reveal. The metrics are only as useful as the interventions they drive.

Written by Riya Mittal, Engineer at Zop.Dev
