DORA Metrics for Engineering Leaders: What They Actually Tell You
TL;DR. DORA's four metrics — deployment frequency, lead time for changes, change failure rate, and mean time to restore — are diagnostic tools, not scorecards. Each one points to a specific category of problem in your delivery system. Used well, they tell you where to invest. Used badly, they tell you which number a team is optimizing for without changing anything real.
Most engineering leaders have seen a DORA dashboard. Fewer understand what the numbers mean in context, and fewer still understand why the platform shapes the metrics more than the individual teams do.
A brief recap of the four metrics
The DORA research is conducted annually by the DevOps Research and Assessment team, now part of Google. The research consistently identifies four metrics that measure software delivery performance and that predict organizational outcomes. The canonical sources are dora.dev and the annual DORA State of DevOps Report.
Deployment frequency. How often does an organization successfully release to production?
Lead time for changes. How long does it take from a code commit to that code running in production?
Change failure rate. What percentage of changes to production cause a degraded service, requiring a hotfix or rollback?
Mean time to restore (MTTR). How long does it take to recover service after an incident?
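As a rough illustration, the four definitions above could be computed from a list of deployment records along the following lines. This is a minimal sketch, not a DORA-specified method: the `Deployment` record shape, its field names, the use of a median for lead time, and the choice to measure restore time from the deployment timestamp are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # commit time of the change
    deployed_at: datetime                    # time the change reached production
    failed: bool = False                     # degraded service, hotfix, or rollback
    restored_at: Optional[datetime] = None   # time service recovered, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four metrics for deployments observed over `window_days`."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: the incident starts when the failing change is deployed.
    restores = [hours(d.restored_at - d.deployed_at)
                for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore_hours": sum(restores) / len(restores) if restores else None,
    }
```

In practice the inputs would come from the deployment pipeline and the incident tracker, and the definitions of "failed" and "restored" need to be agreed on before the numbers mean anything.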
The DORA research classifies teams into performance tiers based on these four metrics. Elite performers deploy on demand (multiple times per day), have lead times under one hour, change failure rates of 0–5%, and MTTR under one hour. Low performers deploy less frequently than once per six months, have lead times of one to six months, change failure rates of 46–60%, and MTTR between one week and one month. The exact cutoffs are recalculated for each report and differ between editions, so treat these figures as indicative rather than fixed. The gap between the tiers is large and consistent across years of research.
What each metric signals at the executive level
The metrics are often discussed at the engineering level, where they map to specific technical practices. At the executive level, each metric points to a different business concern.
Deployment frequency: are we batching risk or distributing it?
High deployment frequency means small, focused changes going to production regularly. Small changes are easier to test, easier to review, easier to roll back when something goes wrong, and carry a smaller blast radius when they do. Low deployment frequency means large batches accumulating, which means each release carries more risk and more uncertainty about what changed.
For an engineering leader, a low deployment frequency number is not evidence that the team is lazy or slow. It is evidence that something in the process — architectural coupling, manual approval gates, long test suites, fragile deployment tooling — is making it expensive or risky to deploy more often. That friction has a cost whether or not it is measured.
Lead time for changes: how long from commit to user?
Lead time measures the cycle from a developer committing code to that code being used in production. A long lead time means feedback from users arrives late, bugs sit in the pipeline longer before being caught, and the organization's ability to respond to market signals is constrained.
Long lead time signals process friction or architecture friction — not engineer slowness. The most common causes are: manual handoffs in the deployment pipeline, insufficient test coverage that forces manual QA, architectural coupling that requires coordinated deployments across multiple services, or approval processes that queue changes before they move.
Change failure rate: how predictable is our delivery?
A high change failure rate means a significant fraction of production deployments cause incidents. This is not primarily a development quality problem — it is a delivery infrastructure problem. Teams with good test coverage and fast automated pipelines can catch regressions before they reach production. Teams deploying through unreliable or poorly instrumented pipelines produce high failure rates regardless of code quality.
High change failure rate combined with low deployment frequency is a particularly damaging pattern: infrequent large deployments that often cause incidents. The solution is rarely to slow down further. It is to improve deployment infrastructure so that smaller, safer changes can go out more frequently.
MTTR: how quickly do you recover when something breaks?
MTTR measures not just incident response speed but also observability quality and ownership clarity. A team that takes four hours to restore service after a simple database connection failure probably has one of two problems: the observability tooling is not pointing clearly to the cause, or the ownership model is unclear about who acts when the alert fires.
High MTTR is a signal that the organization has not invested in making failure visible and response fast. The cost shows up in customer impact, in engineer fatigue during long incidents, and in the anxiety tax that discourages teams from deploying.
The "elite performer" trap
The DORA performance tiers create a benchmark that is genuinely useful as a directional reference. They also create a failure mode: treating the benchmarks as targets to hit rather than as diagnostic anchors.
Teams that optimize for the DORA number rather than the underlying behavior can produce results that look good on a dashboard while the actual delivery system degrades. Deployment frequency goes up because the team is merging trivial commits rather than meaningful changes. Lead time looks short because the measurement window was redefined. Change failure rate looks low because the definition of "failure" was narrowed.
None of these is hypothetical. They happen in organizations that use the metrics as KPIs rather than as diagnostics. The protection against them is to measure the input behaviors alongside the output metrics: are deployments actually getting smaller and more frequent, or did the counting methodology change? Is lead time shortening because the pipeline got faster, or because the measurement starts later in the process?
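One way to pair an output metric with an input behavior is to compare deployment counts against change size across two periods. The sketch below is illustrative only: it assumes "lines changed per deployment" is available as a proxy for batch size, and the function name and returned keys are hypothetical.

```python
from statistics import median


def frequency_vs_batch_size(prev_sizes: list[int], curr_sizes: list[int]) -> dict:
    """Compare two equal-length periods.

    Each list holds lines changed per deployment (an assumed proxy for
    batch size), one entry per deployment in that period.
    """
    if not prev_sizes or not curr_sizes:
        raise ValueError("both periods need at least one deployment")
    if any(size <= 0 for size in prev_sizes + curr_sizes):
        raise ValueError("sizes must be positive")
    return {
        "deploy_count_ratio": len(curr_sizes) / len(prev_sizes),
        "median_size_ratio": median(curr_sizes) / median(prev_sizes),
        "total_change_ratio": sum(curr_sizes) / sum(prev_sizes),
    }
```

If the deploy count doubles while the total change shipped stays roughly level, batches really did get smaller. If the count rises while the total change shipped falls sharply, the extra deployments may be trivial commits, which is worth a conversation before the number is celebrated.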
The elite performance benchmarks are a compass bearing, not a finish line. The question to ask when the numbers look wrong is: what behavior would I need to change to move this metric for the right reasons?
Why the platform shapes the numbers more than the teams do
Here is the part of the DORA story that gets underemphasized in most dashboard conversations: your DORA numbers primarily reflect the quality of your deployment infrastructure, not the quality of your engineering teams.
A team that wants to deploy more frequently cannot do so if the deployment pipeline takes 45 minutes, requires manual approvals, or has a high failure rate that erodes confidence. The deployment frequency number is bounded by what the platform makes safe and fast.
A team that wants to improve its MTTR cannot do so if the observability tooling is not pointing at the right service, or if alerts are firing on symptoms rather than causes, or if there are no runbooks to guide response. The MTTR number reflects observability and on-call infrastructure, not team response time.
This matters for how engineering leaders interpret the data. When change failure rate is high across multiple teams, the first question is not which teams have weak testing practices. It is: what do these teams share? The shared thing is almost always the deployment infrastructure, the testing pipeline, or the configuration management approach.
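The "what do these teams share?" question can be asked mechanically when there is an inventory of what each team depends on. The following is a minimal sketch under that assumption; the inventory structure, team names, and component names are hypothetical.

```python
def shared_components(team_components: dict[str, set[str]],
                      affected_teams: list[str]) -> set[str]:
    """Return the components used by every team in `affected_teams`.

    `team_components` maps a team name to the platform pieces it relies on
    (an assumed inventory, for example CI system, deploy tool, config store).
    """
    sets = [team_components[team] for team in affected_teams]
    return set.intersection(*sets) if sets else set()


inventory = {
    "payments": {"ci", "deploy-tool", "postgres"},
    "search": {"ci", "deploy-tool", "elasticsearch"},
    "web": {"ci", "cdn"},
}
# Teams with a high change failure rate in this made-up example:
suspects = shared_components(inventory, ["payments", "search"])
```

The result is a list of candidates to investigate first, not a verdict. A shared component is only a likely cause if the failures actually trace back to it.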
The corollary for investment: improving DORA metrics without improving the platform is like trying to improve marathon times without addressing the course conditions. Teams can work harder within a bad system, but the leverage is in fixing the system.
What to do with the numbers
Use the DORA metrics to identify the constraint in your delivery system, not to rank teams.
When one metric is significantly worse than the others, it points to a specific constraint category. High lead time with acceptable change failure rate: pipeline automation is the bottleneck. High change failure rate with adequate lead time: testing coverage or deployment confidence is the issue. High MTTR with good deployment metrics: observability or incident ownership is the gap.
When all metrics are poor across multiple teams, the platform is the constraint. Investment in deployment tooling, observability defaults, and golden paths will move all four metrics across all teams simultaneously — which is the leverage that makes platform investment worth it.
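The diagnostic rules above can be written down as a small decision function. This is a sketch of the article's heuristics only: what counts as "poor" for each metric is left to the caller, and the function name and return strings are hypothetical.

```python
def likely_constraint(deploy_frequency_poor: bool,
                      lead_time_poor: bool,
                      change_failure_rate_poor: bool,
                      mttr_poor: bool) -> str:
    """Map a pattern of poor metrics to the constraint category it suggests."""
    flags = (deploy_frequency_poor, lead_time_poor,
             change_failure_rate_poor, mttr_poor)
    if all(flags):
        return "platform"
    # Poor MTTR while all three deployment metrics look healthy.
    if mttr_poor and not any(flags[:3]):
        return "observability or incident ownership"
    if lead_time_poor and not change_failure_rate_poor:
        return "pipeline automation"
    if change_failure_rate_poor and not lead_time_poor:
        return "testing coverage or deployment confidence"
    return "no single constraint indicated"
```

A function like this is a starting point for a conversation. Patterns it does not recognize fall through to "no single constraint indicated" rather than forcing a guess.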
The right cadence for the executive audience is quarterly. Monthly fluctuations in these metrics are normal and not actionable. Quarterly trends reveal whether the underlying practices are improving, stable, or degrading.
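A quarterly view can be produced by bucketing raw deployment records by calendar quarter before computing a rate. The sketch below does this for change failure rate only, assuming each record is a `(date, failed)` pair; that input shape and the function name are illustrative.

```python
from collections import defaultdict
from datetime import date


def quarterly_change_failure_rate(
        deployments: list[tuple[date, bool]]) -> dict[str, float]:
    """Group (deploy_date, failed) pairs by calendar quarter and compute the rate."""
    totals = defaultdict(lambda: [0, 0])  # quarter -> [deployments, failures]
    for day, failed in deployments:
        key = f"{day.year}-Q{(day.month - 1) // 3 + 1}"
        totals[key][0] += 1
        totals[key][1] += int(failed)
    return {quarter: fails / count
            for quarter, (count, fails) in sorted(totals.items())}
```

Aggregating over a quarter smooths out the week-to-week noise, so the trend across three or four quarters is what gets reviewed, not any single month.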
For a structured assessment of where your organization sits across these dimensions and what the highest-leverage investment would be, the Foundations Assessment produces a baseline measurement before any platform investment decisions are made.
Related reading: the deployment frequency glossary page and change failure rate glossary page for precise definitions.

Mat Caniglia
Founder of Clouditive. 18+ years transforming engineering organizations across LATAM and globally through Developer Experience consulting.