
Change Failure Rate
DORA Metric Explained

What percentage of your deployments break production?

Change failure rate measures how often a deployment requires remediation: a rollback, a fix-forward, or an emergency patch. One of the four DORA key metrics, it covers the stability side of software delivery, alongside time to restore service.

Definition

What change failure rate measures

Change failure rate is calculated as: (number of deployments that caused a production degradation) divided by (total number of deployments in the measurement period). A CFR of 10% means one in ten deployments required a remediation action.
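A minimal sketch of the calculation, assuming you already have both counts for the measurement period. The function name `change_failure_rate` is hypothetical and not part of any DORA tooling.

```python
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    """Return CFR as a percentage for one measurement period.

    failed_deployments: deployments that caused a production degradation
    total_deployments: all deployments in the same period
    """
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and total_deployments")
    return 100.0 * failed_deployments / total_deployments


# One remediated deployment out of ten gives the 10% example from the text.
print(change_failure_rate(1, 10))  # 10.0
```

Rejecting an empty period avoids reporting a misleading 0% CFR for a team that simply did not deploy.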

The DORA definition is precise about what counts as a failure: a degradation requiring remediation. That includes rollbacks, hotfixes deployed within hours of the original change, and changes that triggered incidents. It does not include planned deployments that temporarily reduce capacity as part of a scheduled maintenance window. The intent is to measure unintended failures, not planned operational events.
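One way to apply this definition to deployment records is sketched below. The `Deployment` record shape and its fields are hypothetical; map them to whatever your deployment and incident tooling actually stores. The sketch encodes the rule as "degraded and remediated", so an intentional capacity dip during a scheduled maintenance window, with no follow-up action, is not counted.

```python
from dataclasses import dataclass
from typing import List, Optional

# Remediation actions named in the definition above.
REMEDIATIONS = {"rollback", "hotfix", "fix_forward", "patch"}


@dataclass
class Deployment:
    # Hypothetical record shape; adapt the fields to your own tooling.
    deploy_id: str
    degraded_service: bool = False      # production got worse after this deploy
    remediation: Optional[str] = None   # e.g. "rollback"; None if nothing was done


def is_change_failure(d: Deployment) -> bool:
    """A failure is a degradation that required a remediation action.

    A planned maintenance deploy may reduce capacity on purpose, but nobody
    rolls it back or patches it, so it is not counted.
    """
    return d.degraded_service and d.remediation in REMEDIATIONS


def count_change_failures(deployments: List[Deployment]) -> int:
    return sum(1 for d in deployments if is_change_failure(d))


history = [
    Deployment("d1"),
    Deployment("d2", degraded_service=True, remediation="rollback"),
    Deployment("d3", degraded_service=True),  # scheduled maintenance dip, no remediation
    Deployment("d4", degraded_service=True, remediation="hotfix"),
]
print(count_change_failures(history))  # 2
```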

CFR is most meaningful when read alongside the other DORA metrics, especially deployment frequency. A team that deploys once per month with a 5% CFR experiences about one failed deployment every 20 months. A team that deploys 50 times per day with the same 5% CFR experiences about 2.5 failed deployments per day. The CFR percentage and the deployment frequency together determine how often deployments cause production incidents.
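The arithmetic behind these two examples is simply deployment frequency multiplied by CFR. A short sketch; `expected_failures` is a hypothetical helper name.

```python
def expected_failures(deploys_per_period: float, cfr_percent: float) -> float:
    """Expected failed deployments per period = deployment frequency x CFR."""
    return deploys_per_period * cfr_percent / 100.0


monthly = expected_failures(1, 5)    # 0.05 failed deployments per month
print(1 / monthly)                   # 20.0 months between failures
print(expected_failures(50, 5))      # 2.5 failed deployments per day
```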

DORA benchmarks

Performance tiers

Source: DORA State of DevOps research program, dora.dev

Elite: 5% or less
High: 5% to 10%
Medium: 10% to 15%
Low: greater than 15%
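To place a measured CFR into these tiers, a lookup like the following works. The published ranges share their endpoints (5%, 10%, 15%), so this sketch assumes a value exactly on a boundary belongs to the better tier; the function name is hypothetical.

```python
def dora_cfr_tier(cfr_percent: float) -> str:
    """Map a CFR percentage to the tiers listed above.

    Assumption: a value exactly on a shared boundary goes to the better tier.
    """
    if cfr_percent < 0 or cfr_percent > 100:
        raise ValueError("CFR must be a percentage between 0 and 100")
    if cfr_percent <= 5:
        return "Elite"
    if cfr_percent <= 10:
        return "High"
    if cfr_percent <= 15:
        return "Medium"
    return "Low"


for rate in (3.0, 5.0, 12.5, 20.0):
    print(rate, dora_cfr_tier(rate))
```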

Speed and stability

Elite performers deploy often and break production less

The intuitive assumption is that teams deploying more often should have higher CFR — more deployments means more opportunities for failure. The DORA data consistently contradicts this assumption at the elite tier.

Elite performers deploy frequently (multiple times per day) AND have the lowest change failure rates (5% or less). A key mechanism is batch size: elite performers ship small changes, so each individual deployment has a small surface area of what can go wrong. When something does fail, the deployment is easy to roll back and the failing change is easy to identify.

The path to low CFR is not deploying less often. It is deploying smaller changes more often, with better testing and observability supporting each deployment. The trade-off between speed and stability exists only for teams that have not yet built the platform capabilities to eliminate it.

Common causes

Four factors that drive high change failure rate

Most are platform and process problems, not individual performance problems.

Insufficient test coverage

Missing integration tests are one of the most common CFR drivers. Unit tests verify individual functions; integration tests verify that the assembled system behaves correctly. A system with high unit test coverage but no integration tests will regularly ship changes that break behavior at the boundaries between components.

Monolithic deployments

Large deployments that bundle weeks of changes into a single production push carry more risk per deploy because the surface area of what can go wrong is large. When such a deployment fails, identifying which of, say, 200 commits caused the issue takes time the on-call rotation often does not have.

No feature flags

Without feature flags, every merged change is immediately visible to users on the next deployment. A change that behaves correctly in staging but fails under production load or with production data has no isolation mechanism. Feature flags allow progressive exposure — start with 1% of users, monitor, expand.
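Percentage-based exposure is often implemented by hashing a stable user identifier into a bucket from 0 to 99 and enabling the feature only for buckets below the current rollout percentage. The sketch below is not tied to any particular feature-flag product; the function name and the flag name `new-checkout` are hypothetical.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if this user should see the flagged feature.

    Hashing flag_name together with user_id gives each user a stable bucket
    per flag, so raising the percentage only adds users; nobody who already
    had the feature loses it.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(10_000)]
for pct in (1, 10, 50, 100):
    exposed = sum(in_rollout("new-checkout", u, pct) for u in users)
    print(f"{pct}% rollout -> {exposed} of {len(users)} users")
```

Keeping the bucket deterministic matters for monitoring: the same users stay in the exposed group between checks, so an error-rate change can be attributed to the flag rather than to a reshuffled population.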

Poor observability

A deployment that causes a 2% increase in error rate is a CFR event, but only if someone notices it. Without alerting configured around deployment windows and baseline metrics, CFR events are underreported. As a result, teams that measure CFR accurately tend to have better observability than teams that report a low CFR yet are regularly surprised by production incidents.
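A minimal sketch of deployment-window alerting: compare the error rate in a window after the deploy against a baseline window before it, and flag the deploy when the increase reaches a threshold. The window shape, the function names, and reading "2%" as 2 percentage points are assumptions for illustration; in practice this comparison usually lives in the monitoring system, and a flagged deploy still needs someone to confirm it required remediation.

```python
from dataclasses import dataclass


@dataclass
class Window:
    # Hypothetical aggregate of request counts for one time window.
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Percentage of requests that failed in this window.
        return 100.0 * self.errors / self.requests if self.requests else 0.0


def deploy_regressed(baseline: Window, after_deploy: Window, threshold_pp: float = 2.0) -> bool:
    """Flag a deploy if its error rate rose by at least threshold_pp percentage points."""
    return after_deploy.error_rate - baseline.error_rate >= threshold_pp


before = Window(requests=10_000, errors=50)    # 0.5% errors
after = Window(requests=10_000, errors=300)    # 3.0% errors
print(deploy_regressed(before, after))         # True
```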

Common questions

Change failure rate: direct answers

What is a good change failure rate?

The DORA research (dora.dev) defines four performance tiers. Elite performers have a change failure rate of 5% or less, high performers 5% to 10%, and medium performers 10% to 15%. Low performers exceed 15%. These are not absolute pass/fail thresholds: a team improving from 20% to 10% has made meaningful progress even if it has not yet reached the elite tier.

Is change failure rate the same as error rate?

No. Change failure rate measures deployment-triggered degradations — problems that begin immediately after a deployment and require a remediation action. Error rate measures the frequency of errors in production at any given time, regardless of cause. A service can have a low error rate most of the time and a high change failure rate if its deployments frequently introduce new errors.
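A small worked example with made-up numbers shows how the two can diverge: short error bursts after bad deploys barely move a month's error rate, while those same deploys dominate CFR. All figures below are hypothetical.

```python
# Hypothetical month of data for one service.
total_requests = 10_000_000
total_errors = 20_000               # includes short bursts after bad deploys
deployments = 20
deployments_needing_rollback = 4

error_rate = 100.0 * total_errors / total_requests            # about 0.2% of requests
cfr = 100.0 * deployments_needing_rollback / deployments       # 20% of deployments

print(f"error rate: {error_rate:.1f}%  change failure rate: {cfr:.1f}%")
```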

How do you reduce change failure rate?

Reducing CFR requires addressing the specific causes in your context. Common interventions: improving integration test coverage, reducing deployment batch size, adding feature flags for progressive exposure, improving pre-production environment parity, and adding deployment observability so that failures are detected within minutes.

Reduce your change failure rate

The Reliability Program wires the testing, observability, and deployment infrastructure that drives CFR down.

SLO framework. Error budget dashboards. Postmortem process. DORA baseline and improvement roadmap.