The Platform Engineering Mistake That Cost $2M and Two Years

TL;DR. A technically solid internal developer platform. Eighteen months of build time. Forty percent of engineering teams quietly stopped using it six months after launch. The failure was not technical — the foundation was genuinely good. The failure was the theory of adoption: building for architectural correctness rather than for the first experience of a developer with a deadline. The recovery required embedding platform engineers with application teams to observe actual workflow, introducing golden paths with explicit support commitments, and measuring developer trust rather than features shipped. Trust in infrastructure tools is fragile and slow to rebuild. The sequence that rebuilds it is observable and repeatable.

The director of engineering told me about it almost apologetically. They had built an internal developer platform. It had taken 18 months, cost around $2 million when you accounted for the team's time, and they had shipped it with a company-wide announcement and genuine excitement.

Six months later, roughly 40 percent of the engineering teams had quietly stopped using it.

"The worst part," he said, "is that the platform works. It's technically solid. Engineers just do not trust it."

This story is more common than the platform engineering community acknowledges. The discipline has matured significantly in terms of technical capability. The organizational and product management skills that make platforms successful are still being figured out.

Why technically superior internal platforms fail to get adopted

The platform team had made the classic mistake of building for their own definition of what developers needed rather than for what developers were actually experiencing. They had spent the first year deep in infrastructure decisions, Kubernetes configuration, service mesh design, secrets management architecture, and CI abstractions, with minimal contact with the application teams they were supposed to be serving.

When the platform launched, it was technically impressive and practically disconnected from the workflow engineers already had. It required learning new concepts to do things they could already do, even if their current way was messier. Some of the migration paths from the old tooling to the new platform were poorly documented. The initial rollout had bugs that took weeks to fix, and by the time those bugs were addressed, the early adopters who had tried it and hit problems had already formed opinions that spread through the engineering organization.

Trust in infrastructure tools is extremely fragile and extremely slow to rebuild. A developer who hits an unexplained failure during a deploy on their first day with a new tool will wait a long time before trying it again. If they had to escalate to a platform engineer to resolve the failure, the credibility cost is even higher. The platform team sees a resolved incident. The application engineer sees a tool that broke on its first use.

The failure was not technical. The technical foundation they built was genuinely solid. The failure was in the theory of adoption: the belief that if you build something technically superior, adoption will follow. In a competitive consumer market, superior products win through distribution and marketing. In an internal engineering context, superior tools win through trust, documentation, and the quality of the first experience.

The gap between an infrastructure team mindset and a product team mindset

Platform engineering is a discipline that borrows heavily from product management, and the teams that succeed at it genuinely operate like product teams. They have a user research function that includes regular conversations with the developers who will use the platform. They prioritize based on actual developer pain rather than technical elegance. They maintain a public roadmap that allows application teams to plan around platform capabilities. They respond to feedback quickly, treating developer complaints as support tickets that deserve the same urgency as customer-facing incidents. And they measure success by developer adoption and satisfaction rather than by features shipped or infrastructure deployed.

The team that built this $2 million platform operated like a traditional infrastructure team. They built what they thought was architecturally correct. They shipped it. They expected adoption because the alternative was technically inferior.

The gap between "technically correct" and "what developers will actually use" is where most platform projects fail. An engineer with a deadline who hits friction with the new platform will revert to the old way of doing things. That reversion is not laziness or resistance to change. It is a rational response to a tool that makes their immediate job harder, even if it would make their job easier in three months once they had learned it fully. The platform team sees this as a training and change management problem. The application engineer sees it as a platform problem.

Why documentation does not fix cognitive load — design does

One of the most consistent patterns in failed platform adoption is what I call the cognitive load trap. Platform teams, who understand their system deeply, tend to underestimate how much a new user needs to learn before the platform becomes useful. The features that the platform team considers straightforward require application engineers to build a mental model of the platform architecture before they can be used effectively.

Research on developer experience published in 2025 is clear on this point: the platforms that achieve high adoption are the ones that minimize the cognitive load required for common tasks. An application engineer who needs to deploy a new service should not need to understand the underlying Kubernetes configuration to do it. An engineer who needs to provision a test environment should not need to understand the infrastructure-as-code framework. The platform should abstract these concerns and provide a workflow that is simpler than the alternative, not one that requires additional learning before it becomes comparable.

This is a design challenge, not a documentation challenge. The impulse of most platform teams when confronted with adoption problems is to improve the documentation. Documentation helps engineers who are already trying to learn the platform. It does not help engineers who stopped trying because the initial experience was too difficult. The design of the first experience matters more than the quality of the documentation for users who never get past the first experience.

What the recovery from 40% abandonment actually looked like step by step

The platform team pivoted to a product model. They embedded one platform engineer with each of the three largest application teams for a month, not to sell the platform but to observe what was actually painful in the teams' daily workflow. The findings were specific and sometimes embarrassing: a confusing error message that had been logged as a bug for eight months and never prioritized; a CLI command that required three separate authentication steps, which nobody had thought to simplify because the platform engineers had scripted around it locally; and documentation that described an old version of the tool that was no longer deployed.

They introduced a concept they called the golden path: an opinionated, well-supported, well-documented way to do the five things every team needed to do most often. Those five were new service deployment, database migration, environment provisioning, secrets management, and log and metric access.

They were not mandating that everyone use only this path. They were guaranteeing that if you used it, it would work, it would be documented with working examples, and when something broke, there would be a platform engineer available to help within the same business day. The golden path was not the only way to use the platform. It was the way that came with a support contract.

The adoption curve reversed over the following six months. Once enough teams had positive experiences with the golden path, social proof did the rest. Engineers started recommending it to each other instead of warning each other away. The platform team's credibility recovered as engineers began to associate the platform with reliability rather than with the earlier frustrating experiences.

What to measure once adoption is climbing — the metrics that actually tell you why

A year after the pivot, platform adoption was at 87 percent. More importantly, the teams using the platform reported a 40 percent reduction in the time they spent on infrastructure-related tasks each week. That is not a platform metric. That is a developer experience metric. And it is the metric that should have been the goal from day one.

The platform team learned to track four leading indicators of adoption health that they had not measured before: developer satisfaction with platform documentation, measured quarterly through a simple survey; time to first successful deployment for new teams adopting the platform, which is the metric most sensitive to first-experience quality; the ratio of support requests to active users, which measures how much burden the platform places on application engineers when things do not work as expected; and the net promoter score among application teams, which captures whether platform users are recommending it to colleagues or warning them away.

These metrics are more actionable than adoption rate alone because they tell the platform team where to focus. Low documentation satisfaction points to specific gaps in the developer education content. A long time to first deployment points to friction in the onboarding experience. A high support request ratio points to confusing error handling or insufficient inline guidance. A low net promoter score despite high adoption indicates that engineers are using the platform because they have to, not because they want to, which is a warning sign about what happens if adoption becomes optional.
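
Three of these indicators reduce to simple arithmetic over data most teams already collect (documentation satisfaction is just an average of survey answers). What follows is a minimal sketch in Python, not the team's actual tooling. The function names, the sample numbers, and the choice of "platform access granted" as the start of the onboarding clock are all assumptions made for illustration. The net promoter score uses the standard definition on a 0 to 10 scale: the percentage of promoters (9 or 10) minus the percentage of detractors (0 through 6).

```python
from datetime import datetime
from statistics import median


def net_promoter_score(scores):
    """Standard NPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("need at least one survey response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def support_request_ratio(support_requests, active_users):
    """Support requests per active user over the same reporting period."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return support_requests / active_users


def median_hours_to_first_deploy(onboardings):
    """Median hours from access granted to first successful deployment.

    onboardings: list of (access_granted, first_successful_deploy) datetimes.
    The starting event is an assumption; use whatever marks "day one" for you.
    """
    if not onboardings:
        raise ValueError("need at least one onboarding record")
    hours = [(done - start).total_seconds() / 3600 for start, done in onboardings]
    return median(hours)


# Hypothetical sample data, invented for this sketch.
survey_answers = [10, 9, 9, 8, 7, 6, 3, 10]
onboardings = [
    (datetime(2025, 3, 3, 9, 0), datetime(2025, 3, 3, 13, 0)),    # 4 hours
    (datetime(2025, 3, 4, 9, 0), datetime(2025, 3, 5, 9, 0)),     # 24 hours
    (datetime(2025, 3, 10, 10, 0), datetime(2025, 3, 10, 16, 0)),  # 6 hours
]

print(net_promoter_score(survey_answers))         # 25.0
print(support_request_ratio(18, 120))             # support requests per active user
print(median_hours_to_first_deploy(onboardings))  # 6.0
```

The median is used for time to first deployment so that one team with an unusually bad onboarding does not hide a generally good experience, or the reverse. Tracking the slowest onboardings separately is a reasonable complement.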

Measuring trust, not features: what success looks like in internal platform work

The $2 million and the two years were not wasted. The technical foundation built during those years was genuinely good, and the platform that emerged after the product pivot was built on top of it. But the lesson is that the best internal platform in the world is worthless if developers do not trust it enough to use it consistently.

The engineering leaders who get the best outcomes from platform investments are the ones who define success in terms of developer outcomes from the beginning. Not "we shipped the platform." But "the teams using the platform spend 40 percent less time on infrastructure tasks." Not "the platform supports 50 integrations." But "90 percent of teams are using the platform for their most common workflows without needing support."

This framing changes the investment priorities. It means the first six months of a platform project should include significant time with application teams understanding their actual workflow, not just building infrastructure. It means the definition of done for any platform feature includes working documentation and a positive first-experience for a developer who has not seen it before. It means the platform team has an on-call rotation for developer support, not just infrastructure incidents.

Build for adoption, not for architecture. Measure trust, not features. Treat your developers as customers who have other options, because they do. In internal platform work, the other option is always "do it the old way," and that option will be taken every time the platform provides a worse experience.

What a platform team SLA changes — and why most teams do not have one

One of the most impactful changes a platform team can make is to formalize a service-level agreement with application teams: specific commitments about the reliability of the platform, the responsiveness of support, and the cadence at which feedback is addressed.

Most platform teams do not have an SLA. They respond to developer issues when they can, prioritize platform improvements based on their own judgment, and provide no formal commitment on reliability. Application teams, lacking a formal commitment, learn through experience what the platform's actual reliability and responsiveness are. That learned experience shapes whether they adopt the platform for new work or find alternative approaches.

A platform team SLA does not need to be legally binding or elaborate. It needs to be specific enough to be verifiable and committed to seriously enough to be honored. Something like: "Critical platform issues will be acknowledged within two hours and resolved within eight. Developer feedback will be triaged within three business days. Monthly reliability reports will be shared with application teams." These commitments, consistently honored, change the trust dynamic more than any amount of improved documentation or additional features.

The psychological shift from "the platform team will get to it when they can" to "the platform team has committed to respond within two hours" is significant for application engineers deciding whether to depend on the platform for critical workflows. The platform with a service commitment feels safer to depend on than the platform without one.
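
An SLA is only verifiable if someone actually checks it. As a rough sketch, the two critical-issue commitments from the example SLA above (acknowledged within two hours, resolved within eight) can be checked against issue timestamps as shown below. The `CriticalIssue` record and both function names are hypothetical, and the sketch assumes both limits are measured in wall-clock hours from the moment the issue was opened. The feedback triage commitment would need a business-day calendar and is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Limits taken from the example SLA; measuring both from the time the issue
# was opened is an assumption of this sketch.
ACK_LIMIT = timedelta(hours=2)
RESOLVE_LIMIT = timedelta(hours=8)


@dataclass
class CriticalIssue:
    """Hypothetical record of one critical platform issue."""
    opened: datetime
    acknowledged: datetime
    resolved: datetime


def sla_breaches(issue):
    """Return the list of commitments this issue missed (empty if none)."""
    breaches = []
    if issue.acknowledged - issue.opened > ACK_LIMIT:
        breaches.append("acknowledgement")
    if issue.resolved - issue.opened > RESOLVE_LIMIT:
        breaches.append("resolution")
    return breaches


def compliance_rate(issues):
    """Fraction of issues that met every commitment; 1.0 when there were none."""
    if not issues:
        return 1.0
    met = sum(1 for issue in issues if not sla_breaches(issue))
    return met / len(issues)


# Invented sample issues for illustration.
on_time = CriticalIssue(
    opened=datetime(2025, 6, 2, 9, 0),
    acknowledged=datetime(2025, 6, 2, 9, 45),
    resolved=datetime(2025, 6, 2, 15, 0),
)
late_ack = CriticalIssue(
    opened=datetime(2025, 6, 3, 9, 0),
    acknowledged=datetime(2025, 6, 3, 12, 0),
    resolved=datetime(2025, 6, 3, 16, 0),
)

print(sla_breaches(on_time))                 # []
print(sla_breaches(late_ack))                # ['acknowledgement']
print(compliance_rate([on_time, late_ack]))  # 0.5
```

A figure like the compliance rate is the kind of number a monthly reliability report could include, so that application teams can see whether the commitments are being honored.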

Starting narrow and proving value first — the sequencing discipline most platforms skip

The organizations that built the most successful internal platforms did not start with the most comprehensive scope. They started with the most important problem: the developer workflow that was causing the most friction for the most teams. They solved that problem well before expanding to the second most important problem.

This sequencing discipline is harder than it sounds because platform engineers have technical ambitions. They can see the full architecture of what a mature internal developer platform would look like, and it is tempting to build toward that full architecture. The difficulty is that building toward a full architecture before delivering on the most important first problem means that application teams are waiting for value while the platform team is building infrastructure.

The platform that started narrow, solved the most important problem first, and expanded from there has a fundamentally different relationship with its users than the platform that started broad and is still trying to gain adoption for its comprehensive feature set. The first platform has proven its value in a specific context that developers care about. The second platform is still trying to establish that it belongs in the workflow.

Scope discipline is the most underappreciated characteristic of successful platform engineering programs. The platform team that can say no to interesting-but-not-urgent problems, in favor of solving the most important problem exceptionally well, tends to build the platform that developers actually recommend to colleagues. That recommendation is the metric that matters most in year one.

Platform reliability as a contract with application teams — reliability first, features second

The platform team's reliability commitments function as a contract with the development teams that depend on the platform. When the platform is unreliable, every team that depends on it bears the reliability cost. A deployment pipeline that fails intermittently creates uncertainty for every team deploying through it. An environment provisioning service that takes 30 minutes instead of the expected 5 creates planning uncertainty for every team that needs new environments.

The platform team that treats its own reliability with the same rigor it expects from application teams builds trust through demonstrated consistency. This means the platform has defined service level objectives. It has an on-call rotation that responds to platform issues with urgency. It has a public incident history that shows how platform reliability has improved over time. These practices communicate to application teams that the platform team takes its reliability responsibility seriously.

The absence of these practices communicates the opposite: that the platform is a best-effort service that application teams should plan around rather than depend on. Application teams that receive this message respond rationally: they add workarounds and fallback paths that duplicate the platform's functionality. The duplicated functionality is what the platform was supposed to eliminate. The platform has failed at its purpose because its reliability did not merit the trust required for genuine adoption.

Platform reliability investment is not optional infrastructure work. It is the foundational commitment that makes everything else the platform provides valuable. The most feature-rich internal developer portal with poor reliability is less valuable than a simple, reliable set of well-supported golden paths. Reliability first, features second. The order is not arbitrary.
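
To make "defined service level objectives" concrete, here is a minimal Python sketch that checks the two examples from this section: deployment pipeline reliability and environment provisioning time. The 99 percent success target and the choice of the 95th percentile are assumptions made for illustration. Only the five-minute expectation for provisioning comes from the example above. The function names are hypothetical.

```python
import math


def success_rate(outcomes):
    """Fraction of pipeline runs that succeeded; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("no pipeline runs recorded")
    return sum(outcomes) / len(outcomes)


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples recorded")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def slo_report(deploy_outcomes, provisioning_minutes,
               min_success_rate=0.99, max_p95_minutes=5.0):
    """Compare observed behavior with targets (both targets are assumed values)."""
    rate = success_rate(deploy_outcomes)
    p95 = percentile(provisioning_minutes, 95)
    return {
        "deploy_success_rate": rate,
        "deploy_slo_met": rate >= min_success_rate,
        "provisioning_p95_minutes": p95,
        "provisioning_slo_met": p95 <= max_p95_minutes,
    }


# Invented sample data: 3 failed deploys out of 100, and one 30-minute
# provisioning run among otherwise fast ones.
deploys = [True] * 97 + [False] * 3
provisioning = [4, 4, 5, 5, 4, 3, 30, 4, 5, 4]

report = slo_report(deploys, provisioning)
print(report["deploy_slo_met"])            # False
print(report["provisioning_p95_minutes"])  # 30
print(report["provisioning_slo_met"])      # False
```

A percentile is used instead of an average because the planning uncertainty described above comes from the slow outliers. In this sample the average provisioning time looks tolerable while the 95th percentile exposes the 30-minute run.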

Frequently asked questions

Why do technically good internal platforms fail to get adopted?

Because trust in infrastructure tools is fragile, and adoption does not follow technical quality — it follows the quality of the first experience. An engineer who hits an unexplained failure during their first deployment with a new tool will wait a long time before trying again. If they had to escalate to a platform engineer to resolve it, the credibility cost is higher still. The platform team sees a resolved incident. The application engineer sees a tool that broke on its first use. The design of the first experience matters more than the quality of the documentation for users who never get past the first experience.

What is the cognitive load trap in platform adoption?

Platform teams who understand their system deeply consistently underestimate how much a new user needs to learn before the platform becomes useful. They treat their own mental model of the system as the baseline. Application engineers need to build that mental model from zero. The trap is that when adoption is low, the instinct is to improve documentation. Documentation helps engineers who are already trying to learn the platform. It does not reach engineers who stopped trying because the initial experience was too difficult. The fix is design, not documentation: workflows that abstract the underlying complexity rather than explain it.

What is a golden path with a support commitment, and why does it change adoption?

A golden path is an opinionated, well-supported, well-documented way to do the things every team needs to do most often. It is not the only way to use the platform, but it is the way that comes with a support contract. The specific commitment is what makes the difference: a platform engineer available within the same business day if something breaks on the golden path. Engineers adopt the path they trust. Trust comes from reliable support when things go wrong, not from the absence of problems. Once enough teams have positive golden path experiences, social proof does the rest: engineers recommend it to each other instead of warning each other away.

What leading indicators tell you whether platform adoption is healthy or fragile?

Four metrics provide an honest picture: developer satisfaction with platform documentation (quarterly survey), time to first successful deployment for new teams (sensitive to first-experience quality), the ratio of support requests to active users (measures platform burden on application engineers), and net promoter score among application teams (captures whether users are recommending the platform or warning colleagues away). NPS in particular is the early warning for fragile adoption. Engineers who use the platform because they have to, rather than because they want to, show up as a low NPS despite high adoption numbers, and that signals what will happen if adoption becomes optional.

Golden paths developers actually choose — the design discipline behind the golden path with a support commitment: what makes a path something engineers reach for under deadline pressure.
You don't need an internal developer platform. Yet. — the sequencing argument: this failure happens when you build the portal before the foundation is trustworthy.
Cognitive Absorption in platform engineering — the definitive reference — the design accountability that reframes the cognitive load trap: the platform should absorb complexity, not document it.
5 signs your platform team is stuck in ad-hoc mode — Signs 1 and 3 (deploy variance, slow onboarding) are the same friction patterns this post's $2M platform failed to address before launch.

If your platform team is building something that application teams are quietly avoiding, a Platform Engineering Assessment can help you understand what is driving the gap and what to do about it.
