Platform Engineering · 8 min read

Kubernetes for Platform Teams: Six Decisions Before Handing It to Developers

Running Kubernetes is the easy part. The hard part is deciding how it's used — namespace strategy, resource quotas, secrets, ingress, observability — before developers make those decisions themselves.

You have Kubernetes. Your platform team runs it. But developers are writing their own Deployment manifests, their own HPA configs, their own ingress rules, and no two look the same. One team omits CPU limits entirely and relies on requests. Another sets memory limits too tight and gets OOMKilled under load. Three teams have different ingress controller annotations because they each found a different Stack Overflow answer.

Nobody did anything wrong. The cluster was handed to developers before anyone decided how it should be used. Now the platform team spends its time debugging inconsistent configurations rather than building platform capabilities.

The platform team's job isn't to run Kubernetes. It's to decide how Kubernetes is used, then make the right way the easy way. That work happens before self-service deployment opens up, not after.

Here are six decisions that need answers before developers start writing their own manifests.

1. Namespace strategy

Namespace design determines isolation, RBAC, resource quotas, and operational complexity. There is no single right answer, but there must be one answer per organization.

The common options: per-team namespaces (each product team gets one or more namespaces), per-service namespaces (each deployed service gets its own), or per-environment namespaces (dev/staging/prod within a shared cluster). Each has different trade-offs on blast radius, access control granularity, and operator overhead.

What matters is not which model you choose. What matters is that you choose one model, document it, and enforce it. When developers create namespaces ad hoc, you end up with a cluster that has forty namespaces named after the engineer who created them, no consistent RBAC model, and no clear ownership when something breaks.

The platform team defines the namespace strategy. Cluster provisioning automation creates namespaces according to that strategy. Developers do not create namespaces directly.
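
As a minimal sketch of what provisioning automation could enforce, assume a per-team, per-environment model. The function below builds a namespace name from a team and an environment and rejects anything outside the convention. The `<team>-<environment>` pattern, the environment list, and the function name are hypothetical. The 63-character lowercase DNS label rule is Kubernetes' own constraint on namespace names.

```python
import re

# Hypothetical convention: <team>-<environment>, e.g. "payments-prod".
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}

# Kubernetes namespace names must be valid RFC 1123 DNS labels:
# lowercase alphanumerics or '-', starting and ending with an
# alphanumeric character, at most 63 characters long.
DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def namespace_name(team: str, environment: str) -> str:
    """Return the namespace name for a team and environment, or raise ValueError."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    name = f"{team}-{environment}"
    if len(name) > 63 or not DNS_LABEL.fullmatch(name):
        raise ValueError(f"not a valid namespace name: {name!r}")
    return name
```

With this in the provisioning path, `namespace_name("payments", "prod")` yields `payments-prod`, while a name such as `Payments-prod` or an unlisted environment is rejected before anything reaches the cluster.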

2. Resource quotas and requests

Every workload in a production cluster must declare resource requests and limits. This is a platform policy, not a developer preference, because without it a single poorly configured deployment can consume a disproportionate share of cluster resources and degrade unrelated services.

The platform team has two jobs here. First, enforce the policy: an admission webhook or CI gate that rejects Deployment manifests missing resource requests and limits. Second, provide sensible defaults: golden path templates that declare reasonable starting values for common workload types. Developers shouldn't be guessing at CPU and memory values. They should start from a default that is appropriate for a typical web service and adjust from there, with profiling data to justify the change.

Without enforcement, the policy doesn't exist in practice. Developers under deadline pressure will skip resource declarations. The enforcement is the policy.
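
The CI gate half of that enforcement can be very small. The sketch below assumes the Deployment manifest has already been parsed into a Python dict (for example by a YAML loader) and reports every container that is missing a CPU or memory entry under `requests` or `limits`. The function name and message format are illustrative, not taken from any particular tool.

```python
REQUIRED_RESOURCES = ("cpu", "memory")


def missing_resources(deployment: dict) -> list[str]:
    """List every missing requests/limits entry in a parsed Deployment manifest."""
    problems = []
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            declared = resources.get(section) or {}
            for resource in REQUIRED_RESOURCES:
                if resource not in declared:
                    problems.append(f"{name}: missing {section}.{resource}")
    return problems
```

A pipeline step would fail the build whenever the returned list is non-empty. An admission webhook would apply the same check at the API server, so manifests that bypass CI are rejected as well.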

3. Image standards

Three questions belong to the platform team, not to individual service teams.

Which base images are permitted? An approved base image list prevents developers from pulling images directly from Docker Hub without security review, and ensures the organization's vulnerability scanning cadence covers the base images in use. The platform team maintains the approved list and reviews new additions.

Which registry is used? One registry, one policy. Teams should not be pushing images to different registries. Centralizing image storage simplifies access control, audit logging, and scanning coverage.

Who is responsible for patching base images? When a CVE is disclosed against a base image, someone needs to update it and trigger rebuilds across every service that uses it. If base images are governed by the platform team's image policy, patching them is a platform responsibility, not one that falls to whichever team happens to notice the vulnerability report first.
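
The "one registry" rule is the easiest of the three to check mechanically. The sketch below covers only that question: it extracts the registry host from each container image reference in a pod spec and reports images that come from anywhere else. The registry hostname is a hypothetical placeholder. The parsing follows the Docker convention that the first path component names a registry only if it contains a dot or a colon, or is `localhost`; otherwise the image resolves to Docker Hub.

```python
APPROVED_REGISTRY = "registry.example.internal"  # hypothetical hostname


def registry_of(image: str) -> str:
    """Return the registry host an image reference resolves to."""
    first, sep, _ = image.partition("/")
    # Without a '/', the whole string is a repository (plus optional tag),
    # so a ':' in it is a tag separator, not a port.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def unapproved_images(pod_spec: dict) -> list[str]:
    """Return the images in a pod spec that are not hosted on the approved registry."""
    found = []
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key, []):
            image = container.get("image", "")
            if registry_of(image) != APPROVED_REGISTRY:
                found.append(image)
    return found
```

The approved base image list needs a different check, run at build time against each Dockerfile, and is not covered here.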

4. Ingress and networking

When developers choose their own ingress controller annotations, the cluster ends up with services configured against different ingress implementations, different timeout behaviors, and different TLS termination models. Support becomes expensive when every service behaves slightly differently.

The platform team picks one ingress controller (or one service mesh) and one set of conventions. Developers configure their services against that interface. They declare which host and path their service owns; they do not decide how TLS is handled, how timeouts are set, or which ingress controller implementation is used.

If the organization has network policy requirements — restricting which services can communicate with each other — those policies are also platform decisions enforced through standard templates, not configurations developers set up themselves.
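
One way to express that split of responsibilities is a template function. The developer supplies a host, a path, and a service port; the platform fills in everything else. In the sketch below the ingress class name and the TLS secret naming scheme are hypothetical platform choices. The output follows the shape of a `networking.k8s.io/v1` Ingress object.

```python
# Platform-owned settings. Both values are hypothetical placeholders.
INGRESS_CLASS = "platform-ingress"
TLS_SECRET_SUFFIX = "-tls"


def render_ingress(service: str, namespace: str, host: str, path: str, port: int) -> dict:
    """Build an Ingress manifest from the developer's host/path declaration."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            # The controller and TLS handling are fixed by the platform.
            "ingressClassName": INGRESS_CLASS,
            "tls": [{"hosts": [host], "secretName": service + TLS_SECRET_SUFFIX}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": path,
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }
```

Timeout settings are deliberately left out: they are usually set through controller-specific annotations, so the platform team would add them here once for whichever controller it standardizes on.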

5. Secrets management

Secrets do not belong in ConfigMaps. They do not belong in environment variables baked into container images. They do not belong in files committed to the repository and referenced during CI.

The platform team provides a secrets management integration: an External Secrets Operator configuration, a Vault agent sidecar pattern, or a cloud-provider secrets manager integration. Whatever the mechanism, the developer experience is: declare which secret your service needs, and it is mounted at the path the documentation specifies. Developers do not configure the secrets injection mechanism itself.

This matters because secrets injection is a security surface. The pattern for injecting a secret — where it comes from, how it is scoped, when it rotates — is a policy decision. Letting developers invent their own patterns produces a cluster where some services are reading secrets from Kubernetes Secrets directly, some are using Vault, some have secrets in environment variables set during CI, and the security team has no complete picture of what is accessing what.
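
A standard pattern is easier to hold when deviations are caught automatically. The sketch below is one such check, not a replacement for the injection mechanism itself: it flags environment variables whose names look like credentials but carry a literal `value` in the pod spec instead of a `valueFrom` reference. The name pattern is a hypothetical heuristic and would need tuning for a real codebase.

```python
import re

# Hypothetical heuristic for env var names that usually hold credentials.
SECRET_NAME = re.compile(r"PASSWORD|SECRET|TOKEN|API_?KEY", re.IGNORECASE)


def literal_secret_env(pod_spec: dict) -> list[str]:
    """Return 'container/VAR' for secret-looking env vars set to a literal value."""
    findings = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        for env in container.get("env") or []:
            # A literal uses the "value" field; a reference to a Secret
            # uses "valueFrom" with a "secretKeyRef" instead.
            if "value" in env and SECRET_NAME.search(env.get("name", "")):
                findings.append(f"{name}/{env['name']}")
    return findings
```

Run in CI, a check like this gives the security team at least one concrete signal that a service has drifted away from the platform's secrets integration.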

6. Observability baseline

Every pod deployed through the platform should emit structured logs, expose health endpoints, and produce metrics in a format the platform's monitoring stack can scrape. This is not optional configuration that developers add when they have time. It is a precondition for running production workloads.

The golden path templates handle this. A deployment created from the standard template gets Prometheus scrape annotations, liveness and readiness probes that conform to the standard, and a structured logging configuration that routes to the platform's log aggregation. Developers do not configure these from scratch; they configure their application's endpoints against the standard interface.
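
As a rough illustration of what the template does on the developer's behalf, the sketch below adds scrape annotations and HTTP probes to a parsed Deployment without overwriting anything the service already declares. The `prometheus.io/*` annotation names follow a widely used convention that only takes effect if the Prometheus scrape configuration honors it. The probe paths, the default port, and the function name are hypothetical, and logging configuration is not covered.

```python
import copy


def apply_observability_baseline(deployment: dict, port: int = 8080) -> dict:
    """Return a copy of the Deployment with scrape annotations and probes added."""
    result = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    template = result.setdefault("spec", {}).setdefault("template", {})
    annotations = template.setdefault("metadata", {}).setdefault("annotations", {})
    # setdefault keeps any value the service has already set.
    annotations.setdefault("prometheus.io/scrape", "true")
    annotations.setdefault("prometheus.io/port", str(port))
    annotations.setdefault("prometheus.io/path", "/metrics")
    for container in template.setdefault("spec", {}).setdefault("containers", []):
        container.setdefault("livenessProbe", {"httpGet": {"path": "/healthz", "port": port}})
        container.setdefault("readinessProbe", {"httpGet": {"path": "/ready", "port": port}})
    return result
```

Because existing values win, a service with an unusual readiness endpoint keeps it, while a service that declares nothing still ends up with the full baseline.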

When observability is optional and developer-configured, the on-call rotation pays for it. At 2am, when a service is degraded and the on-call engineer opens Grafana, the difference between a service with a standard observability baseline and one without is the difference between a fifteen-minute investigation and a two-hour one.

What happens when the platform team skips these decisions

Each of these decisions has a default outcome when the platform team doesn't make it: developers make it themselves, independently, and inconsistently.

The cluster becomes a collection of individual teams' implementation decisions rather than an organization's infrastructure. The platform team's time shifts from building capabilities to debugging whatever the most recent inconsistency produced. Onboarding a new team becomes a process of learning from existing teams' examples rather than following a documented path.

The cost compounds over time. An organization with fifty services, each with independently configured namespaces, resource policies, ingress rules, and secrets management, has fifty independent configurations to audit when a compliance requirement changes. An organization where the platform team made these six decisions and enforced them through golden path templates has one configuration to update.

Making these decisions before handing Kubernetes to developers is not about restricting what developers can do. It is about ensuring that the default path — the one developers take when they're under deadline pressure and not thinking carefully about infrastructure — is also the secure, observable, consistent path.

For more on how golden paths work as a product, see Golden Paths: What They Are and Why Developers Choose Them. For the infrastructure foundation these decisions sit on, see the Platform Foundation service and the Foundations framework overview.

Tags: kubernetes, platform engineering, developer experience

Matías Caniglia

Founder of Clouditive. 18+ years transforming engineering organizations across LATAM and globally through Developer Experience consulting.
