Kubernetes for Platform Teams: Six Decisions Before Handing It to Developers
You have Kubernetes. Your platform team runs it. But developers are writing their own Deployment manifests, their own HPA configs, their own ingress rules, and no two look the same. One team sets no CPU limits and relies on requests alone. Another sets memory limits too tightly and gets OOMKilled under load. Three teams use different ingress controller annotations because each found a different Stack Overflow answer.
Nobody did anything wrong. The cluster was handed to developers before anyone decided how it should be used. Now the platform team spends its time debugging inconsistent configurations rather than building platform capabilities.
The platform team's job isn't to run Kubernetes. It's to decide how Kubernetes is used, then make the right way the easy way. That work happens before self-service deployment opens up, not after.
Here are six decisions that need answers before developers start writing their own manifests.
1. Namespace strategy
Namespace design determines isolation, RBAC, resource quotas, and operational complexity. There is no single right answer, but there must be one answer per organization.
The common options: per-team namespaces (each product team gets one or more namespaces), per-service namespaces (each deployed service gets its own), or per-environment namespaces (dev/staging/prod within a shared cluster). Each has different trade-offs on blast radius, access control granularity, and operator overhead.
What matters is not which model you choose. What matters is that you choose one model, document it, and enforce it. When developers create namespaces ad hoc, you end up with a cluster that has forty namespaces named after the engineers who created them, no consistent RBAC model, and no clear ownership when something breaks.
The platform team defines the namespace strategy. Cluster provisioning automation creates namespaces according to that strategy. Developers do not create namespaces directly.
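As a minimal sketch of what that automation might look like, assuming a per-team, per-environment model: the function below builds a Namespace manifest and refuses names that fall outside the convention. The `<team>-<environment>` format, the environment list, and the `platform.example.com/...` label keys are illustrative assumptions, not something a specific tool prescribes.

```python
import re

# Hypothetical convention: <team>-<environment>, for example "payments-prod".
ALLOWED_ENVIRONMENTS = ("dev", "staging", "prod")

# Namespace names must be valid DNS labels: lowercase alphanumerics and "-",
# starting and ending with an alphanumeric, at most 63 characters.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def build_namespace(team: str, environment: str) -> dict:
    """Return a Namespace manifest that follows the team-environment convention."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    name = f"{team}-{environment}"
    if len(name) > 63 or not DNS_LABEL.fullmatch(name):
        raise ValueError(f"not a valid namespace name: {name!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Ownership labels (hypothetical keys) so RBAC, quotas, and on-call
            # can always be traced back to a team.
            "labels": {
                "platform.example.com/team": team,
                "platform.example.com/environment": environment,
            },
        },
    }


if __name__ == "__main__":
    print(build_namespace("payments", "prod")["metadata"]["name"])
```

Because the name is derived rather than typed, a namespace named after an individual engineer cannot be created through this path.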
2. Resource quotas and requests
Every workload in a production cluster must declare resource requests and limits. This is a platform policy, not a developer preference, because without it a single poorly configured deployment can exhaust a node's CPU or memory and degrade unrelated services.
The platform team's job here has two parts. First, enforce the policy with an admission webhook or CI gate that rejects Deployment manifests missing resource requests and limits. Second, provide sensible defaults through golden path templates that declare reasonable starting values for common workload types. Developers shouldn't be guessing at CPU and memory values. They should start from a default that is appropriate for a typical web service and adjust from there, with profiling data to justify the change.
Without enforcement, the policy doesn't exist in practice. Developers under deadline pressure will skip resource declarations. The enforcement is the policy.
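A CI gate for this policy can be small. The sketch below assumes the Deployment manifest has already been parsed into a Python dict (for example from YAML) and that the policy requires both CPU and memory under both requests and limits. That exact rule set is an assumption, so adjust it to whatever your policy states.

```python
from typing import List

REQUIRED_SECTIONS = ("requests", "limits")
REQUIRED_RESOURCES = ("cpu", "memory")


def missing_resources(deployment: dict) -> List[str]:
    """Return one message per missing resource declaration (empty list = compliant)."""
    problems = []
    containers = deployment["spec"]["template"]["spec"].get("containers", [])
    for container in containers:
        resources = container.get("resources") or {}
        for section in REQUIRED_SECTIONS:
            declared = resources.get(section) or {}
            for resource in REQUIRED_RESOURCES:
                if resource not in declared:
                    problems.append(
                        f"{container['name']}: missing resources.{section}.{resource}"
                    )
    return problems


if __name__ == "__main__":
    manifest = {
        "spec": {"template": {"spec": {"containers": [
            {"name": "web", "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}}},
        ]}}}
    }
    for problem in missing_resources(manifest):
        print(problem)
```

In a pipeline, a non-empty result would fail the build. The same check could sit behind an admission webhook so that manifests applied outside CI are rejected as well.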
3. Image standards
Three questions belong to the platform team, not to individual service teams.
Which base images are permitted? An approved base image list prevents developers from pulling images directly from Docker Hub without security review, and ensures the organization's vulnerability scanning cadence covers the base images in use. The platform team maintains the approved list and reviews new additions.
Which registry is used? One registry, one policy. Teams should not be pushing images to different registries. Centralizing image storage simplifies access control, audit logging, and scanning coverage.
Who is responsible for patching base images? When a CVE is disclosed against a base image, someone needs to update it and trigger rebuilds across every service that uses it. That responsibility belongs to the platform team as part of its image policy. It should not fall to whichever team happens to notice the vulnerability report first.
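The first two questions lend themselves to an automated check. The sketch below scans a Dockerfile's `FROM` lines and reports any base image that is not on an approved list. The registry host and image names are hypothetical placeholders, and the parsing covers only the common `FROM [--flag] image [AS stage]` form, not build arguments or other edge cases.

```python
from typing import List

# Hypothetical approved list: full repository names in the organization's single registry.
APPROVED_BASE_IMAGES = {
    "registry.example.com/base/python",
    "registry.example.com/base/node",
}


def repository(image: str) -> str:
    """Strip the digest and tag from an image reference."""
    name = image.split("@", 1)[0]            # drop "@sha256:..." if present
    if ":" in name.rsplit("/", 1)[-1]:       # a colon in the last path component is a tag
        name = name.rsplit(":", 1)[0]
    return name


def base_images(dockerfile: str) -> List[str]:
    """Return the external images named in FROM lines, skipping earlier build stages."""
    images, stages = [], set()
    for line in dockerfile.splitlines():
        parts = line.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        args = [p for p in parts[1:] if not p.startswith("--")]  # skip flags such as --platform
        if not args:
            continue
        image = args[0]
        # "scratch" is the empty image and is not pulled from any registry.
        if image not in stages and image != "scratch":
            images.append(image)
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return images


def unapproved_bases(dockerfile: str) -> List[str]:
    """Return base images whose repository is not on the approved list."""
    return [img for img in base_images(dockerfile) if repository(img) not in APPROVED_BASE_IMAGES]
```

Keeping the approved list as data also helps with the patching question: when a base image is rebuilt for a CVE, the same list identifies which repositories to search for dependents.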
4. Ingress and networking
When developers choose their own ingress controller annotations, the cluster ends up with services configured against different ingress implementations, different timeout behaviors, and different TLS termination models. Support becomes expensive when every service behaves slightly differently.
The platform team picks one ingress controller (or one service mesh) and one set of conventions. Developers configure their services against that interface. They declare which host and path their service owns; they do not decide how TLS is handled, how timeouts are set, or which ingress controller implementation is used.
If the organization has network policy requirements — restricting which services can communicate with each other — those policies are also platform decisions enforced through standard templates, not configurations developers set up themselves.
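One way to express that split of responsibility is a template function: developers pass a host and a path, and everything else is fixed by the platform. The sketch below produces a `networking.k8s.io/v1` Ingress as a Python dict. The IngressClass name, the TLS secret naming scheme, and the timeout annotation (shown for ingress-nginx as an example) are assumptions to replace with your own conventions.

```python
# Platform-owned settings. Developers never edit these.
INGRESS_CLASS = "platform-nginx"  # hypothetical IngressClass name
PLATFORM_ANNOTATIONS = {
    # Example for ingress-nginx; other controllers use different annotation keys.
    "nginx.ingress.kubernetes.io/proxy-read-timeout": "30",
}


def build_ingress(service: str, namespace: str, host: str,
                  path: str = "/", port: int = 80) -> dict:
    """Build an Ingress from the only inputs a developer owns: host and path."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": service,
            "namespace": namespace,
            "annotations": dict(PLATFORM_ANNOTATIONS),
        },
        "spec": {
            "ingressClassName": INGRESS_CLASS,
            # TLS is always on; the secret name follows a hypothetical <service>-tls scheme.
            "tls": [{"hosts": [host], "secretName": f"{service}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": path,
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }
```

Changing the timeout or the ingress class then means editing one constant and re-rendering, instead of chasing annotations across every service repository.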
5. Secrets management
Secrets do not belong in ConfigMaps. They do not belong in environment variables baked into container images. They do not belong in files committed to the repository and referenced during CI.
The platform team provides a secrets management integration: an External Secrets Operator configuration, a Vault agent sidecar pattern, or a cloud-provider secrets manager integration. Whatever the mechanism, the developer experience is: declare which secret your service needs, and it is mounted at the path the documentation specifies. Developers do not configure the secrets injection mechanism itself.
This matters because secrets injection is a security surface. The pattern for injecting a secret — where it comes from, how it is scoped, when it rotates — is a policy decision. Letting developers invent their own patterns produces a cluster where some services are reading secrets from Kubernetes Secrets directly, some are using Vault, some have secrets in environment variables set during CI, and the security team has no complete picture of what is accessing what.
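A lint step can catch the most common violations before they reach the cluster. The sketch below flags ConfigMap keys and literal Deployment environment variables whose names look like credentials. The name pattern is a heuristic chosen for illustration, so expect both false positives and misses. It complements the platform's secrets integration and does not replace it.

```python
import re
from typing import List

# Heuristic (an assumption, tune to taste): names that usually indicate a credential.
SUSPICIOUS = re.compile(
    r"password|passwd|secret|token|api[_-]?key|private[_-]?key", re.IGNORECASE
)


def find_inline_secrets(manifest: dict) -> List[str]:
    """Return findings for secret-looking values stored directly in a manifest."""
    findings = []
    kind = manifest.get("kind")
    if kind == "ConfigMap":
        for key in manifest.get("data") or {}:
            if SUSPICIOUS.search(key):
                findings.append(f"ConfigMap key {key!r} looks like a secret")
    elif kind == "Deployment":
        containers = manifest["spec"]["template"]["spec"].get("containers", [])
        for container in containers:
            for env in container.get("env") or []:
                # A literal "value" lives in the manifest itself;
                # "valueFrom" only references a Secret and is allowed.
                if "value" in env and SUSPICIOUS.search(env["name"]):
                    findings.append(
                        f"{container['name']}: env {env['name']!r} has a literal value"
                    )
    return findings
```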
6. Observability baseline
Every pod deployed through the platform should emit structured logs, expose health endpoints, and produce metrics in a format the platform's monitoring stack can scrape. This is not optional configuration that developers add when they have time. It is a precondition for running production workloads.
The golden path templates handle this. A deployment created from the standard template gets Prometheus scrape annotations, liveness and readiness probes that conform to the standard, and a structured logging configuration that routes to the platform's log aggregation. Developers do not configure these from scratch; they configure their application's endpoints against the standard interface.
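To make the template's role concrete, here is a minimal sketch of a function that applies such a baseline to a Deployment parsed into a Python dict. The probe paths (`/healthz`, `/readyz`), the default port, and the `LOG_FORMAT` variable are hypothetical conventions. The `prometheus.io/*` annotations are a widely used convention that only takes effect if the Prometheus scrape configuration is set up to honor them.

```python
import copy


def with_observability_baseline(deployment: dict, port: int = 8080) -> dict:
    """Return a copy of the Deployment with the platform's observability defaults filled in."""
    result = copy.deepcopy(deployment)  # never mutate the caller's manifest
    template = result["spec"]["template"]

    # Scrape annotations go on the pod template so every pod carries them.
    annotations = template.setdefault("metadata", {}).setdefault("annotations", {})
    annotations.setdefault("prometheus.io/scrape", "true")
    annotations.setdefault("prometheus.io/port", str(port))
    annotations.setdefault("prometheus.io/path", "/metrics")

    for container in template["spec"]["containers"]:
        # setdefault keeps any probe the team has already tuned.
        container.setdefault("livenessProbe", {
            "httpGet": {"path": "/healthz", "port": port},
            "periodSeconds": 10,
        })
        container.setdefault("readinessProbe", {
            "httpGet": {"path": "/readyz", "port": port},
            "periodSeconds": 5,
        })
        env = container.setdefault("env", [])
        if not any(item.get("name") == "LOG_FORMAT" for item in env):
            # Hypothetical switch the application reads to emit JSON logs.
            env.append({"name": "LOG_FORMAT", "value": "json"})
    return result
```

Using `setdefault` means the baseline fills gaps without overwriting values a team has deliberately set, which keeps the standard path easy without making it rigid.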
When observability is optional and developer-configured, the on-call rotation pays for it. At 2am, when a service is degraded and the on-call engineer opens Grafana, the difference between a service with a standard observability baseline and one without is the difference between a fifteen-minute investigation and a two-hour one.
What happens when the platform team skips these decisions
Each of these decisions has a default outcome when the platform team doesn't make it: developers make it themselves, independently, and inconsistently.
The cluster becomes a collection of individual teams' implementation decisions rather than an organization's infrastructure. The platform team's time shifts from building capabilities to debugging whatever the most recent inconsistency produced. Onboarding a new team becomes a process of learning from existing teams' examples rather than following a documented path.
The cost compounds over time. An organization with fifty services, each with independently configured namespaces, resource policies, ingress rules, and secrets management, has fifty independent configurations to audit when a compliance requirement changes. An organization where the platform team made these six decisions and enforced them through golden path templates has one configuration to update.
Making these decisions before handing Kubernetes to developers is not about restricting what developers can do. It is about ensuring that the default path — the one developers take when they're under deadline pressure and not thinking carefully about infrastructure — is also the secure, observable, consistent path.
For more on how golden paths work as a product, see Golden Paths: What They Are and Why Developers Choose Them. For the infrastructure foundation these decisions sit on, see the Platform Foundation service and the Foundations framework overview.

Mat Caniglia
Founder of Clouditive. 18+ years transforming engineering organizations across LATAM and globally through Developer Experience consulting.