At 2:47 AM, the payments team gets paged. Their pods are OOMKilled. The cluster has 96 cores and 384GiB of memory across 12 nodes. The payments namespace is using 8 cores and 32GiB — well within its allocation. The problem is the data-pipeline namespace, which has no resource quota and whose nightly ETL job just consumed 14GiB of memory it was never supposed to touch.
This is the failure mode that shared Kubernetes clusters produce when you skip the governance layer. Namespaces without quotas are an open bar. One team’s memory leak, runaway batch job, or forgotten load test becomes every other team’s production incident.
The fix is two Kubernetes primitives — LimitRange and ResourceQuota — applied consistently before any team gets cluster access. Neither is complex. Applying them after something breaks is significantly more expensive than applying them before.
The Shared Cluster Cost Problem
A shared cluster operates on a fundamentally different economic model than dedicated per-team clusters. Node capacity is pooled. When one namespace consumes beyond its fair share, it does not pay more — it simply evicts other teams’ pods.
The economics of shared clusters are favorable when they work correctly. A shared 3-node cluster running 8 teams at 40-60% average utilization costs substantially less than 8 separate clusters with their own control plane overhead, node minimums, and per-cluster tooling. The efficiency gains are real only when resource boundaries prevent one tenant from starving another.
Without quotas, three failure modes compound:
A memory leak in a development namespace consumes available node memory gradually over 6 hours. The scheduler cannot place new production pods because no node has sufficient free memory. The cluster appears healthy — CPU is at 30% — but every new deployment fails with Insufficient memory until the offending deployment is found and scaled down.
A batch job with no CPU limit runs at full node capacity during business hours. Other pods on the same node get CPU-throttled. Response times increase but the pods do not crash, making the cause harder to trace. The symptom appears as an application slowdown, not a resource problem.
A developer runs a load test against staging without coordination. The HPA scales the staging deployment to 40 replicas. Production deployments fail because the cluster has no remaining schedulable capacity.

LimitRange vs ResourceQuota: What Each Does and When You Need Both
These two objects operate at different scopes and serve different purposes. Using one without the other leaves gaps.
| Dimension | LimitRange | ResourceQuota |
|---|---|---|
| Scope | Per container or pod | Per namespace |
| What it controls | Default requests/limits, min/max per container | Total CPU, memory, object count across namespace |
| Enforcement | Admission controller at pod creation | Admission controller at any resource creation |
| Behavior when missing | Pods created without requests/limits (invisible to scheduler) | Namespace can consume unlimited cluster resources |
| Failure mode when absent | Node overcommit, OOM evictions | Noisy neighbor starves other namespaces |
LimitRange solves the problem of pods created without resource declarations. If a developer writes a Deployment with no resources block, Kubernetes schedules it on a node with no knowledge of what it will actually consume. The node overcommits, and the first memory pressure event triggers evictions. A LimitRange with default values automatically injects requests and limits into any container that omits them.
A sensible LimitRange default for a dev namespace: defaultRequest of 100m CPU and 128Mi memory, default limit of 500m CPU and 512Mi memory, max of 2 CPU and 4Gi memory. This prevents any single container from running unbounded while giving workloads enough headroom for typical development tasks.
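Expressed as a manifest, those defaults might look like the following (the namespace name is illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: payments-dev     # illustrative namespace
spec:
  limits:
    - type: Container
      defaultRequest:         # injected when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:                # injected when a container omits limits
        cpu: 500m
        memory: 512Mi
      max:                    # admission rejects containers above this
        cpu: "2"
        memory: 4Gi
```

Any container created in this namespace without a resources block gets the defaults injected at admission; a container requesting more than the max is rejected outright.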
ResourceQuota solves the namespace-level consumption problem. It caps the total CPU, memory, and object count that a namespace can consume. When the quota is reached, new pods in that namespace fail admission rather than evicting pods from other namespaces.
A production namespace serving web traffic might get: 20 CPU requests, 40Gi memory requests, 40 CPU limits, 80Gi memory limits. A dev namespace might get 4 CPU requests and 8Gi memory. The gap between dev and production quotas forces teams to be explicit when they need production-equivalent resources for load testing.
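The production allocation above maps directly onto a ResourceQuota manifest (namespace name and pod count are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: payments-prod    # illustrative namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"               # caps object count as well as compute
```

Once the namespace's aggregate requests or limits reach these values, new pods fail admission with a quota error rather than evicting workloads elsewhere in the cluster.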

Both objects should be applied at cluster bootstrap before any application namespaces are created. Applying them retroactively to a cluster with running workloads requires a migration window — existing pods are not evicted, but new deployments fail if they exceed the quota.
Namespace Design: Patterns That Enable Cost Attribution
The namespace structure determines whether you can answer the question “how much does team X cost per month?” The answer is only possible if the namespace boundary maps cleanly to a cost owner.
Three patterns cover most organizations:
Team-per-namespace: Each engineering team owns one namespace per environment. payments-prod, payments-staging, payments-dev. Cost attribution is exact — every resource in payments-prod belongs to the payments team. Quota management is per-team per-environment.
Product-per-namespace: Each product line owns a namespace. Multiple teams may deploy into the same product namespace. Attribution is to the product, not the team. This works for organizations that charge back at the product or business unit level rather than the team level.
Shared service namespaces: Infrastructure components (monitoring, ingress, cert-manager) live in dedicated namespaces with their own quotas. Platform costs are separated from application costs. This makes the platform team’s resource consumption visible and prevents it from being silently absorbed into per-team cost attribution.
The label schema is what actually enables cost attribution tooling to produce accurate reports:
| Label | Values | Purpose |
|---|---|---|
| team | payments, data, platform | Maps namespace to owning team |
| environment | prod, staging, dev | Separates cost by environment tier |
| cost-center | cc-1042, cc-2031 | Maps to finance GL code for chargeback |
| product | checkout, analytics | Groups across team boundaries for product-level reporting |
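A namespace carrying the full schema, applied at creation, looks like this (all values illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod
  labels:
    team: payments
    environment: prod
    cost-center: cc-1042
    product: checkout
```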
Apply these labels at namespace creation and enforce them via a Gatekeeper constraint that rejects namespace creation without all four labels. Without enforcement, labels drift — teams skip them, use inconsistent values, or forget to update when ownership changes.
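One way to enforce the schema is the K8sRequiredLabels constraint from the Gatekeeper policy library — a sketch, assuming that constraint template is already installed in the cluster:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-cost-labels
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels:                   # namespace creation is rejected unless all four are present
      - key: team
      - key: environment
      - key: cost-center
      - key: product
```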
Cost Attribution: Namespace-Level Chargeback With Real Numbers
Kubecost and OpenCost both provide namespace-level cost breakdown with accuracy rates around 97% for on-demand node costs. The remaining 3% comes from shared cluster overhead — control plane, DaemonSets, cluster-wide add-ons — which must be allocated proportionally.
An 8-node cluster running 8 namespaces with the following allocation:
| Namespace | CPU Requested | Memory Requested | Monthly Cost Attribution ($) |
|---|---|---|---|
| payments-prod | 8 cores | 32Gi | 486 |
| data-pipeline | 6 cores | 48Gi | 540 |
| auth-prod | 4 cores | 16Gi | 270 |
| frontend-prod | 3 cores | 12Gi | 189 |
| shared-infra | 2 cores | 8Gi | 135 |
| dev (all teams) | 4 cores | 16Gi | 270 |
| staging | 3 cores | 12Gi | 189 |
| monitoring | 2 cores | 8Gi | 135 |
Node cost at $0.384/hr (m5.2xlarge on-demand) × 8 nodes × 730 hours ≈ $2,243/month. The table attributes roughly $2,214 of that to namespace owners based on requested resources, not actual usage; the small unattributed remainder is the shared cluster overhead noted above. Using actual usage instead of requests is more accurate but creates incentives for teams to under-request resources to minimize their attribution.

The practical recommendation: attribute based on requests for the first 6 months to give teams stable, predictable bills. Switch to actual usage attribution once teams have had time to right-size their resource requests and understand what drives their costs.
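The request-based allocation can be sketched in a few lines. This is a simplified model — it blends CPU and memory shares 50/50 against cluster totals, where real tools like Kubecost and OpenCost price CPU and memory separately per node type; the function name and weighting are assumptions for illustration:

```python
# Sketch of request-based cost attribution across namespaces.
# Assumption: CPU and memory each account for half the node bill;
# production tools use per-resource node pricing instead.

def attribute_cost(requests, monthly_node_cost):
    """requests: {namespace: (cpu_cores, mem_gib)} -> {namespace: dollars}."""
    total_cpu = sum(cpu for cpu, _ in requests.values())
    total_mem = sum(mem for _, mem in requests.values())
    shares = {}
    for ns, (cpu, mem) in requests.items():
        # 50/50 blend of this namespace's CPU share and memory share
        weight = 0.5 * cpu / total_cpu + 0.5 * mem / total_mem
        shares[ns] = round(weight * monthly_node_cost, 2)
    return shares

bill = attribute_cost(
    {"payments-prod": (8, 32), "data-pipeline": (6, 48), "dev": (4, 16)},
    2243.0,
)
```

The useful property of this model is that the shares always sum to the node bill, so nothing goes unattributed; the contested part is the weighting, which is exactly what teams will argue about when chargeback starts.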
Failure Modes and the Defaults to Set on Day One
Five failure modes appear repeatedly in clusters that were bootstrapped without multi-tenancy governance:
| Failure Mode | Symptom | Fix |
|---|---|---|
| No LimitRange on namespace | Pods created without requests/limits; scheduler overcommits node | Apply LimitRange with sensible defaults before first deployment |
| ResourceQuota set too tight | Deployments fail with QuotaExceeded during rollouts; engineers manually delete old pods | Set quota headroom at 2x typical peak, review quarterly |
| HPA ignores quota | HPA scales beyond namespace quota; new replicas fail admission; traffic drops | Set HPA maxReplicas ≤ (quota CPU limit / pod CPU limit) |
| Missing cost-center label | 30% of namespace cost unattributable; finance rejects chargeback report | Enforce label schema via Gatekeeper at namespace creation |
| Dev namespace shares cluster with prod | Load test in dev causes node pressure affecting prod pods | Apply taint/toleration separation or use dedicated node pools for prod |
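The HPA ceiling from the table above can be made concrete. Assuming a namespace quota of 40 CPU limits and a 500m per-pod CPU limit (illustrative numbers matching the prod defaults below), the hard cap is 80 replicas — and in practice maxReplicas should sit below that to leave quota for other workloads in the namespace:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: payments-prod     # illustrative
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 80              # quota CPU limit (40) / pod CPU limit (500m)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```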
The defaults to set on day one, before any application teams get access:
| Environment | CPU Request | CPU Limit | Memory Request | Memory Limit | Max Pods |
|---|---|---|---|---|---|
| prod | 20 cores | 40 cores | 40Gi | 80Gi | 100 |
| staging | 8 cores | 16 cores | 16Gi | 32Gi | 50 |
| dev | 4 cores | 8 cores | 8Gi | 16Gi | 30 |
These are starting points, not permanent values. Run Kubecost or OpenCost for 30 days and adjust quotas to match actual peak consumption plus 40% headroom. A quota that is never hit provides no protection. A quota that is constantly hit creates operational friction. The target is a quota hit rate below 5% in steady state.
Shared clusters are worth running. The utilization efficiency, the reduced control plane overhead, the simplified platform tooling — the economics are clear. But shared clusters without quotas are not shared clusters. They are a single team’s cluster that other teams happen to deploy into until the wrong job runs at the wrong time. LimitRanges and ResourceQuotas are the primitives that make multi-tenancy real. Apply them at bootstrap. Review them quarterly. The 2 AM page is optional.