At 2:47 AM, the payments team gets paged. Their pods are OOMKilled. The cluster has 96 cores and 384GiB of memory across 12 nodes. The payments namespace is using 8 cores and 32GiB — well within its allocation. The problem is the data-pipeline namespace, which has no resource quota and whose nightly ETL job just consumed 14GiB of memory it was never supposed to touch.
This is the failure mode that shared Kubernetes clusters produce when you skip the governance layer. Namespaces without quotas are an open bar. One team’s memory leak, runaway batch job, or forgotten load test becomes every other team’s production incident.
The fix is two Kubernetes primitives — LimitRange and ResourceQuota — applied consistently before any team gets cluster access. Neither is complex. Applying them after something breaks is significantly more expensive than applying them before.
The Shared Cluster Cost Problem
A shared cluster operates on a fundamentally different economic model than dedicated per-team clusters. Node capacity is pooled. When one namespace consumes beyond its fair share, it does not pay more — it simply evicts other teams’ pods.
The economics of shared clusters are favorable when they work correctly. A shared 3-node cluster running 8 teams at 40-60% average utilization costs substantially less than 8 separate clusters with their own control plane overhead, node minimums, and per-cluster tooling. The efficiency gains are real only when resource boundaries prevent one tenant from starving another.
Without quotas, three failure modes compound:
A memory leak in a development namespace consumes available node memory gradually over 6 hours. The scheduler cannot place new production pods because no node has sufficient free memory. The cluster appears healthy — CPU is at 30% — but every new deployment fails with Insufficient memory until the offending deployment is found and scaled down.
A batch job with no CPU limit runs at full node capacity during business hours. Other pods on the same node get CPU-throttled. Response times increase but the pods do not crash, making the cause harder to trace. The symptom appears as an application slowdown, not a resource problem.
A developer runs a load test against staging without coordination. The HPA scales the staging deployment to 40 replicas. Production deployments fail because the cluster has no remaining schedulable capacity.

LimitRange vs ResourceQuota: What Each Does and When You Need Both
These two objects operate at different scopes and serve different purposes. Using one without the other leaves gaps.
| Dimension | LimitRange | ResourceQuota |
|---|---|---|
| Scope | Per container or pod | Per namespace |
| What it controls | Default requests/limits, min/max per container | Total CPU, memory, object count across namespace |
| Enforcement | Admission controller at pod creation | Admission controller at any resource creation |
| Behavior when missing | Pods created without requests/limits (invisible to scheduler) | Namespace can consume unlimited cluster resources |
| Failure mode when absent | Node overcommit, OOM evictions | Noisy neighbor starves other namespaces |
LimitRange solves the problem of pods created without resource declarations. If a developer writes a Deployment with no resources block, Kubernetes schedules it on a node with no knowledge of what it will actually consume. The node overcommits, and the first memory pressure event triggers evictions. A LimitRange with default values automatically injects requests and limits into any container that omits them.
A sensible LimitRange default for a dev namespace: defaultRequest of 100m CPU and 128Mi memory, default limit of 500m CPU and 512Mi memory, max of 2 CPU and 4Gi memory. This prevents any single container from running unbounded while giving workloads enough headroom for typical development tasks.
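Expressed as a manifest, those defaults might look like the following (the namespace name is illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: payments-dev     # illustrative namespace
spec:
  limits:
    - type: Container
      defaultRequest:         # injected when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:                # injected when a container omits limits
        cpu: 500m
        memory: 512Mi
      max:                    # admission rejects containers above this
        cpu: "2"
        memory: 4Gi
```

Any container created in this namespace without a resources block gets the defaults injected at admission; a container requesting more than the max is rejected outright.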
ResourceQuota solves the namespace-level consumption problem. It caps the total CPU, memory, and object count that a namespace can consume. When the quota is reached, new pods in that namespace fail admission rather than evicting pods from other namespaces.
A production namespace serving web traffic might get: 20 CPU requests, 40Gi memory requests, 40 CPU limits, 80Gi memory limits. A dev namespace might get 4 CPU requests and 8Gi memory. The gap between dev and production quotas forces teams to be explicit when they need production-equivalent resources for load testing.
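The production allocation above maps directly onto a ResourceQuota manifest (namespace name and pod count are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: payments-prod    # illustrative namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"               # caps object count as well as compute
```

Once the namespace's aggregate requests or limits reach these values, new pods fail admission with a quota error rather than evicting workloads elsewhere in the cluster.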

Both objects should be applied at cluster bootstrap before any application namespaces are created. Applying them retroactively to a cluster with running workloads requires a migration window — existing pods are not evicted, but new deployments fail if they exceed the quota.
Namespace Design: Patterns That Enable Cost Attribution
The namespace structure determines whether you can answer the question “how much does team X cost per month?” The answer is only possible if the namespace boundary maps cleanly to a cost owner.
Three patterns cover most organizations:
Team-per-namespace: Each engineering team owns one namespace per environment. payments-prod, payments-staging, payments-dev. Cost attribution is exact — every resource in payments-prod belongs to the payments team. Quota management is per-team per-environment.
Product-per-namespace: Each product line owns a namespace. Multiple teams may deploy into the same product namespace. Attribution is to the product, not the team. This works for organizations that charge back at the product or business unit level rather than the team level.
Shared service namespaces: Infrastructure components (monitoring, ingress, cert-manager) live in dedicated namespaces with their own quotas. Platform costs are separated from application costs. This makes the platform team’s resource consumption visible and prevents it from being silently absorbed into per-team cost attribution.
The label schema is what actually enables cost attribution tooling to produce accurate reports:
| Label | Values | Purpose |
|---|---|---|
| team | payments, data, platform | Maps namespace to owning team |
| environment | prod, staging, dev | Separates cost by environment tier |
| cost-center | cc-1042, cc-2031 | Maps to finance GL code for chargeback |
| product | checkout, analytics | Groups across team boundaries for product-level reporting |
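A namespace carrying the full schema, applied at creation, looks like this (all values illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod
  labels:
    team: payments
    environment: prod
    cost-center: cc-1042
    product: checkout
```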
Apply these labels at namespace creation and enforce them via a Gatekeeper constraint that rejects namespace creation without all four labels. Without enforcement, labels drift — teams skip them, use inconsistent values, or forget to update when ownership changes.
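One way to enforce the schema is the K8sRequiredLabels constraint from the Gatekeeper policy library — a sketch, assuming that constraint template is already installed in the cluster:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-cost-labels
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels:                   # namespace creation is rejected unless all four are present
      - key: team
      - key: environment
      - key: cost-center
      - key: product
```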
Cost Attribution: Namespace-Level Chargeback With Real Numbers
Kubecost and OpenCost both provide namespace-level cost breakdown with accuracy rates around 97% for on-demand node costs. The remaining 3% comes from shared cluster overhead — control plane, DaemonSets, cluster-wide add-ons — which must be allocated proportionally.
An 8-node cluster running 8 namespaces with the following allocation:
| Namespace | CPU Requested | Memory Requested | Monthly Cost Attribution ($) |
|---|---|---|---|
| payments-prod | 8 cores | 32Gi | 486 |
| data-pipeline | 6 cores | 48Gi | 540 |
| auth-prod | 4 cores | 16Gi | 270 |
| frontend-prod | 3 cores | 12Gi | 189 |
| shared-infra | 2 cores | 8Gi | 135 |
| dev (all teams) | 4 cores | 16Gi | 270 |
| staging | 3 cores | 12Gi | 189 |
| monitoring | 2 cores | 8Gi | 135 |
Node cost at $0.384/hr (m5.2xlarge on-demand) × 8 nodes × 730 hours ≈ $2,243/month. The table attributes roughly $2,214 of that to namespace owners based on requested resources, not actual usage; the small unattributed remainder is the shared cluster overhead noted above. Using actual usage instead of requests is more accurate but creates incentives for teams to under-request resources to minimize their attribution.

The practical recommendation: attribute based on requests for the first 6 months to give teams stable, predictable bills. Switch to actual usage attribution once teams have had time to right-size their resource requests and understand what drives their costs.
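The request-based allocation can be sketched in a few lines. This is a simplified model — it blends CPU and memory shares 50/50 against cluster totals, where real tools like Kubecost and OpenCost price CPU and memory separately per node type; the function name and weighting are assumptions for illustration:

```python
# Sketch of request-based cost attribution across namespaces.
# Assumption: CPU and memory each account for half the node bill;
# production tools use per-resource node pricing instead.

def attribute_cost(requests, monthly_node_cost):
    """requests: {namespace: (cpu_cores, mem_gib)} -> {namespace: dollars}."""
    total_cpu = sum(cpu for cpu, _ in requests.values())
    total_mem = sum(mem for _, mem in requests.values())
    shares = {}
    for ns, (cpu, mem) in requests.items():
        # 50/50 blend of this namespace's CPU share and memory share
        weight = 0.5 * cpu / total_cpu + 0.5 * mem / total_mem
        shares[ns] = round(weight * monthly_node_cost, 2)
    return shares

bill = attribute_cost(
    {"payments-prod": (8, 32), "data-pipeline": (6, 48), "dev": (4, 16)},
    2243.0,
)
```

The useful property of this model is that the shares always sum to the node bill, so nothing goes unattributed; the contested part is the weighting, which is exactly what teams will argue about when chargeback starts.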
Failure Modes and the Defaults to Set on Day One
Five failure modes appear repeatedly in clusters that were bootstrapped without multi-tenancy governance:
| Failure Mode | Symptom | Fix |
|---|---|---|
| No LimitRange on namespace | Pods created without requests/limits; scheduler overcommits node | Apply LimitRange with sensible defaults before first deployment |
| ResourceQuota set too tight | Deployments fail with QuotaExceeded during rollouts; engineers manually delete old pods | Set quota headroom at 2x typical peak, review quarterly |
| HPA ignores quota | HPA scales beyond namespace quota; new replicas fail admission; traffic drops | Set HPA maxReplicas ≤ (quota CPU limit / pod CPU limit) |
| Missing cost-center label | 30% of namespace cost unattributable; finance rejects chargeback report | Enforce label schema via Gatekeeper at namespace creation |
| Dev namespace shares cluster with prod | Load test in dev causes node pressure affecting prod pods | Apply taint/toleration separation or use dedicated node pools for prod |
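The HPA ceiling from the table above can be made concrete. Assuming a namespace quota of 40 CPU limits and a 500m per-pod CPU limit (illustrative numbers matching the prod defaults below), the hard cap is 80 replicas — and in practice maxReplicas should sit below that to leave quota for other workloads in the namespace:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: payments-prod     # illustrative
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 80              # quota CPU limit (40) / pod CPU limit (500m)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```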
The defaults to set on day one, before any application teams get access:
| Environment | CPU Request | CPU Limit | Memory Request | Memory Limit | Max Pods |
|---|---|---|---|---|---|
| prod | 20 cores | 40 cores | 40Gi | 80Gi | 100 |
| staging | 8 cores | 16 cores | 16Gi | 32Gi | 50 |
| dev | 4 cores | 8 cores | 8Gi | 16Gi | 30 |
These are starting points, not permanent values. Run Kubecost or OpenCost for 30 days and adjust quotas to match actual peak consumption plus 40% headroom. A quota that is never hit provides no protection. A quota that is constantly hit creates operational friction. The target is a quota hit rate below 5% in steady state.
Shared clusters are worth running. The utilization efficiency, the reduced control plane overhead, the simplified platform tooling — the economics are clear. But shared clusters without quotas are not shared clusters. They are a single team’s cluster that other teams happen to deploy into until the wrong job runs at the wrong time. LimitRanges and ResourceQuotas are the primitives that make multi-tenancy real. Apply them at bootstrap. Review them quarterly. The 2 AM page is optional.