Your cloud account has Service Control Policies. Your Terraform pipelines have compliance checks. Your tagging strategy covers 90% of resources. And last Thursday, a developer deployed a pod requesting 64 CPU cores and 256 GB of memory into a dev namespace with no resource quotas.
SCPs operate at the cloud API layer. They can prevent someone from launching a p4d.24xlarge instance. They cannot see what happens inside a Kubernetes cluster. Between the cloud guardrails and the running workload, there is an enforcement gap at the Kubernetes admission layer. OPA Gatekeeper fills that gap.
The Gap Between Cloud Guardrails and Kubernetes Reality
Cloud governance tools work at the infrastructure boundary. AWS SCPs restrict which API calls an account can make. Azure Policy controls which resource types can be created. GCP Organization Policies set constraints on project-level operations. None of them see a Kubernetes pod spec.
Inside the cluster, 59% of containers run without CPU limits. Only 13% of requested CPU is actually used. Organizations waste 35% of their Kubernetes spending on overprovisioned resources. The waste does not come from missing cloud policies. It comes from missing admission policies.

Each layer catches what the previous one misses. SCPs block prohibited instance types. CI checks catch misconfigured Terraform. Gatekeeper blocks pods that violate cost policies at deploy time. Runtime monitoring flags workloads that drift after deployment. The admission layer is the last gate before a workload starts consuming resources and generating cost.
How Gatekeeper Works: ConstraintTemplates, Constraints, and the Audit Loop
Gatekeeper deploys as a validating admission webhook. When any resource is created, updated, or deleted, the Kubernetes API server pauses the request and sends it to Gatekeeper. Gatekeeper evaluates the request against active constraints using the OPA Rego policy engine, then returns allow or deny. This adds approximately 50ms to admission latency with proper tuning.

The system uses two custom resource types. A ConstraintTemplate defines the policy logic in Rego and declares what parameters it accepts. A Constraint instantiates a template with specific values. For example, one ConstraintTemplate defines “containers must have CPU limits below a threshold.” A Constraint instantiates it with the specific threshold: 4 CPU cores for dev namespaces, 16 for production.
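That template/constraint pair can be sketched as follows. This is a simplified illustration, not the library's actual K8sContainerLimits implementation: the template name `k8scpulimit`, the `maxCpu` parameter, and the Rego are assumptions, and the Rego assumes CPU limits are written in whole cores (e.g. "4") rather than millicores ("4000m"), which the real library handles.

```yaml
# ConstraintTemplate: the policy logic (Rego) plus its parameter schema.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8scpulimit
spec:
  crd:
    spec:
      names:
        kind: K8sCpuLimit
      validation:
        openAPIV3Schema:
          type: object
          properties:
            maxCpu:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8scpulimit

        # Deny containers whose CPU limit exceeds the threshold.
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          cpu := to_number(container.resources.limits.cpu)
          cpu > to_number(input.parameters.maxCpu)
          msg := sprintf("container %v has CPU limit %v; max is %v",
            [container.name, cpu, input.parameters.maxCpu])
        }

        # Deny containers with no CPU limit at all.
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.cpu
          msg := sprintf("container %v declares no CPU limit", [container.name])
        }
---
# Constraint: instantiates the template with dev-namespace values.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sCpuLimit
metadata:
  name: dev-cpu-limit
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["dev"]
  parameters:
    maxCpu: "4"
```

A second Constraint from the same template, scoped to production namespaces with `maxCpu: "16"`, enforces the higher tier without duplicating any Rego.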
Gatekeeper also runs an audit controller that periodically scans existing resources against all active constraints. This catches resources that were deployed before the constraint existed. Violations appear in the Constraint’s status field, giving operators an up-to-date inventory of non-compliant workloads.
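After an audit pass, the status field of a constraint might look like the excerpt below. The workload names are hypothetical; the field layout follows Gatekeeper's constraint status schema.

```yaml
# Illustrative status excerpt on a dryrun constraint after an audit pass.
status:
  auditTimestamp: "2024-05-02T09:00:00Z"
  totalViolations: 2
  violations:
    - enforcementAction: dryrun
      kind: Pod
      name: load-test-runner        # hypothetical workload
      namespace: dev
      message: "container app declares no CPU limit"
    - enforcementAction: dryrun
      kind: Pod
      name: batch-reprocessor       # hypothetical workload
      namespace: dev
      message: "container worker has CPU limit 8; max is 4"
```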
OPA is a CNCF Graduated project. Production adopters include Goldman Sachs, Netflix, Pinterest, and T-Mobile, and 56% of OPA users run it specifically for Kubernetes admission control.
Six Constraints That Stop Cloud Waste at the Admission Layer
The Gatekeeper Library provides ready-made ConstraintTemplates. These six target cost waste directly.
| Constraint | What It Enforces | Key Parameters | What It Prevents |
|---|---|---|---|
| K8sContainerLimits | Max CPU and memory per container | cpu: 4, memory: 8Gi | A single pod consuming an entire node |
| K8sContainerResources | Every container must declare requests and limits | enforceRequests: true | Pods deployed with no resource boundaries |
| K8sReplicaLimits | Max replica count per Deployment | max_replicas: 20 | Runaway HPA scaling to 200 replicas |
| K8sStorageClass | Restrict which StorageClasses are allowed | allowedClasses: gp3, standard | Developer claiming io2 premium SSD for a test database |
| K8sBlockNodePort | Deny NodePort services | none | Ad hoc exposure on node ports that bypasses the shared ingress layer and its cost controls |
| Custom: deny-gpu-namespace | Deny GPU requests outside ML namespaces | allowedNamespaces: ml-training | Data engineer requesting 4 A100 GPUs for a Jupyter notebook in the dev namespace |
Beyond the library, teams build custom constraints for their cost model. Common patterns include forcing dev namespace pods to tolerate spot or preemptible nodes, capping PersistentVolumeClaim size at 100 Gi in non-production environments, and denying container images above a size threshold to reduce egress and storage costs.
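The GPU example from the table can be sketched as a custom template. The template name, kind, and Rego below are illustrative assumptions; the check looks for the standard `nvidia.com/gpu` resource in container limits and denies it outside the allowed namespaces.

```yaml
# Custom ConstraintTemplate: deny GPU requests outside approved namespaces.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdenygpunamespace
spec:
  crd:
    spec:
      names:
        kind: K8sDenyGpuNamespace
      validation:
        openAPIV3Schema:
          type: object
          properties:
            allowedNamespaces:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdenygpunamespace

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.resources.limits["nvidia.com/gpu"]
          ns := input.review.object.metadata.namespace
          not allowed(ns)
          msg := sprintf("GPU request in namespace %v; GPUs are allowed only in %v",
            [ns, input.parameters.allowedNamespaces])
        }

        allowed(ns) {
          input.parameters.allowedNamespaces[_] == ns
        }
```

A Constraint instantiating this with `allowedNamespaces: ["ml-training"]` implements the table's final row: the Jupyter notebook in dev is denied, the training job in ml-training proceeds.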
The key design principle: constraints should encode the cost boundaries that already exist as tribal knowledge. If the team informally agrees that dev pods should not exceed 4 CPU cores, that agreement should be a constraint, not a Slack reminder.
From Dry-Run to Enforce: The Rollout That Doesn’t Break Production
Deploying Gatekeeper constraints in enforce mode on day one is a recipe for blocked deployments and pager alerts. The three enforcement actions — dryrun, warn, and deny — exist for staged rollout.

Dry-run (weeks 1-2): Deploy all constraints with enforcementAction: dryrun. The audit controller scans existing resources and populates violation counts in each Constraint’s status. No deployments are blocked. Share the violation report with team leads. This builds awareness without friction.
Warn (weeks 3-4): Move high-confidence constraints to enforcementAction: warn. Developers see warnings on kubectl apply but their deployments proceed. This is where teams fix existing violations and establish an exception process for legitimate edge cases.
Deny (week 5+): Move validated constraints to enforcementAction: deny. Non-compliant deployments are blocked with a clear error message. New constraints always enter through dryrun first. Quarterly reviews adjust thresholds as team needs change.
The exit criterion for each phase: violation count trending downward with no legitimate workloads blocked.
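Mechanically, the staged rollout is a one-field change on each constraint. The sketch below uses the library's K8sContainerLimits kind; the constraint name and parameter values are illustrative.

```yaml
# The same constraint moves through the rollout phases by editing one field.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sContainerLimits          # from the Gatekeeper Library
metadata:
  name: container-limits-dev      # illustrative name
spec:
  enforcementAction: dryrun       # weeks 1-2; later edit to warn, then deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["dev"]
  parameters:
    cpu: "4"
    memory: "8Gi"
```

Because the policy logic and scope never change between phases, flipping `enforcementAction` carries no risk of a constraint suddenly matching different workloads than it audited.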
Performance, Gotchas, and What Gatekeeper Cannot Do
Gatekeeper adds approximately 50ms to admission latency when properly configured. Misconfiguration — particularly setting too many webhook threads relative to available CPU — can spike latency to 3 seconds. The key tuning parameter is matching the thread count to the pod’s CPU allocation.
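In practice that tuning lives in the controller's Deployment spec. The excerpt below is an illustrative fragment, not a complete manifest, and the specific values (2 threads, 2 CPU) are assumptions to show the relationship: the `--max-serving-threads` flag, `GOMAXPROCS`, and the CPU limit should agree.

```yaml
# Illustrative excerpt of the gatekeeper-controller-manager Deployment:
# keep webhook threads in line with the CPU the pod can actually use.
spec:
  containers:
    - name: manager
      args:
        - --max-serving-threads=2   # match to the CPU limit below
      env:
        - name: GOMAXPROCS
          value: "2"                # keep the Go runtime on the same budget
      resources:
        limits:
          cpu: "2"
          memory: 512Mi
```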
| Gotcha | Symptom | Fix |
|---|---|---|
| Thread misconfiguration | Admission latency spikes to 3+ seconds | Set max-serving-threads equal to GOMAXPROCS; match to CPU allocation |
| Audit controller overload | High memory usage on Gatekeeper pods | Limit audit scope with namespace exclusions; increase audit interval |
| Constraint conflict | Two constraints give contradictory requirements | Use dryrun mode to detect conflicts before enforcing |
| Missing exception process | Developers bypass Gatekeeper by deploying to unmonitored namespaces | Apply constraints cluster-wide with explicit namespace exemptions rather than opt-in |
| Rego complexity | Policy evaluation timeout on complex constraint chains | Keep individual Rego policies under 50 lines; compose simple policies rather than building monolithic ones |
The most important limitation: Gatekeeper only operates at admission time. It cannot right-size a running pod, scale down idle deployments, or optimize existing resource requests. It prevents waste from entering the cluster. It does not remediate waste that already exists. Pair it with runtime tools — VPA for right-sizing, scheduled scaling for non-production environments, and cost monitoring for drift detection.
Policy-as-code at the Kubernetes admission layer closes the enforcement gap between cloud-level SCPs and runtime monitoring. Gatekeeper fills it with roughly 50ms of added latency, a library of ready-made cost constraints, and a staged rollout model that builds compliance without breaking deployments. Start with the six constraints in this article on dryrun. Audit what you find. Enforce what matters. The 35% of Kubernetes spend currently wasted on overprovisioned resources is addressable — but only if the guardrails exist before the deployment reaches the scheduler.