Most engineering teams audit their production infrastructure. Fewer audit the infrastructure that builds and deploys it. GitHub Actions runner minutes, ECR image layers, artifact archives, and ephemeral test clusters sit in a different budget category from application workloads. They rarely get the same scrutiny. That is where the leak starts.
This is not about what your pipelines deploy. It is about what the pipelines themselves cost.
The Pipeline Tax You Never Budgeted For
CI/CD infrastructure cost has four components that most teams treat as fixed overhead: runner compute, image storage, artifact storage, and ephemeral test environments. Each one looks small per job. At scale, they combine into thousands per month.
The mechanism is velocity multiplication. A team pushing 50 commits per day, each triggering a 10-step pipeline, generates 500 job executions daily. Each job touches runner compute, writes to artifact storage, and often pulls a fresh image. The cost accumulates before anyone notices because it tracks engineering activity, not a resource you consciously provisioned.

The four components interact. A cache miss on an image layer triggers a full rebuild, adding 5 minutes of runner compute and a new ECR write. A bloated artifact archive grows until someone sets a retention policy. An integration test environment that was “temporary” runs for months because no one owns the shutdown.
Runner Costs Compound Faster Than You Think
GitHub Actions hosted runner pricing is straightforward on the surface. Linux runners cost $0.008 per minute. Windows runners cost $0.016 per minute. macOS runners cost $0.08 per minute. The multiplier between Linux and macOS is 10x, which matters when iOS builds or Xcode toolchains require macOS.
A team running 500 CI jobs per day at an average of 8 minutes each on Linux spends $960 per month in runner minutes. The same workload on macOS costs $9,600. Most teams do not frame this as a decision with an $8,640 monthly delta. They pick the runner type once and move on.
| Runner Type | Rate per Minute | 500 jobs/day x 8 min x 30 days |
|---|---|---|
| Linux (hosted) | $0.008 | $960/month |
| Windows (hosted) | $0.016 | $1,920/month |
| macOS (hosted) | $0.080 | $9,600/month |
| m5.xlarge EC2 (self-hosted, 15% utilization) | ~$0.021 effective | $2,560/month equivalent |
| m5.xlarge EC2 (self-hosted, 70% utilization) | ~$0.0046 effective | $552/month equivalent |
Self-hosted runners look attractive until you factor in idle time. An m5.xlarge instance costs $0.192 per hour on-demand, about $138 per month running continuously. At 15% utilization, which is common for teams without autoscaling, the effective cost per build minute is roughly $0.021, nearly 2.7x the Linux hosted runner rate. The machine runs while engineers sleep. The hosted runner does not.
Self-hosted runners become cost-effective at roughly 3,000 build minutes per month, but only when paired with autoscaling that terminates idle nodes. Below that threshold, or without autoscaling, hosted runners are cheaper and operationally simpler.
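To see how the utilization math plays out, here is a rough Python sketch of the same comparison. The rates and the 500-jobs-at-8-minutes workload mirror the table above; the utilization figure is an assumption about how much of the instance's time actually runs builds.

```python
# Rough cost model for hosted vs self-hosted GitHub Actions runners.
# Rates mirror the table above; utilization is the fraction of
# self-hosted instance time that actually runs builds.

HOSTED_LINUX_PER_MIN = 0.008   # USD per minute, GitHub-hosted Linux runner
M5_XLARGE_PER_HOUR = 0.192     # USD per hour, on-demand m5.xlarge

def monthly_hosted(jobs_per_day: float, minutes_per_job: float, days: int = 30) -> float:
    return jobs_per_day * minutes_per_job * days * HOSTED_LINUX_PER_MIN

def monthly_self_hosted(jobs_per_day: float, minutes_per_job: float,
                        utilization: float, days: int = 30) -> float:
    build_minutes = jobs_per_day * minutes_per_job * days
    # At low utilization you pay for idle instance time as well.
    instance_minutes = build_minutes / utilization
    return instance_minutes * (M5_XLARGE_PER_HOUR / 60)

if __name__ == "__main__":
    print(f"hosted Linux:       ${monthly_hosted(500, 8):,.0f}/month")
    print(f"self-hosted @ 15%:  ${monthly_self_hosted(500, 8, 0.15):,.0f}/month")
    print(f"self-hosted @ 70%:  ${monthly_self_hosted(500, 8, 0.70):,.0f}/month")
```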
Image and Artifact Storage: The Silent Accumulator
Amazon ECR charges $0.10 per GB per month. That rate sounds trivial until you calculate what accumulates without a retention policy.
A team pushing one image per build, at 1.2 GB per image, across 200 builds per day, writes up to 240 GB of new image data per day when layers are not shared between builds. With no lifecycle rule, ECR retains every tag indefinitely. Within a month, that account holds 7.2 TB of image data at $720 per month in storage alone. Most images beyond the last 5-10 tags will never be pulled again.
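A quick sketch of how that accumulation compares to a capped repository; the image size, build volume, and keep-last-10 threshold are the same illustrative assumptions as above.

```python
# ECR storage growth with and without a lifecycle policy.
# 1.2 GB per image and 200 builds/day mirror the example above;
# ECR storage is billed at $0.10 per GB-month.

ECR_PER_GB_MONTH = 0.10

def unbounded_storage_gb(image_gb: float, builds_per_day: int, days: int) -> float:
    """No lifecycle policy: every pushed image is retained."""
    return image_gb * builds_per_day * days

def retained_storage_gb(image_gb: float, keep_last_n: int, repositories: int = 1) -> float:
    """Lifecycle policy keeps only the last N images per repository."""
    return image_gb * keep_last_n * repositories

if __name__ == "__main__":
    month = unbounded_storage_gb(1.2, 200, 30)
    capped = retained_storage_gb(1.2, keep_last_n=10)
    print(f"no policy after 30 days: {month:,.0f} GB -> ${month * ECR_PER_GB_MONTH:,.0f}/month")
    print(f"keep last 10 images:     {capped:,.0f} GB -> ${capped * ECR_PER_GB_MONTH:,.2f}/month")
```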

Docker layer caching reduces per-build image sizes when base layers do not change. A cache miss on a base layer forces a full rebuild. That adds 4-6 minutes of runner compute and a new full-image push to ECR. If a dependency update invalidates the base layer cache across 50 builds per day, that one day of cache misses adds roughly $2 in extra runner minutes (50 builds x 5 extra minutes x $0.008) and 60 GB of new image data, about $6 per month in added ECR storage until something removes it.
GitHub Actions artifact storage compounds the same way. The free plan includes 500 MB of storage per account, not per repository. Beyond that, storage costs $0.25 per GB per month. Test reports, coverage files, build outputs, and compiled binaries pile up with 90-day default retention. An active monorepo generating 50 MB of artifacts per build, 100 builds per day, adds 150 GB of artifact data per month, and with the 90-day default it keeps stacking for three months before anything expires. The first month alone costs $37.50; the steady state is roughly $110 per month, from a default no one changed.
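Because storage stops growing once the oldest artifacts expire, steady-state spend is simply daily artifact volume times retention days. A small sketch using the figures above, with the $0.25 per GB-month rate as an approximation of GitHub's per-day billing:

```python
# GitHub Actions artifact storage at steady state.
# Steady-state size = daily artifact volume x retention days.
# 50 MB/build and 100 builds/day mirror the example above.

ARTIFACT_PER_GB_MONTH = 0.25

def steady_state_cost(mb_per_build: float, builds_per_day: int, retention_days: int) -> float:
    steady_gb = mb_per_build * builds_per_day * retention_days / 1024
    return steady_gb * ARTIFACT_PER_GB_MONTH

if __name__ == "__main__":
    for days in (90, 14, 7):
        print(f"{days:>2}-day retention: ${steady_state_cost(50, 100, days):,.2f}/month")
```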
Ephemeral Environments: When Convenience Costs Dollars Per Build
Integration tests that require a full application stack need somewhere to run. Ephemeral environments that spin up a minimal EKS cluster, a database, and a load balancer for a 45-minute test run cost $2-4 per build. That estimate covers the EKS control plane at $0.10/hr, an m5.large node at $0.096/hr, a NAT gateway at $0.045/hr plus data processing, an ALB at $0.0225/hr, a small database instance, and the provisioning and teardown time that keeps billing well past the 45 minutes of actual testing. At 100 builds per day, that is $200-400 per day, or $6,000-12,000 per month.
| Approach | Cost per Build | 100 builds/day | Monthly Cost |
|---|---|---|---|
| Ephemeral EKS cluster (per build) | ~$3 | ~$300/day | ~$9,000 |
| Always-on shared cluster (m5.xlarge x3 plus control plane and supporting infrastructure) | — | shared | ~$1,200 fixed |
| Always-on cluster at 12% utilization | effective ~$10/build | — | ~$1,050 of the $1,200 idle |
| Shared cluster with namespace isolation | ~$0.30 | ~$30/day | ~$900 |
Always-on shared clusters have a different problem. Teams that run integration tests on dedicated clusters often leave those clusters running 24/7. The cluster is used for 20-30 build hours per week, but it runs for 168 hours. At 12% utilization, the team is paying for 88 hours of idle compute for every 12 hours of actual test execution.
The answer for most teams is neither full ephemeral per-build nor always-on shared. Namespace-isolated environments on a right-sized shared cluster, with a scheduler that scales the cluster down during low-build hours, bring costs to roughly $0.30 per build while avoiding the cold start time of full cluster provisioning.
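A rough model of the always-on case makes the utilization penalty visible; the ~$1,200 monthly figure and the 45-minute build are the assumptions from the table and the paragraph above.

```python
# Effective cost per build on an always-on integration cluster.
# The ~$1,200/month fixed cost and 45-minute builds are the
# assumptions used in the table above; only utilization varies.

HOURS_PER_MONTH = 730

def effective_cost_per_build(monthly_fixed_usd: float, utilization: float,
                             build_hours: float = 0.75) -> float:
    used_hours = HOURS_PER_MONTH * utilization
    builds = used_hours / build_hours
    return monthly_fixed_usd / builds

if __name__ == "__main__":
    for util in (0.12, 0.50, 0.90):
        cost = effective_cost_per_build(1200, util)
        print(f"{util:.0%} utilization -> ~${cost:.2f} per build")
```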
Four Changes That Cut Pipeline Spend Without Slowing Delivery
Set ECR lifecycle policies before storage compounds. A lifecycle policy that retains the last 10 tagged images per repository and removes untagged images after 24 hours stops the accumulation. ECR evaluates lifecycle rules asynchronously, typically within 24 hours of an image meeting the criteria, so the existing backlog shrinks after the first evaluation rather than the moment the policy is saved.
This fails when teams use image tags as immutable audit references for compliance. In that case, move old images to a separate registry with cheaper storage rather than deleting them.
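A minimal sketch of such a policy applied with boto3; the repository names and the exact thresholds are placeholders to adapt, and repositories that serve as audit references would simply be left off the list.

```python
# Apply an ECR lifecycle policy: drop untagged images after 24 hours
# and keep only the last 10 images per repository.
# Repository names and thresholds here are illustrative placeholders.
import json
import boto3

LIFECYCLE_POLICY = {
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire untagged images after 1 day",
            "selection": {
                "tagStatus": "untagged",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 1,
            },
            "action": {"type": "expire"},
        },
        {
            "rulePriority": 2,
            "description": "Keep only the 10 most recent images",
            "selection": {
                "tagStatus": "any",
                "countType": "imageCountMoreThan",
                "countNumber": 10,
            },
            "action": {"type": "expire"},
        },
    ]
}

def apply_policy(repository_names: list[str]) -> None:
    ecr = boto3.client("ecr")
    for name in repository_names:
        ecr.put_lifecycle_policy(
            repositoryName=name,
            lifecyclePolicyText=json.dumps(LIFECYCLE_POLICY),
        )
        print(f"applied lifecycle policy to {name}")

if __name__ == "__main__":
    apply_policy(["app-api", "app-worker"])  # placeholder repository names
```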
Match artifact retention to actual usage patterns. GitHub Actions default retention is 90 days. Most test reports are useful for 7 days at most. Setting per-workflow retention to 7-14 days reduces steady-state artifact storage by 80-90% without affecting incident response. This breaks when audit requirements mandate longer artifact retention. Map artifact type to retention requirement before applying a blanket TTL.
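Retention changes apply to new artifacts, not to the backlog already stored, so a one-off cleanup may be worth pairing with the setting. A sketch using the GitHub REST API's artifact endpoints, with a placeholder repository slug and a 14-day cutoff:

```python
# Report (and optionally delete) GitHub Actions artifacts older than a cutoff.
# Uses the REST API endpoints for listing and deleting artifacts; the
# repository slug and the 14-day cutoff are illustrative placeholders.
import os
from datetime import datetime, timedelta, timezone

import requests

GITHUB_API = "https://api.github.com"
REPO = "example-org/example-repo"  # placeholder repository
TOKEN = os.environ["GITHUB_TOKEN"]
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/vnd.github+json"}

def old_artifacts(max_age_days: int = 14):
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    page = 1
    while True:
        resp = requests.get(
            f"{GITHUB_API}/repos/{REPO}/actions/artifacts",
            headers=HEADERS,
            params={"per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        artifacts = resp.json().get("artifacts", [])
        if not artifacts:
            return
        for artifact in artifacts:
            created = datetime.fromisoformat(artifact["created_at"].replace("Z", "+00:00"))
            if created < cutoff and not artifact["expired"]:
                yield artifact
        page += 1

def delete(artifact_id: int) -> None:
    requests.delete(
        f"{GITHUB_API}/repos/{REPO}/actions/artifacts/{artifact_id}",
        headers=HEADERS,
        timeout=30,
    ).raise_for_status()

if __name__ == "__main__":
    total_gb = 0.0
    for a in old_artifacts():
        total_gb += a["size_in_bytes"] / 1024**3
        print(f"{a['name']}: {a['size_in_bytes'] / 1024**2:.1f} MB, created {a['created_at']}")
        # delete(a["id"])  # uncomment to reclaim storage immediately
    print(f"total reclaimable: {total_gb:.1f} GB")
```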
Right-size runners to the job. Not all CI jobs need 8 vCPUs and 32 GB of RAM. Linting, static analysis, and unit tests run cleanly on 2 vCPU, 4 GB machines. Build and integration jobs need more. The real cost of developer platform tooling follows the same principle: default configurations are sized for the worst case, and teams pay for that headroom on every run.
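A quick way to see the delta is to price a realistic lint-and-unit-test workload on both sizes; the job mix below is illustrative, and the per-minute rates assume GitHub's larger Linux runners scale roughly linearly with core count.

```python
# Monthly delta from running lint/unit jobs on an oversized runner.
# Per-minute rates assume GitHub's Linux runner pricing, which scales
# roughly linearly with core count; the job mix is illustrative.

RATES = {2: 0.008, 4: 0.016, 8: 0.032, 16: 0.064}  # USD/min by vCPU count

def monthly_cost(jobs_per_day: int, minutes_per_job: float, vcpus: int, days: int = 30) -> float:
    return jobs_per_day * minutes_per_job * days * RATES[vcpus]

if __name__ == "__main__":
    # Example: 300 lint/unit jobs per day at 4 minutes each.
    oversized = monthly_cost(300, 4, vcpus=8)
    right_sized = monthly_cost(300, 4, vcpus=2)
    print(f"8 vCPU: ${oversized:,.0f}/month, 2 vCPU: ${right_sized:,.0f}/month, "
          f"delta: ${oversized - right_sized:,.0f}/month")
```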

Schedule shared CI infrastructure like non-production environments. If your integration test cluster runs 24/7 but builds only occur between 7am and 11pm on weekdays, an automated schedule that scales the cluster to zero nodes overnight and on weekends removes roughly half of the node-hours (the cluster is needed for 80 of 168 weekly hours) and cuts total cluster cost by 35-40% once the always-on control plane and networking are counted. The cluster is not deleted; it just has no running nodes. New builds trigger node provisioning, which takes 2-3 minutes on EKS with managed node groups.
This breaks for teams with globally distributed engineers who push commits at all hours. In that case, reduce cluster size during low-traffic windows instead of scaling to zero. Two nodes instead of eight overnight still cuts off-hours node cost by 75%.
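A minimal sketch of the scaling step against an EKS managed node group with boto3; the cluster name, node group name, and sizes are placeholders, and the two functions would be invoked by whatever scheduler the team already runs (an EventBridge cron rule, a scheduled workflow).

```python
# Scale an EKS managed node group down for off-hours and back up for
# the workday. Meant to be invoked by a scheduler; cluster and node
# group names below are placeholders.
import boto3

CLUSTER = "ci-integration"   # placeholder cluster name
NODEGROUP = "test-runners"   # placeholder managed node group name

def set_size(desired: int, minimum: int, maximum: int) -> None:
    eks = boto3.client("eks")
    eks.update_nodegroup_config(
        clusterName=CLUSTER,
        nodegroupName=NODEGROUP,
        scalingConfig={"minSize": minimum, "maxSize": maximum, "desiredSize": desired},
    )
    print(f"{NODEGROUP}: desired={desired}, min={minimum}, max={maximum}")

def scale_down_overnight() -> None:
    # Zero nodes overnight and on weekends; the control plane stays up,
    # so new builds trigger node provisioning in a few minutes.
    set_size(desired=0, minimum=0, maximum=8)

def scale_up_workday() -> None:
    set_size(desired=3, minimum=2, maximum=8)
```

For the distributed-team variant above, the overnight call would set desired=2 instead of 0.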
Cloud cost anomaly detection catches when these controls fail. An ECR lifecycle policy that was accidentally overridden, an artifact retention reset by a workflow template update, or a CI cluster that never scaled down after a config change all produce cost spikes that look like application spend but are pipeline spend. Without detection, those anomalies run until the monthly bill review.
The pipeline tax is real. It scales with engineering velocity, which makes it politically difficult to address: the teams generating the most pipeline spend are also the most productive. But the cost is not intrinsic to the work. It is a product of defaults that were never revisited. Changing four settings — lifecycle policies, artifact TTLs, runner sizing, and cluster scheduling — recovers most of the spend without touching the velocity that generated it.

