The Egress Bill: Why Your Multi-Region Architecture Is Bleeding $40k/Month

By Muskan Sharma
Published: May 4, 2026 10 min read

Multi-region is the architecture default for serious SaaS in 2026. The justification is real: disaster recovery, latency, sovereignty. The cost is hidden in the bill under “data transfer” and “regional traffic” and gets zero attention because no engineer “owns” data transfer the way they own compute or storage.

A multi-region SaaS spending $200,000 per month on AWS typically pays $30,000-$50,000 of that for data transfer alone. Most of it is AZ-to-AZ replication for stateful systems, plus cross-region database read replicas, plus chatty service-to-service calls that should have stayed regional. The compute team optimizes compute. The storage team optimizes storage. The egress bill keeps growing because it sits in the gap between teams.

FinOps is the engineering practice of bringing financial accountability to variable cloud spend by aligning engineering, finance, and product on continuous cost decisions, per the FinOps Foundation. Applied to egress, the practice has four levers: replication topology, service placement, observability routing, and endpoint coverage. This piece covers each in order of typical impact.

Why the Egress Bill Stays Hidden

The egress bill is invisible because no one team owns data transfer. Compute is owned by the team running the service. Storage is owned by the team that picked the bucket. Data transfer is the result of a thousand independent decisions about where services live, how they communicate, and what their replication policies are. Without a single owner, optimization stalls.

The bill grows organically. Adding a region for compliance adds 10-20% to data transfer. Adding a third microservice that calls the existing two crosses an AZ boundary. Adding observability for the new feature ships another 200GB/day to the central region. Each decision is locally rational. Aggregated, they are the line item nobody can explain.

Egress control works when the team has a designated “platform” or “infrastructure” owner for data transfer. It breaks when egress is treated as an emergent property of architecture decisions, because by the time the bill grows large enough to investigate, the architectural choices that drove it are already deeply embedded.

The Pricing Gradient: Free to Brutal

AWS, GCP, and Azure all use the same four-tier model. Movement within the smallest scope (a single AZ) is free or near-free. Each step outward (across AZs, across regions, out to the public internet) multiplies the per-GB price.

| Path | AWS | GCP | Azure |
|---|---|---|---|
| Intra-AZ / intra-zone | Free | Free | Free |
| Cross-AZ / cross-zone (same region) | $0.01/GB each direction | $0.01/GB | $0.01/GB |
| Cross-region (same continent) | $0.02/GB | $0.02/GB | $0.02/GB |
| Public internet egress (first 10TB/month) | $0.09/GB | $0.12/GB (premium tier) | $0.087/GB |
| Public internet egress (>500TB/month) | $0.05/GB | $0.08/GB | $0.05/GB |


The same 10TB of monthly traffic costs $0 intra-AZ, $200 cross-AZ round-trip, $200 cross-region one-way, or $900 to the public internet. The architectural decision of where to place a service and which path its traffic takes drives a 100x cost gap on identical bytes.
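To make the gradient concrete, here is a back-of-envelope sketch that reproduces those figures from the per-GB list prices in the table above; the prices are illustrative, not a quote.

```python
# Rough monthly cost of moving the same 10TB over each path, using the
# per-GB list prices from the table above (illustrative only).
PRICE_PER_GB = {
    "intra-AZ": 0.00,
    "cross-AZ (round trip)": 0.02,   # $0.01/GB charged in each direction
    "cross-region (one way)": 0.02,
    "public internet": 0.09,         # first-10TB tier
}

def monthly_cost(gb_per_month: float, path: str) -> float:
    """Monthly transfer cost in dollars for a given path."""
    return gb_per_month * PRICE_PER_GB[path]

for path in PRICE_PER_GB:
    print(f"{path:>24}: ${monthly_cost(10_000, path):,.0f}/month")
# intra-AZ: $0, cross-AZ: $200, cross-region: $200, public internet: $900
```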

Replication Topology: Where the Big Money Hides

Cross-AZ and cross-region replication for stateful systems is the largest waste category. Multi-AZ Postgres or MySQL with synchronous replication sends a copy of every write across an AZ boundary at cross-AZ rates. Cross-region read replicas for global services multiply that again at cross-region rates.

A 1TB write workload on synchronous multi-AZ Postgres pays $20/month in cross-AZ replication for the second AZ alone. Add a third AZ for the recommended HA pattern, $40/month. Add a cross-region read replica on another continent, roughly $200/month for the cross-continent egress. The same write workload on a single-AZ primary with async replicas and periodic snapshots pays a fraction of that, at the cost of the recovery point objective (RPO) growing from 0 seconds to roughly 30 seconds of replication lag during a failover.


Cross-region replicas for analytics workloads are the easiest fix. Real-time replicas only justify themselves for transactional read paths where latency-to-source matters. Analytics workloads can accept a day of staleness, and replacing real-time replication with nightly snapshot copies to the other region (for example, database exports replicated via S3 Cross-Region Replication) cuts that egress by 70-90%.

| Replication topology | 1TB write workload, 2 replica regions: monthly egress |
|---|---|
| Synchronous multi-region active-active | $400 (two cross-region copies of every write) |
| Single primary + sync multi-AZ + async cross-region | $40 (cross-AZ only, async cross-region) |
| Single primary + async multi-AZ + nightly snapshot to other region | $5 (snapshot deltas only) |
| Single-AZ primary, snapshots only | $0 |

The right topology depends on RPO, recovery time objective (RTO), and read locality requirements. Most teams default to the most expensive option (active-active sync) because it sounds safest. The right answer for most workloads is the second or third row.

This pattern works when the team can classify workloads by RPO/RTO. It breaks when every workload is treated as critical because nobody wants to be the one who downgrades replication. The fix is an explicit per-workload classification with sign-off from the workload owner.
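As a sketch of what that per-workload classification can look like, the snippet below maps a workload’s RPO and read-locality needs onto one of the topologies from the table above. The thresholds and workload names are assumptions for illustration, not a recommendation.

```python
# Illustrative per-workload classification: map RPO and read locality onto
# a replication topology from the table above. Thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    rpo_seconds: int          # data loss tolerated on failover
    cross_region_reads: bool  # do other regions need low-latency reads?

def recommend_topology(w: Workload) -> str:
    if w.rpo_seconds == 0 or w.cross_region_reads:
        # zero-data-loss or global read paths: row 2 of the table
        return "single primary + sync multi-AZ + async cross-region"
    if w.rpo_seconds < 24 * 3600:
        # tolerates minutes-to-hours of loss: row 3
        return "single primary + async multi-AZ + nightly cross-region snapshot"
    # analytics-style workloads that accept a day of staleness: row 4
    return "single-AZ primary, snapshots only"

for w in [Workload("billing", 0, True), Workload("analytics", 86_400, False)]:
    print(f"{w.name}: {recommend_topology(w)}")
```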

Service Placement: Topology-Aware Routing

The microservices version of the same problem. A 30-service architecture deployed naively across AZs (Kubernetes scheduling pods wherever capacity exists) generates 5-10TB/month of cross-AZ service-to-service traffic. At $0.02/GB round-trip, that is $100-$200/month, scaling linearly with service count and traffic.

Topology-aware routing keeps 70-80% of traffic intra-AZ at zero cost. The pattern: services prefer in-AZ peers when they are available and fall back to cross-AZ only when local capacity runs out. Kubernetes supports this natively via Topology Aware Routing (the service.kubernetes.io/topology-mode annotation, which replaced the older topologyKeys mechanism), with topology spread constraints or pod anti-affinity keeping replicas spread across zones.
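A minimal sketch of turning this on with the official kubernetes Python client, assuming a cluster at 1.27 or newer where Topology Aware Routing is driven by the topology-mode annotation; the service name and namespace are placeholders.

```python
# Minimal sketch: enable Topology Aware Routing on an existing Service by
# setting the service.kubernetes.io/topology-mode annotation (Kubernetes
# 1.27+). Service name and namespace are placeholders.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster
v1 = client.CoreV1Api()

patch = {
    "metadata": {
        "annotations": {
            # "Auto" tells kube-proxy to prefer endpoints in the caller's zone
            # when every zone has enough endpoints to carry its share.
            "service.kubernetes.io/topology-mode": "Auto"
        }
    }
}
v1.patch_namespaced_service(name="checkout", namespace="prod", body=patch)
```

Replica spread still has to be enforced separately, which is where the replica-count caveat below comes in.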


The trade-off is uneven load distribution across AZs. Topology-aware routing assumes services have enough capacity in each AZ to handle local demand. For services with low replica counts (1-2 replicas total), forcing in-AZ traffic produces hot AZs and idle AZs. The fix is a minimum replica count of 3 (one per AZ) for any service that participates in topology-aware routing.

| Architecture | Cross-AZ traffic | Monthly cost (10TB total) |
|---|---|---|
| 30 services, naive scheduling | 50% cross-AZ | $100 cross-AZ |
| 30 services, topology-aware | 25% cross-AZ | $50 cross-AZ |
| 30 services, single-AZ deployment | 0% cross-AZ | $0 (but no AZ HA) |

This pattern works for stateless services where any replica can serve any request. It breaks for sticky-session workloads where requests must land on a specific replica, because topology-aware routing then competes with session affinity.

Observability Routing: The Quiet Egress Tax

Observability data shipping is 30% of egress in many setups, per the FinOps Foundation 2026 observability report. Every region’s logs, metrics, and traces ship to a central observability vendor, and most teams never measure how much that costs in egress alone.

The default deployment of any observability agent (Datadog, New Relic, Splunk, Honeycomb) is “ship every event to the vendor’s endpoint.” If the vendor’s region differs from where the workload runs (which it almost always does in multi-region setups), every byte crosses regions or the public internet. A 5-region deployment shipping 1TB/day of observability data per region to one central vendor endpoint moves roughly 150TB/month; at the $0.09/GB public-internet rate, that is about $13,500/month in egress alone, before any NAT gateway processing charges on the same bytes.


Region-local aggregators (Vector, Fluent Bit, OpenTelemetry Collector) batch, compress, and sample before shipping cross-region. Compression alone saves 60-80% on log data. Sampling traces at 1-5% (instead of the default 100%) saves another order of magnitude. The trade-off is losing incident-time access to the dropped data, which most teams accept once they see the bill.
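Sampling can also happen before data ever leaves the application. As one sketch, the OpenTelemetry Python SDK can keep 5% of traces at the source and batch what remains toward a region-local collector; the service name and collector endpoint here are placeholders.

```python
# Sketch: sample 5% of traces in-process and batch spans toward a
# region-local OpenTelemetry Collector. Endpoint and service name are
# placeholders; the collector handles compression and cross-region shipping.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(
    resource=Resource.create({"service.name": "checkout"}),
    # Keep 5% of new traces; child spans follow their parent's decision.
    sampler=ParentBased(TraceIdRatioBased(0.05)),
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="otel-collector.monitoring:4317", insecure=True)
    )
)
trace.set_tracer_provider(provider)
```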

This pattern works when alert latency tolerates a 30-60 second batching delay. It breaks for workloads with sub-second alert SLOs that need real-time event streaming, in which case the unbatched cost is justified.

VPC Endpoints, Direct Connect, and the Endpoint Habit

VPC endpoints on AWS (Gateway Endpoints, plus Interface Endpoints built on PrivateLink) and Private Google Access or Private Service Connect on GCP keep traffic to managed services on the provider’s network instead of routing it through NAT gateways or the public internet. A workload in a private subnet calling S3 in the same region pays NAT gateway processing (about $0.045/GB) without an endpoint and nothing through a Gateway Endpoint. For a 50TB/month S3 access pattern, that is over $2,000/month in savings per region from a few lines of Terraform.
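The same change can be made with one API call; here is a sketch via boto3, with the VPC and route-table IDs as placeholders.

```python
# Sketch: create an S3 Gateway Endpoint so same-region S3 traffic bypasses
# the NAT gateway. VPC ID and route table IDs are placeholders; the route
# tables should be the ones used by the private subnets.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    VpcEndpointType="Gateway",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```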

The five highest-impact endpoints to enable first:

| Service | Endpoint type | Why it pays |
|---|---|---|
| S3 | Gateway Endpoint | Most workloads read 10-100TB/month from S3; the endpoint eliminates the per-GB charges |
| DynamoDB | Gateway Endpoint | Same as S3; high-volume table reads stay private |
| ECR | Interface Endpoint | Pulling container images from ECR otherwise crosses NAT, paying NAT processing + per-GB egress |
| Secrets Manager / Parameter Store | Interface Endpoint | Every container start fetches secrets; volume is small but the path matters for compliance |
| CloudWatch Logs | Interface Endpoint | Log shipping bytes that would otherwise cross NAT |

Direct Connect (AWS) and Dedicated Interconnect (GCP) provide flat-rate egress to on-premises networks at roughly $0.02/GB versus $0.09/GB over the public internet, for workloads with sustained 1Gbps+ throughput. On per-GB arithmetic alone the breakeven is low: the port fee for a 1Gbps Direct Connect ($0.30/hour, about $220/month) is covered by roughly 3TB/month of traffic at the $0.07/GB saving. Cross-connect fees, colocation, and the operational cost of running the link push the practical threshold to roughly 50TB/month of sustained on-prem-bound traffic; below that, the savings rarely justify the effort.
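The port-fee arithmetic is easy to check; a sketch using the rates quoted above, which deliberately excludes cross-connect and colocation charges.

```python
# Direct Connect breakeven on per-GB savings alone, using the rates quoted
# above. Real deployments add cross-connect, colocation, and ops overhead,
# which is why the practical threshold sits much higher than this number.
PORT_FEE_PER_HOUR = 0.30        # 1Gbps dedicated port
INTERNET_EGRESS_PER_GB = 0.09
DX_EGRESS_PER_GB = 0.02
HOURS_PER_MONTH = 730

port_fee = PORT_FEE_PER_HOUR * HOURS_PER_MONTH              # ~$219/month
saving_per_gb = INTERNET_EGRESS_PER_GB - DX_EGRESS_PER_GB   # $0.07/GB
breakeven_tb = port_fee / saving_per_gb / 1_000
print(f"Port fee:  ${port_fee:,.0f}/month")
print(f"Breakeven: {breakeven_tb:.1f} TB/month on port fee alone")
```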

This pattern works when the team has visibility into per-service egress paths via the AWS Cost and Usage Report or GCP Billing Export. It breaks when nobody has enabled the CUR (many teams haven’t, even though the report itself is free and costs only the S3 storage it lands in) and the cost data stays invisible at the per-service level.

A 60-Day Egress Cost Reduction Plan

Egress optimization sequences cleanly. Each phase produces measurable savings, and the data from each phase informs the next.

| Phase | Weeks | Action | Effort | Expected saving |
|---|---|---|---|---|
| Visibility | 1-2 | Enable AWS Cost and Usage Report (or GCP Billing Export). Build a dashboard showing top egress sources by service, source region, destination region. | 1 engineer-week | 0 (data only) |
| VPC endpoints | 3 | Enable Gateway Endpoints for S3 + DynamoDB. Enable Interface Endpoints for ECR, Secrets Manager, CloudWatch Logs. | 2 days | 15-25% on egress for endpoint-eligible traffic |
| Topology-aware service routing | 4-6 | Enable Kubernetes topology-aware routing on the top 10 services by cross-AZ traffic. Verify replica counts. | 2 weeks | 30-50% on cross-AZ service-to-service traffic |
| Observability aggregator rollout | 7-8 | Deploy Vector or Fluent Bit as a region-local aggregator. Configure batching, compression, sampling. | 1 week | 60-80% on observability egress |
| Replication topology audit | 9-10 | Classify databases by RPO/RTO. Move analytics replicas from real-time to nightly snapshots. Move non-critical workloads from sync to async multi-AZ. | 2 weeks | 40-70% on stateful replication egress |
| Direct Connect evaluation | 11-12 | If on-prem-bound traffic exceeds 50TB/month sustained, evaluate Direct Connect. Otherwise skip. | 1 week + procurement | Variable; only if breakeven is clear |

A team starting at $40,000/month in data transfer charges typically lands at $12,000-$18,000 after 60 days. The work is architectural, not new tooling. Each phase is testable in isolation. Most teams do the first three phases and stop, because by that point the egress bill has dropped enough that the marginal effort on the rest doesn’t pay back.

To get started, enable the AWS Cost and Usage Report and look at the Data Transfer line item broken down by source region. The largest 3-5 contributors are almost always cross-AZ replication for one major database, observability shipping for one major service, and ECR image pulls during deployments. Any one of those is a 1-week project with a measurable bill reduction. Pair the work with autonomous remediation so the gains hold once attention shifts to the next architectural problem.
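One way to get that first breakdown without building anything: pull last month’s spend grouped by usage type from Cost Explorer and keep the transfer line items. A sketch with boto3; the date range is a placeholder, and the string match is deliberately loose because usage-type names vary by region pair.

```python
# Sketch: surface last month's data-transfer line items from Cost Explorer.
# The date range is a placeholder; usage-type names vary by region pair, so
# this post-filters on the substrings AWS uses for transfer usage types.
import boto3

ce = boto3.client("ce")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-04-01", "End": "2026-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

transfer = []
for group in resp["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if "DataTransfer" in usage_type or "Bytes" in usage_type:
        transfer.append((cost, usage_type))

for cost, usage_type in sorted(transfer, reverse=True)[:10]:
    print(f"${cost:>10,.2f}  {usage_type}")
```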

Written by Muskan Sharma, Engineer at Zop.Dev
