Three years of cost retrospectives across mixed AWS fleets keep landing on the same finding. Teams that pick one compute commitment model and apply it across the whole fleet (all-Savings-Plan, all-On-Demand, all-Spot for the brave ones) overspend the optimum by 18 to 35 percent. The single-model decision is comfortable because it is one decision; the portfolio decision is uncomfortable because it is four decisions. Three years of data says the four-decision answer is worth the discomfort.
The portfolio that hit 95 percent of the maximum theoretical saving in the audits I ran is roughly 50 to 70 percent Savings Plan, 15 to 25 percent Spot, 5 to 15 percent Reserved Instance, and 10 to 20 percent On-Demand. The ranges are workload-mix dependent, not arbitrary. The shape that decides the slot is whether the workload is predictable, stateful, AZ-anchored, or unpredictable. Match shape to slot, accept the operational complexity of running four models concurrently, and the saving compounds.
This post is the four models, the workload-shape mapping, and the failure mode (commitment expiry) that nobody tracks until it bites. It composes with right-sizing without vibes and closed-loop FinOps. Right-sizing picks the per-instance type. Commitment portfolio picks the per-instance pricing model. The two answers multiply.
One model for the whole fleet is the wrong answer
The single-model patterns are familiar. The cost-conscious finance team pushes everything onto a 3-year Savings Plan to maximize discount; the engineering team that values flexibility runs everything On-Demand; the bold cost-optimizer pushes batch pipelines onto Spot and forgets the rest. Each of these captures part of the saving curve and leaves the rest on the table.
The retrospective math is clean once you compute it.
| Strategy | Discount captured | Operational complexity | Risk if workload changes |
|---|---|---|---|
| All On-Demand | 0% (baseline) | Lowest | None |
| All Savings Plan (3yr) | 60-66% on covered usage | Low | High; commitment locked 3 years |
| All Spot | 70-90% on stable workloads | High; every workload needs interruption handling | Medium; price spikes still happen |
| Portfolio (mixed) | 50-65% blended | Medium-high | Low; each workload sized to its own slot |
The all-Savings-Plan strategy looks attractive on paper. It collapses when 30 percent of the fleet is unpredictable: that 30 percent runs On-Demand inside the SP umbrella anyway, and the SP commitment goes unused. The all-Spot strategy collapses on the first stateful workload that cannot tolerate interruption. The portfolio is the only strategy that does not collapse on the workload mix it was built for.
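To make the collapse concrete, here is a back-of-envelope sketch. The discount rates are rough midpoints from the tables in this post, and the fleet mix is hypothetical, not audit data; the all-SP model assumes the commitment was sized to the whole fleet and the unpredictable slice cannot consume its share.

```python
# Illustrative arithmetic only: discounts are rough midpoints from the
# tables in this post, and the fleet mix is hypothetical.

MIX = {                      # fraction of fleet spend by workload shape
    "steady_state": 0.55,
    "interruptible": 0.20,
    "az_pinned": 0.10,
    "unpredictable": 0.15,
}
DISCOUNT = {"savings_plan": 0.60, "spot": 0.80, "ri": 0.72, "on_demand": 0.0}

PORTFOLIO = {
    "steady_state": "savings_plan",
    "interruptible": "spot",
    "az_pinned": "ri",
    "unpredictable": "on_demand",
}

def blended_discount(assignment: dict[str, str]) -> float:
    """Spend-weighted discount for a shape -> pricing-model assignment."""
    return sum(MIX[shape] * DISCOUNT[model] for shape, model in assignment.items())

def all_sp_effective_discount(unpredictable_frac: float, sp_discount: float = 0.60) -> float:
    """All-SP failure mode: the commitment is sized to the whole fleet, but
    the unpredictable slice cannot consume its share, so that usage is billed
    On-Demand on top of the (partly unused) commitment."""
    committed_cost = 1.0 - sp_discount   # paid whether or not it is consumed
    od_spill = unpredictable_frac        # billed again at full price
    return 1.0 - (committed_cost + od_spill)

print(f"portfolio blended discount:      {blended_discount(PORTFOLIO):.0%}")  # ~56%
print(f"all-SP, 30% unpredictable fleet: {all_sp_effective_discount(0.30):.0%}")  # ~30%
```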
What the four models actually trade off
The four models are not interchangeable. Each has a different commitment shape, a different discount curve, and a different failure mode.
| Model | Discount vs On-Demand | Commitment | Flexibility | Best for |
|---|---|---|---|---|
| On-Demand | 0% | None | Highest; pay-per-second | Unpredictable spikes, dev/test |
| Spot | 70-90% | None; interruptible with 2-min warning | Low; runtime can vanish | Stateless batch, fault-tolerant work |
| Savings Plan (Compute) | 27-66% | 1yr or 3yr commitment to $/hour | High; covers EC2 + Fargate + Lambda, any region/family | Predictable steady-state |
| Savings Plan (EC2 Instance) | Up to 72% | 1yr or 3yr commitment to specific family | Low; locked to family | Locked-family steady-state |
| Reserved Instance | Up to 72% | 1yr or 3yr commitment to specific instance type + AZ | Lowest; locked to type and AZ | Large DBs, GPU inference with AZ pinning |
The Compute Savings Plan is the flexible default. It covers EC2, Fargate, and Lambda, any instance family, any region. The flexibility costs a few percentage points of discount versus EC2 Instance Savings Plan, but the optionality is worth it for any workload that might change instance family in the next 1 to 3 years (which, in practice, is most of them).
Reserved Instances retain a niche. Workloads that need a specific instance type and AZ for performance reasons (a large i3en.metal database that cannot move, a p4d.24xlarge GPU instance for inference latency) get the deepest discount on a zonal RI, which also reserves capacity in that AZ. The flexibility cost is paid for by the certainty that the workload will not migrate.
Spot is its own decision. The discount is real, but only if the workload survives interruption.
Workload shape decides the slot
The portfolio is not a finance allocation. It is a workload-shape allocation that finance reads after the fact. Four shape signals decide the slot.
| Workload shape signal | Maps to |
|---|---|
| Predictable load, runs 24/7, no AZ pinning | Compute Savings Plan |
| Stateless, can checkpoint, batch or async | Spot |
| Stateful, AZ-pinned, large DB or GPU inference | Reserved Instance |
| Unpredictable spikes, dev/test that scales to zero | On-Demand |
The mapping is deterministic once the shape is known. The signals come from existing observability: average CPU and the variance around it tell you predictable vs unpredictable; the deployment manifest tells you stateful vs stateless; the instance type at provisioning tells you AZ-pinned vs not.
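A minimal sketch of that mapping, assuming the shape signals have already been extracted; the field names here are illustrative, not from any particular observability tool.

```python
from dataclasses import dataclass

@dataclass
class WorkloadShape:
    """Shape signals from existing observability and deploy manifests.
    Field names are illustrative, not from any specific tool."""
    predictable: bool   # daily/weekly load shape repeats
    stateless: bool     # can checkpoint and resume on another instance
    az_pinned: bool     # provisioned against a specific type + AZ
    runs_24x7: bool

def pricing_slot(w: WorkloadShape) -> str:
    """Deterministic shape -> slot mapping from the table above."""
    if w.az_pinned and not w.stateless:
        return "reserved_instance"      # large DBs, GPU inference
    if w.stateless:
        return "spot"                   # survives the 2-minute warning
    if w.predictable and w.runs_24x7:
        return "compute_savings_plan"   # steady-state baseline
    return "on_demand"                  # spikes, dev/test, scale-to-zero

# Example: an AZ-pinned database lands on RI regardless of predictability.
db = WorkloadShape(predictable=True, stateless=False, az_pinned=True, runs_24x7=True)
assert pricing_slot(db) == "reserved_instance"
```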
A common confusion: “predictable” does not mean “same load every minute.” It means the daily and weekly shape repeats. A workload that runs at 60 percent capacity from 8am to 6pm and 10 percent overnight is predictable; the SP covers the steady-state baseline (about 30 percent of peak, the time-weighted average of 60 percent for 10 hours and 10 percent for 14) and On-Demand absorbs the rest. A workload whose load can 10x in 90 seconds is unpredictable and stays On-Demand for the entire spike envelope.
The portfolio that beat single-model strategies
The ranges in the opening (50 to 70 percent Savings Plan, 15 to 25 percent Spot, 5 to 15 percent Reserved Instance, 10 to 20 percent On-Demand) are not aspirational. They came from auditing about 40 mid-size AWS fleets across 3 years, computing what the optimum looked like for each, and taking the central tendency.
The interpretation:
- Savings Plan, 50 to 70 percent. The steady-state baseline is the largest single bucket because most production workloads have a steady-state baseline. Cover 70 to 80 percent of historical usage with the SP commitment, not 90 percent (a sizing sketch follows this list). The headroom above the commitment goes to On-Demand for spikes.
- Spot, 15 to 25 percent. Batch pipelines, ML training, async workers, stateless rendering. The bucket size is bounded by how many workloads can absorb interruption, which is bounded by how much engineering investment you made in interruption handling.
- Reserved Instance, 5 to 15 percent. Big databases, GPU inference, anything that needs AZ affinity. Small bucket because most fleets only have a handful of these.
- On-Demand, 10 to 20 percent. The buffer above the SP commitment plus the genuinely unpredictable workloads. Treating On-Demand as the safety net for the top 10 to 20 percent of load is correct; treating it as the default for everything is the 35 percent overspend.
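The 70-to-80-percent coverage target in the Savings Plan bullet turns into a one-function sizing exercise. A minimal sketch, assuming you can export hourly SP-eligible spend (Cost Explorer can produce this) as a list of dollar values; 0.75 is just the midpoint of the range above.

```python
def sp_commitment_for_coverage(hourly_spend: list[float], target: float = 0.75) -> float:
    """Smallest constant $/hour commitment whose covered share of historical
    usage reaches `target`. Covered usage in hour h is min(spend_h, c);
    everything above the commitment falls to On-Demand."""
    total = sum(hourly_spend)
    lo, hi = 0.0, max(hourly_spend)
    for _ in range(50):  # bisection; coverage is monotonic in the commitment
        mid = (lo + hi) / 2
        if sum(min(h, mid) for h in hourly_spend) / total < target:
            lo = mid
        else:
            hi = mid
    return hi

# Toy profile from earlier: 60% of a $100/hr peak 8am-6pm, 10% overnight.
day = [60.0] * 10 + [10.0] * 14
print(f"commit at ~${sp_commitment_for_coverage(day):.2f}/hour")  # ~41.50
```

On a spiky profile, sanity-check the resulting commitment's overnight utilization before buying; a coverage target that looks fine on average can still leave the commitment half idle at 3am, which is exactly what the 90-day utilization check below exists to catch.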
Teams that ran this portfolio shape in audits hit 95 percent of the maximum theoretical saving (the saving you would get with perfect foreknowledge of every workload's behavior for the next 3 years). The remaining 5 percent gap is the cost of operating four models concurrently instead of one. The difference between the portfolio's 95 percent and the 65 to 82 percent that single-model strategies capture (the 18-to-35-percent overspend from the opening) is the work paying off.
Spot interruption is an engineering investment, not a finance decision
The Spot bucket is the one most teams underuse because they treat it as a finance question. It is not. It is an engineering investment that, once shipped, applies to every Spot-eligible workload the team writes for the next 5 years.
The interruption rate runs 2 to 10 percent depending on instance type and AZ. AWS gives a 2-minute warning before reclaiming the instance. Workloads that survive cleanly need three things (a minimal watcher is sketched after the list):
- A PreStop hook that drains in-flight work, marks the queue position, and writes a checkpoint
- A checkpointing strategy (typically every 1 to 5 minutes for long jobs) so a reclaimed instance can resume on the next instance instead of restarting
- A queue-resume capability so the work item that was mid-flight when the interruption fired gets picked up by the next worker, not lost
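A minimal watcher sketch, assuming IMDSv2. The metadata paths are the real EC2 ones (the `spot/instance-action` document returns 404 until the two-minute notice is posted); `checkpoint()` and `requeue_in_flight()` are hypothetical stand-ins for your own drain logic.

```python
import time
import urllib.request
from urllib.error import HTTPError

IMDS = "http://169.254.169.254/latest"

def imds_token() -> str:
    """Fetch a short-lived IMDSv2 session token."""
    req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "120"})
    return urllib.request.urlopen(req, timeout=2).read().decode()

def interruption_pending() -> bool:
    """spot/instance-action 404s until the 2-minute notice is posted."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": imds_token()})
    try:
        urllib.request.urlopen(req, timeout=2)
        return True
    except HTTPError as err:
        if err.code == 404:
            return False
        raise

def checkpoint() -> None:
    """Hypothetical: persist progress so the job resumes, not restarts."""

def requeue_in_flight() -> None:
    """Hypothetical: return the mid-flight work item to the queue."""

def watch(poll_seconds: int = 5) -> None:
    """Poll for the notice, then drain inside the 2-minute window."""
    while not interruption_pending():
        time.sleep(poll_seconds)
    checkpoint()
    requeue_in_flight()
```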
Building these once costs maybe 2 to 4 weeks of engineer time. It then amortizes across every batch worker, every ML training job, every async pipeline you ship for the next 5 years. The Spot bucket grows from 5 percent to 25 percent of the fleet because new workloads land on Spot by default.
Teams that skip the engineering investment cap their Spot usage at 5 to 10 percent of the fleet (the fraction that happens to be naturally fault-tolerant) and leave 70 to 90 percent discount on the table for everything else. Teams that ship the investment unlock the upper end of the 15 to 25 percent range.
Commitment expiry is the failure mode nobody tracks
Three years of audit retrospectives surface the same hidden failure mode: a 1-year Savings Plan or RI expires silently and the next month’s bill jumps 40 to 50 percent. The team finds out from the finance review, not from the alert that should have fired 90 days earlier.
The pattern that prevents the surprise is the renewal calendar. Three discrete checkpoints per commitment:
| Days before expiry | Action |
|---|---|
| 90 days | Pull utilization report; flag if SP utilization dropped below 70% |
| 60 days | Decide ratchet: renew at same level, smaller, larger, or let lapse |
| 30 days | Place the renewal commitment; auto-page if not done |
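The calendar is scriptable. A sketch assuming boto3 and the DescribeSavingsPlans API; `notify()` here is a print stand-in for your pager, and the `end` field format is worth verifying against your boto3 version before trusting the date math.

```python
# Run daily (cron, EventBridge). notify() is a stand-in for real alerting.
from datetime import datetime, timezone

import boto3

CHECKPOINTS = {  # days-before-expiry -> action, from the table above
    30: "place renewal commitment (auto-page if not done)",
    60: "make the ratchet decision: same / smaller / larger / lapse",
    90: "pull utilization report; flag if utilization dropped below 70%",
}

def notify(plan_id: str, days_left: int, action: str) -> None:
    print(f"[{plan_id}] {days_left}d to expiry: {action}")

def renewal_calendar() -> None:
    client = boto3.client("savingsplans")
    now = datetime.now(timezone.utc)
    for plan in client.describe_savings_plans(states=["active"])["savingsPlans"]:
        end = datetime.fromisoformat(plan["end"].replace("Z", "+00:00"))
        days_left = (end - now).days
        for window in sorted(CHECKPOINTS):  # fire the tightest window only
            if days_left <= window:
                notify(plan["savingsPlanId"], days_left, CHECKPOINTS[window])
                break
```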
The 90-day utilization check is the one teams skip and the one that costs the most. A 1-year Savings Plan bought when the fleet was 60 instances is wasteful when the fleet is now 30 instances and half the SP commitment is unused. The 90-day window is enough to ratchet down the new commitment instead of renewing at the same level.
The 30-day auto-page is the safety net. If nobody has decided yet, the page goes out and forces the call. Without it, the SP lapses on a Friday, the bill jumps the following Tuesday, and the team is doing 6 weeks of forensics to understand why On-Demand spend doubled.
This is the same closed-loop pattern as auto-remediation: detect (utilization drift, expiry approaching), decide (the ratchet decision), act (place the new commitment), verify (next month’s bill stays in range). The detect-decide-act-verify shape repeats across cost domains because the underlying problem is the same: deterministic responses to deterministic signals, on a schedule that gets ahead of the failure instead of behind it.
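A skeleton of that loop, with every step a hypothetical stand-in; the point is the fixed detect-decide-act-verify order running on a schedule, not any specific implementation.

```python
def detect() -> dict:
    """Stand-in: gather utilization drift and days to expiry."""
    return {"utilization": 0.55, "days_to_expiry": 75}

def decide(signals: dict) -> str:
    """The ratchet decision, keyed off the 70% utilization threshold above."""
    return "renew_smaller" if signals["utilization"] < 0.70 else "renew_same"

def act(decision: str) -> None:
    """Stand-in: place (or skip) the new commitment."""
    print(f"placing commitment: {decision}")

def verify() -> None:
    """Stand-in: check next month's bill lands in the expected range."""
    print("checking next invoice against expected range")

def commitment_loop() -> None:  # run on a schedule, e.g. daily
    act(decide(detect()))
    verify()
```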
The portfolio answer to the four-way tradeoff is not a one-time configuration. It is a continuously renewed allocation, ratcheted at every commitment expiry against current workload reality. Three years of audits say teams that operate it that way leave less than 5 percent of the saving on the table. Teams that pick one model and forget the renewal calendar leave 18 to 35 percent on the table and find out from finance.

