Azure cost anomalies hide above and below the subscription line, so ZopNight now watches all three

Most Azure cost-anomaly detection runs at one level: the subscription. That feels natural, because the subscription is where budgets and ownership usually sit. It is also where detection misses the most.

We call this the subscription blind spot. A real anomaly takes one of two shapes that a subscription-scoped detector cannot see. It is either too diffuse, spread thinly across many subscriptions, or too concentrated, buried inside one resource group. In ZopNight v1.16.0, Azure cost-anomaly detection now runs at the Resource Group and Tenant levels, not just the Subscription level. That closes the blind spot on both sides.

Azure’s hierarchy has four observable layers: Tenant, then Management Group, then Subscription, then Resource Group. Spend rolls up through all of them. If you only inspect one layer, you only catch the anomalies whose shape happens to match that layer’s granularity. The other shapes pass through. This connects directly to why a cloud bill is a control problem, not just a reporting one.

Aggregation hides tenant-wide drift

The first shape is diffuse drift. Picture a misconfigured policy, an autoscaler floor raised everywhere, or a new logging default that lands across thirty subscriptions at once. Each subscription absorbs a small increase. None of them crosses its own alert threshold.

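The mechanism is signal against noise. A subscription detector compares each subscription against its own baseline. A 4 percent bump on a subscription whose spend normally varies by 6 percent day to day looks like noise. Apply the same bump to thirty subscriptions and the tenant total also moves about 4 percent, but the tenant total is far steadier than any single subscription, because largely independent daily swings partly cancel when summed. Against that quieter aggregate, 4 percent is a clear signal. Yet no single subscription ever raised its hand.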

Tenant-level detection fixes this because it sums first, then compares. The diffuse increase becomes one large number against one tenant baseline. The drift that hid in per-subscription noise now stands out against the aggregate. This is the same logic behind catching cost alerts three days late: the signal exists, but you were looking at the wrong scope to see it.
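A rough sketch of that arithmetic, under loud assumptions: thirty equally sized subscriptions, the 6 percent figure treated as a standard deviation, daily noise that is independent across subscriptions, and a z-score threshold of 3. All names and numbers here are hypothetical and chosen for illustration; this is not ZopNight's detection logic.

```python
import math

def z_score(observed: float, baseline: float, sigma: float) -> float:
    """How many standard deviations the observed spend sits above baseline."""
    return (observed - baseline) / sigma

# Hypothetical inputs, not taken from any real tenant.
N_SUBS = 30          # subscriptions hit by the same diffuse change
BASELINE = 1000.0    # assumed daily spend per subscription, USD
REL_SIGMA = 0.06     # assumed daily standard deviation, 6% of baseline
BUMP = 0.04          # diffuse increase landing on every subscription
Z_THRESHOLD = 3.0    # assumed alerting threshold

# Per-subscription view: a 4% bump against 6% noise.
sub_sigma = BASELINE * REL_SIGMA
sub_z = z_score(BASELINE * (1 + BUMP), BASELINE, sub_sigma)

# Tenant view: sum first, then compare. With independent noise the
# standard deviation of the sum grows with sqrt(N), not with N.
tenant_baseline = N_SUBS * BASELINE
tenant_sigma = math.sqrt(N_SUBS) * sub_sigma
tenant_z = z_score(tenant_baseline * (1 + BUMP), tenant_baseline, tenant_sigma)

print(f"per-subscription z = {sub_z:.2f}, fires: {sub_z >= Z_THRESHOLD}")
print(f"tenant z           = {tenant_z:.2f}, fires: {tenant_z >= Z_THRESHOLD}")
```

Under these assumptions each subscription scores well below the threshold while the tenant total clears it. If the subscriptions' noise were strongly correlated, the tenant baseline would be noisier and the advantage would shrink.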

Subscription granularity hides per-resource-group spikes

The second shape is the opposite. One resource group inside a large subscription doubles its spend overnight. A forgotten GPU pool, a runaway batch job, a storage account stuck on the wrong tier. The resource group is screaming. The subscription barely whispers.

The mechanism here is dilution. If that resource group is 5 percent of a busy subscription, a 100 percent jump inside it moves the subscription total by 5 percent. That sits inside normal subscription variance, so the subscription detector stays quiet. The spike is real, localized, and invisible at the parent level.

Resource Group detection fixes this by comparing each resource group against its own baseline. A doubling inside one resource group is a doubling, full stop, with nothing larger to dilute it. The same pattern shows up with a hot-tier blob cost leak: the leak is loud at the resource that owns it and quiet everywhere above.
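The dilution arithmetic can be checked in a few lines. The dollar amounts below are hypothetical; only the 5 percent share and the 100 percent jump come from the example above.

```python
def relative_change(baseline: float, actual: float) -> float:
    """Fractional change of actual spend versus baseline."""
    return (actual - baseline) / baseline

# Hypothetical subscription spending USD 10,000 per day.
subscription_baseline = 10_000.0
rg_share = 0.05                                   # resource group is 5% of it

rg_baseline = subscription_baseline * rg_share    # the resource group's normal spend
rest_of_subscription = subscription_baseline - rg_baseline

rg_actual = rg_baseline * 2                       # the resource group doubles overnight
subscription_actual = rest_of_subscription + rg_actual

rg_change = relative_change(rg_baseline, rg_actual)
sub_change = relative_change(subscription_baseline, subscription_actual)

print(f"resource group change: {rg_change:.0%}")
print(f"subscription change:   {sub_change:.0%}")
```

The same extra dollars read as a doubling at the resource group and as a 5 percent move at the subscription, which is the whole reason the parent-level detector stays quiet.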

Three detection levels, three different catches

Each level catches a different anomaly shape and misses the others. That is why running one level is not a tuning choice; it is a coverage gap. The table below is the coverage map.

Detection level	Catches	Misses	Best for
Resource Group	Concentrated spikes inside one resource group	Drift spread across many resource groups or subscriptions	Forgotten resources, runaway jobs, tier misconfig
Subscription	Anomalies sized to one budget owner	Sub-threshold per-RG spikes and tenant-wide diffuse drift	Per-team or per-app budget breaches
Tenant	Diffuse drift that sums across subscriptions	Localized spikes that vanish in the aggregate	Org-wide policy and default changes

The three rows do not overlap much, and that is the point. Resource Group detection is granular and noisy. Tenant detection is aggregate and quiet. Subscription detection sits in the middle and catches neither extreme reliably. Strong tag governance at scale makes each level easier to attribute once it fires.

Run all three levels, and tune for the shape of the anomaly

The recommendation is direct: detect at Resource Group, Subscription, and Tenant levels together. Each level guards a different failure mode, so dropping one reopens the matching blind spot.

This works when your baselines have enough history at each level and your tags map resource groups to owners. It breaks when a tenant has thousands of low-traffic resource groups, because Resource Group detection then fires constantly on tiny relative swings. The fix is a higher relative threshold or an absolute floor on small resource groups, so a USD 3.00 wobble does not page anyone.
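One way to express that fix is a check that requires both a minimum dollar change and a minimum relative change before it fires. This is a minimal sketch: the function name, the 50 percent relative threshold, and the USD 25 floor are all assumptions for illustration, not ZopNight settings.

```python
def is_anomalous(
    baseline_usd: float,
    actual_usd: float,
    rel_threshold: float = 0.50,   # assumed: 50% relative change required
    abs_floor_usd: float = 25.0,   # assumed: ignore swings under USD 25
) -> bool:
    """Flag a change only if it is large in both absolute and relative terms."""
    delta = abs(actual_usd - baseline_usd)

    # Absolute floor: tiny dollar swings never alert, whatever the percentage.
    if delta < abs_floor_usd:
        return False

    # No usable baseline: treat spend above the floor as anomalous.
    if baseline_usd <= 0:
        return True

    return delta / baseline_usd >= rel_threshold


# A USD 3.00 wobble on a tiny resource group is a 150% jump, but stays silent.
print(is_anomalous(baseline_usd=2.0, actual_usd=5.0))      # False
# A real doubling on a resource group that matters still fires.
print(is_anomalous(baseline_usd=400.0, actual_usd=800.0))  # True
```

The floor is checked first so that small resource groups are filtered cheaply before any ratio is computed, which also avoids dividing by a near-zero baseline.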

It also breaks if you treat all three levels with one threshold. Aggregate baselines are stable and tolerate tight thresholds. Fine-grained baselines are jumpy and need looser ones. Match the threshold to the level, then watch the whole tree instead of one branch of it.
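A per-level threshold table might look like the sketch below. The level names, threshold values, and the `fires` helper are hypothetical and picked only to show tighter thresholds on aggregate levels and looser ones on fine-grained levels; real values would come from each level's observed variation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LevelThreshold:
    rel_threshold: float   # minimum relative change that fires
    abs_floor_usd: float   # minimum absolute change that fires

# Assumed values: the stable aggregate gets a tight threshold,
# the jumpy fine-grained level gets a loose one plus a dollar floor.
THRESHOLDS = {
    "tenant": LevelThreshold(rel_threshold=0.03, abs_floor_usd=0.0),
    "subscription": LevelThreshold(rel_threshold=0.15, abs_floor_usd=0.0),
    "resource_group": LevelThreshold(rel_threshold=0.50, abs_floor_usd=25.0),
}

def fires(level: str, baseline_usd: float, actual_usd: float) -> bool:
    """Apply the threshold that belongs to the given hierarchy level."""
    t = THRESHOLDS[level]
    delta = abs(actual_usd - baseline_usd)
    if delta < t.abs_floor_usd or baseline_usd <= 0:
        return False
    return delta / baseline_usd >= t.rel_threshold

# The same 4% move is an alert at the tenant and noise at a subscription.
print(fires("tenant", 30_000.0, 31_200.0))        # True
print(fires("subscription", 1_000.0, 1_040.0))    # False
# A doubling in one resource group fires; a USD 3 wobble does not.
print(fires("resource_group", 500.0, 1_000.0))    # True
print(fires("resource_group", 2.0, 5.0))          # False
```

Keeping the thresholds in one table makes the mismatch easy to spot in review: if every level carries the same numbers, one of them is almost certainly tuned for the wrong scope.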