Vertex AI is not a service you can cost with one number. It is a platform, and its spend sprawls across more than a dozen resource types: endpoints, models, indexes and index endpoints, feature stores, online stores, datasets, metadata stores, tensorboards, Workbench notebooks, deployment resource pools, and reasoning engines, plus custom, tuning, NAS, batch-prediction, pipeline, training-pipeline, and notebook-execution jobs. Each has its own meter, and the ones that hurt are the ones with no obvious off switch.
ZopNight v2.0 now discovers and manages the full Vertex AI estate resource by resource, with billing-based cost and metrics attached to each. FinOps is the practice of attributing every cloud cost to an owner and a workload. On a platform this wide, that has to start with finding every resource, because the spend you cannot see is the spend you cannot cut. This post maps where Vertex cost accumulates and why the discovery behind it has to run live.
Vertex AI Is a Platform, So Its Cost Is Everywhere
The Vertex estate splits into two cost shapes. Standing resources bill for as long as they exist: a deployed endpoint, a running Workbench notebook, a feature store, a deployment resource pool. Transient jobs bill for the time they run: a custom training job, a tuning run, a batch-prediction pass, a pipeline. Each line item in Vertex AI pricing maps to one of these resource types, not to a single “Vertex” SKU, so a single cost number on the Vertex line cannot tell you which shape is driving it, and therefore cannot tell you what to fix.
That ambiguity is expensive because the two shapes need opposite treatment. A standing resource needs an idle and right-size check, since it bills whether or not anyone uses it. A transient job needs run-duration costing, since it bills once and stops. Apply the wrong model and you either chase finished jobs that cost nothing more or ignore an endpoint quietly billing every hour. The same split shows up across managed data and ML services, from Databricks cost surfaces to SageMaker, and Vertex is the widest version of it.
| Vertex resource | Cost shape | Where it hides |
|---|---|---|
| Deployed endpoint | Standing, per node-hour | Always-on for low-traffic inference |
| Workbench notebook | Standing, per VM-hour | Left running after a session |
| Feature / online store | Standing | Bills whether queried or not |
| Deployment resource pool | Standing | Reserved capacity nobody released |
| Training / tuning / pipeline job | Transient, run-duration | Counted as recurring, not one-off |
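The two-shape split can be expressed as a small lookup that routes each resource type to the check it needs. This is a minimal sketch, not ZopNight's implementation: the type labels in `TRANSIENT_TYPES` are hypothetical names chosen for illustration, not Vertex API identifiers.

```python
from enum import Enum


class CostShape(Enum):
    STANDING = "standing"    # bills for as long as the resource exists
    TRANSIENT = "transient"  # bills only while a job runs


# Hypothetical type labels for the transient job kinds named in this post.
TRANSIENT_TYPES = {
    "custom_job", "tuning_job", "nas_job", "batch_prediction_job",
    "pipeline_job", "training_pipeline", "notebook_execution_job",
}


def cost_shape(resource_type: str) -> CostShape:
    """Classify a resource type; anything not known to be transient is standing."""
    if resource_type in TRANSIENT_TYPES:
        return CostShape.TRANSIENT
    return CostShape.STANDING


def treatment(resource_type: str) -> str:
    """Pick the check that matches the cost shape."""
    if cost_shape(resource_type) is CostShape.STANDING:
        return "idle and right-size check"
    return "run-duration costing"
```

Defaulting unknown types to standing is a deliberate choice in this sketch: a resource of unrecognized type may bill continuously, so it is safer to check it for idleness than to assume it stopped.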
The Resources With No Off Switch
The most reliable Vertex waste is the resource nobody turns off. A Workbench notebook spun up for an afternoon of exploration bills by the VM-hour until it is stopped, and it rarely is. A deployed endpoint sized for launch traffic keeps its nodes warm long after the traffic moved on. A feature store and its online serving layer bill for standing capacity regardless of query volume.
None of these throw an error or page anyone. They just accrue. The fix is to treat them like any other always-on infrastructure: detect the idle ones, right-size the over-provisioned ones, and schedule the ones that only need to run on a rhythm, the same approach behind automated cloud scheduling for non-prod. A usage-blind view never flags an endpoint at two percent utilization, which is exactly the right-sizing trap wearing an ML costume.
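An idle and right-size check for standing resources can be as simple as comparing average utilization against two thresholds. The sketch below assumes you already have an hourly cost and an average utilization per resource; the threshold values and the "unused share" waste estimate are hypothetical and deliberately rough, not ZopNight's rules.

```python
from dataclasses import dataclass


@dataclass
class StandingResource:
    name: str
    hourly_cost: float      # billed cost per hour while the resource exists
    avg_utilization: float  # 0.0-1.0 over a lookback window, from monitoring


# Hypothetical thresholds; tune these to your own workloads.
IDLE_BELOW = 0.01
RIGHT_SIZE_BELOW = 0.20


def classify(r: StandingResource) -> str:
    """Flag a standing resource as idle, a right-size candidate, or fine."""
    if r.avg_utilization < IDLE_BELOW:
        return "idle"
    if r.avg_utilization < RIGHT_SIZE_BELOW:
        return "right-size"
    return "ok"


def monthly_waste(r: StandingResource, hours: float = 730) -> float:
    """Rough upper bound: cost of the unused share of capacity over ~1 month."""
    return r.hourly_cost * hours * (1.0 - r.avg_utilization)
```

Under these assumed thresholds, an endpoint at two percent utilization is classified as `"right-size"`, and at a cost of 1.0 per hour the unused share comes to roughly 715 per month, which a usage-blind view would never surface.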
Jobs Are Billed by Run, Not by Month
Vertex jobs are transient. A custom training job, tuning run, NAS search, batch-prediction job, pipeline run, training pipeline, or notebook execution starts, runs, and ends. ZopNight costs each one as rate times duration times replicas, once on completion, and shows it grouped under its job type with the run duration attached.
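The rate times duration times replicas formula is straightforward to sketch. The `JobRun` shape and the per-replica `hourly_rate` input are assumptions for illustration; how ZopNight derives the rate from billing data is not described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRun:
    job_type: str
    start: datetime
    end: datetime
    replicas: int
    hourly_rate: float  # assumed per-replica rate for the job's machine type


def job_cost(run: JobRun) -> float:
    """Rate x duration x replicas, computed once the run has completed."""
    if run.end < run.start:
        raise ValueError("run ends before it starts")
    hours = (run.end - run.start).total_seconds() / 3600
    return run.hourly_rate * hours * run.replicas
```

For example, a 2.5-hour run on 4 replicas at an assumed rate of 3.0 per replica-hour costs 3.0 × 2.5 × 4 = 30.0. Dropping the replica term would report 7.5, a quarter of what the distributed run actually cost.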
Costing a job by its run is the difference between a forecast you trust and one that drifts. Projecting a finished training run forward as a monthly charge inflates the bill and buries the real driver, the same failure mode as run-duration costing for SageMaker. A team that runs a nightly pipeline should see a stack of short, dated job costs, not one pipeline smeared across the month, and the replica count is what makes a distributed training run cost what it actually cost.
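The contrast between dated per-run costs and a smeared monthly figure can be made concrete. In this sketch, `group_runs` keeps each completed run as a one-off entry under its job type, while `naive_monthly_projection` reproduces the failure mode of treating a finished run as if it billed all month. Both functions and the sample figures are hypothetical illustrations.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def group_runs(
    runs: Iterable[Tuple[str, date, float]],
) -> Dict[str, List[Tuple[date, float]]]:
    """Group completed runs as dated one-off costs under their job type."""
    grouped: Dict[str, List[Tuple[date, float]]] = defaultdict(list)
    for job_type, run_date, cost in runs:
        grouped[job_type].append((run_date, cost))
    for entries in grouped.values():
        entries.sort()  # chronological order within each job type
    return dict(grouped)


def naive_monthly_projection(run_cost: float, run_hours: float,
                             hours_per_month: float = 730) -> float:
    """The failure mode: extrapolating a finished run's hourly rate to a month."""
    return run_cost / run_hours * hours_per_month
```

A nightly pipeline that costs 12.0 per two-hour run shows up as a stack of dated 12.0 entries, whereas the naive projection turns a single run into 4,380.0 of phantom monthly spend.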
Discovery Has to Be Live, Not a Daily Snapshot
A platform this wide breaks the usual discovery shortcut. Cloud Asset Inventory is the obvious source, but it can lag: a resource created an hour ago, or a type the inventory does not track promptly, is invisible to a once-a-day snapshot. Invisible means uncosted, and uncosted means a launch-day endpoint can bill for hours before any tool admits it exists.
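The size of the blind spot follows from simple arithmetic: a resource created just after a periodic snapshot stays unseen until the next one runs. This sketch assumes the snapshot captures only resources that exist when it runs, and it ignores any additional lag inside the inventory itself, so the real gap can be longer.

```python
from datetime import datetime, timedelta


def hours_invisible(created: datetime, last_snapshot: datetime,
                    interval: timedelta = timedelta(days=1)) -> float:
    """Hours a resource bills before the next periodic snapshot can see it."""
    if created <= last_snapshot:
        return 0.0  # assumed already captured by the last snapshot
    next_snapshot = last_snapshot
    while next_snapshot < created:
        next_snapshot += interval
    return (next_snapshot - created).total_seconds() / 3600
```

With a daily snapshot at 02:00, an endpoint created at 03:00 goes uncosted for 23 hours; multiply by the resource's hourly cost to see what that blind spot bills.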
So discovery runs live. A REST provider sweeps the generally available Vertex regions and refreshes the types that Cloud Asset Inventory leaves stale or missing, and Cloud Monitoring metrics are joined back to each resource by its numeric ID. The result is that a new endpoint or a just-started notebook shows up with cost attached while it still matters, not the next morning. Freshness is not a nice-to-have on a platform where the expensive resources are created and abandoned in the same afternoon.
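The sweep-and-join step can be sketched without touching any real API. Here `list_in_region` is a hypothetical stand-in for a per-region REST list call, and `metrics_by_id` is an assumed mapping from numeric resource ID to metric data. The sketch relies on Vertex resource names of the form `projects/{project}/locations/{region}/endpoints/{id}`, where the trailing segment is numeric; resources without a numeric trailing segment simply get no metrics.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional

# Trailing numeric segment of a resource name, e.g. ".../endpoints/1234567890".
_ID_RE = re.compile(r"/([0-9]+)$")


def numeric_id(resource_name: str) -> Optional[str]:
    """Return the trailing numeric ID of a resource name, if it has one."""
    m = _ID_RE.search(resource_name)
    return m.group(1) if m else None


def sweep(regions: Iterable[str],
          list_in_region: Callable[[str], List[str]]) -> List[str]:
    """Call a per-region lister for every region and collect resource names."""
    found: List[str] = []
    for region in regions:
        found.extend(list_in_region(region))
    return found


def join_metrics(resource_names: Iterable[str],
                 metrics_by_id: Dict[str, list]) -> Dict[str, Optional[list]]:
    """Attach metric data to each resource by its numeric ID (None if absent)."""
    return {name: metrics_by_id.get(numeric_id(name)) for name in resource_names}
```

Keeping the lister as an injected function keeps the join logic testable and independent of how each region is actually queried.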
You Cannot Right-Size What You Have Not Found
The caveat is the same one every discovery-based system carries: it is only as complete as its access and its coverage. Per-resource cost and metrics depend on the right GCP grants, and the live sweep depends on covering the regions your workloads actually run in. It works when the credential is in place and the regions are swept: every endpoint priced, every notebook flagged, every job costed by duration. It breaks when a region is missed or a permission is partial, and there the resource simply does not appear, which is the most expensive failure because nobody knows to look.
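Because a missed region or a partial grant fails silently, it helps to check coverage explicitly. This minimal sketch compares where workloads run against what was swept, and what permissions are required against what was granted. The permission names in the example are illustrative IAM permissions, not the exact set ZopNight requires.

```python
from typing import Dict, Iterable, List


def coverage_gaps(workload_regions: Iterable[str], swept_regions: Iterable[str],
                  required_perms: Iterable[str],
                  granted_perms: Iterable[str]) -> Dict[str, List[str]]:
    """Report what a discovery sweep cannot see: unswept regions and missing grants."""
    return {
        "unswept_regions": sorted(set(workload_regions) - set(swept_regions)),
        "missing_permissions": sorted(set(required_perms) - set(granted_perms)),
    }
```

For instance, workloads in `us-central1` and `asia-southeast1` with only `us-central1` swept, and `monitoring.timeSeries.list` missing from the grants, yields `{"unswept_regions": ["asia-southeast1"], "missing_permissions": ["monitoring.timeSeries.list"]}`: exactly the resources and metrics that would otherwise never appear.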
Grant the access, let the live discovery enumerate the platform, and Vertex stops being one opaque number. From there the idle endpoints, the forgotten notebooks, and the over-provisioned pools become things you right-size on a schedule, the raw input for a closed-loop remediation you can actually leave running.
