eBPF Gives Kubernetes Full Network Visibility Without the Sidecar CPU Tax

Istio sidecars cost 0.5 vCPU per pod at idle. At 100 pods, you're paying for 50 idle vCPUs. eBPF moves observability into the kernel — one hook point per node, not per pod. Here's the architecture, the tools, and when you still need Envoy.

By Riya Mittal
Published: April 27, 2026 · 9 min read

Service mesh adoption in Kubernetes hit a wall in 2025. Not because teams stopped wanting visibility, but because the bill arrived. Every pod in an Istio cluster runs two containers: your application and an Envoy proxy. That proxy consumes 0.5 vCPU and 50MB of memory at idle, before a single request arrives. At 100 pods, you are paying for 50 idle vCPUs that do no application work. At 500 pods, you are running 4-5 extra EC2 nodes purely to keep proxies alive.

eBPF solves this by moving observability out of the pod and into the kernel. One program runs per node and observes all pod traffic from a single hook point. The pod count becomes irrelevant. That architectural shift is what makes tools like Cilium, Hubble, and Pixie compelling in 2026.

The Sidecar Overhead Is Not an Edge Case: It Is the Default

Envoy proxy overhead is not a bug or a misconfiguration. It is the intended architecture. Every pod in an Istio mesh gets a sidecar injected by a mutating admission webhook before scheduling. That sidecar intercepts all inbound and outbound traffic for the pod, which is how Istio gets its mTLS, retries, and observability.
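The injection trigger is a single namespace label that the mutating webhook watches for. A sketch, with a hypothetical `payments` namespace (existing pods must be restarted before they pick up the sidecar):

```yaml
# Labeling a namespace opts every pod scheduled into it
# into Istio's sidecar injection via the mutating admission webhook.
apiVersion: v1
kind: Namespace
metadata:
  name: payments          # hypothetical namespace
  labels:
    istio-injection: enabled
```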

The overhead has two components. The first is baseline idle consumption: 0.5 vCPU and 50MB per sidecar. This does not disappear when your pods are idle. The second is load-proportional overhead. Under production traffic, Envoy processing the request path consumes an additional 15-30% of the pod’s allocated CPU. A service that normally uses 500m CPU will peak at 575-650m with sidecar overhead factored in.

Sidecar Architecture Overhead

The startup cost is also real. Sidecar injection adds 1-2 seconds to pod startup time because the admission webhook must run, the sidecar image must be pulled (or found in cache), and the container must initialize before traffic is accepted. In clusters with aggressive HPA scaling events, this latency compounds.

The crash coupling is the least-discussed cost. If the Envoy sidecar gets OOM-killed because the pod's memory limit was sized for the application alone, your application loses network connectivity entirely, even though it is running fine. The pod gets restarted. That is a service interruption caused by your observability layer.

| Cluster Size | Envoy Baseline vCPU | AWS m5.4xlarge Equivalent | Monthly Cost (on-demand) |
|---|---|---|---|
| 50 pods | 25 vCPU | 1.6 nodes | ~830 USD |
| 100 pods | 50 vCPU | 3.1 nodes | ~1,660 USD |
| 500 pods | 250 vCPU | 15.6 nodes | ~8,300 USD |
| 1,000 pods | 500 vCPU | 31.3 nodes | ~16,600 USD |

These numbers use the 0.5 vCPU idle baseline per Envoy sidecar and on-demand m5.4xlarge pricing at 0.768 USD/hour. Actual load-time overhead pushes real cost 30-50% higher.
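The arithmetic behind the table is simple enough to script. A minimal sketch, assuming 0.5 idle vCPU per sidecar, 16 vCPUs per m5.4xlarge, and roughly 730 hours per month (the table's cost column uses slightly different rounding):

```python
# Rough model of the idle sidecar cost table above.
def sidecar_idle_footprint(pods, vcpu_per_sidecar=0.5, node_vcpu=16,
                           node_price_hr=0.768, hours_per_month=730):
    """Idle vCPU, node-equivalents, and monthly USD consumed by sidecars alone."""
    idle_vcpu = pods * vcpu_per_sidecar      # baseline before any traffic arrives
    node_equiv = idle_vcpu / node_vcpu       # m5.4xlarge = 16 vCPUs
    monthly_usd = node_equiv * node_price_hr * hours_per_month
    return idle_vcpu, node_equiv, monthly_usd

for pods in (50, 100, 500, 1000):
    vcpu, nodes, usd = sidecar_idle_footprint(pods)
    print(f"{pods:>5} pods: {vcpu:>6.1f} vCPU  {nodes:>5.2f} nodes  ~{usd:,.0f} USD/mo")
```

None of this accounts for the load-proportional 15-30% request-path overhead, which is why real bills land above these figures.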

eBPF Observes From the Kernel: One Hook Point Per Node, Not Per Pod

eBPF (Extended Berkeley Packet Filter) is a kernel subsystem that lets you attach verified programs to hook points inside the Linux kernel. Those hook points include network ingress/egress (XDP and Traffic Control), system calls (kprobes and tracepoints), and user-space function calls (uprobes). The key property: the program runs once per node, not once per pod.

This is what the Kernel Observability Model describes. Instead of placing an observer inside each workload (the sidecar approach), you attach an observer at the substrate that all workloads run on. Traffic between pods never leaves kernel space before being inspected. No packet copy, no userspace round-trip, no per-pod process startup.

The Linux kernel verifier checks every eBPF program before it loads. It proves the program terminates, cannot access out-of-bounds memory, and cannot crash the kernel. This is why eBPF is safe for production use: the kernel itself is the safety boundary, not sandboxing in userspace.

eBPF Kernel Hook Points

The result: a 500-pod cluster running Cilium uses roughly 2-3% of a single vCPU per node for eBPF datapath processing. The entire observability layer is measured in millicores per node, not 250 idle vCPUs spread across the cluster.

Cilium and Hubble Replace Your Mesh Observability Layer

Cilium is a CNI plugin that replaces kube-proxy and iptables with an eBPF-native dataplane. On large clusters where iptables rule counts exceed 10,000, Cilium reduces per-connection latency by up to 40%, because eBPF hash-table lookups are O(1) while iptables traverses its rule chain linearly.
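The scaling difference is easy to illustrate outside the kernel. This is only an analogy, not Cilium's actual datapath: a Python dict stands in for the eBPF hash map, and a list scan stands in for linear iptables rule traversal:

```python
import timeit

# Build 10,000 "rules": a linear chain evaluates them one by one,
# while a hash map keys directly on the connection identifier.
rules_list = [(f"10.0.{i // 256}.{i % 256}", "ALLOW") for i in range(10_000)]
rules_map = dict(rules_list)

target = rules_list[-1][0]  # worst case for the linear scan: the last rule

def linear_lookup():
    for ip, verdict in rules_list:   # iptables-style traversal
        if ip == target:
            return verdict

def hash_lookup():
    return rules_map[target]         # eBPF-hash-map-style O(1) lookup

linear_t = timeit.timeit(linear_lookup, number=200)
hash_t = timeit.timeit(hash_lookup, number=200)
print(f"linear: {linear_t:.4f}s  hash: {hash_t:.4f}s")
```

The gap grows with rule count, which is exactly why the difference only becomes painful on large clusters.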

Hubble is Cilium’s observability layer. It attaches to the same eBPF hook points and gives you per-flow visibility at L3 (IP), L4 (TCP/UDP port), and L7 (HTTP path, gRPC method, DNS query, Kafka topic). Every connection between pods is recorded: source pod, destination pod, protocol, latency, and outcome. This covers what Kiali and Jaeger give you in an Istio mesh, without a proxy in the path.

Cilium and Hubble Architecture

Cilium also enforces L7 network policy without a proxy. You can write a policy that allows only GET /api/health from service A to service B, and blocks everything else at the kernel level. Kubernetes native NetworkPolicy only operates at L4. This matters for cloud governance enforcement because policy coverage extends deeper into the protocol stack.
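A sketch of such a policy using the CiliumNetworkPolicy CRD. The `app: service-a` / `app: service-b` labels and port 8080 are hypothetical placeholders:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-health-only            # hypothetical policy name
spec:
  endpointSelector:
    matchLabels:
      app: service-b                 # the protected destination
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: service-a           # the only allowed caller
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/health"  # everything else is denied
```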

| Capability | Envoy / Istio | Cilium / Hubble | Pixie |
|---|---|---|---|
| L3/L4 flow visibility | Yes | Yes | Yes |
| L7 HTTP/gRPC inspection | Yes | Yes | Yes |
| DNS query visibility | Yes | Yes | Partial |
| mTLS between pods | Yes | WireGuard (1.14+) | No |
| Application flame graphs | No | No | Yes |
| SQL query capture | No | No | Yes |
| CPU overhead at 100 pods | ~50 vCPU (idle) | ~0.3 vCPU (idle) | ~0.5 vCPU (idle) |
| Requires code changes | No | No | No |
| Per-pod process injection | Yes | No | No |

Pixie Profiles Your Application Without a Single Line of Instrumentation

Pixie uses eBPF uprobes to hook into application-level function calls without modifying application code. When Pixie loads onto a node, it attaches uprobes to Go, Java, Python, and Node.js runtime functions. It captures HTTP request/response pairs, database queries (PostgreSQL, MySQL, Redis), gRPC calls, and CPU flame graphs.

The CPU overhead at Verizon Media’s production deployment measured under 2% of total cluster CPU. Pixie stores this telemetry locally on each node and retains roughly 1 hour of full-fidelity data. For longer retention, it exports to a Vizier-compatible backend (the Pixie-hosted cloud or a self-hosted instance).

The practical use case: a latency spike at 2am triggers an alert. You open Pixie, query the node that handled the spike, and see the exact SQL query that took 4.3 seconds instead of the usual 12ms. No log correlation, no distributed trace assembly, no context switching between four dashboards. The eBPF uprobe captured the full query text and latency in kernel space before the application even returned the response.

This works because uprobes fire synchronously at the function call boundary. The application cannot complete the function without the uprobe handler executing first. The data is captured at the source, in kernel space, with microsecond precision.

When You Should Still Keep Your Sidecar

eBPF tools have real gaps. Understanding them prevents you from migrating off Istio and losing capabilities you depend on.

mTLS with per-service certificates is the strongest holdout. Cilium 1.14 added WireGuard-based node-to-node encryption. This encrypts all inter-node traffic transparently, which covers most mTLS use cases. But it does not give you per-service certificates signed by a workload identity system like SPIFFE/SPIRE. If your compliance posture requires individual certificate rotation per service identity, Istio with Envoy still owns this.
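For reference, the WireGuard mode mentioned above is a small Helm values fragment in Cilium's chart (a sketch; check the chart documentation for your Cilium version):

```yaml
# values.yaml fragment for the Cilium Helm chart
encryption:
  enabled: true
  type: wireguard   # transparent node-to-node encryption; no per-service certificates
```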

Complex traffic policies are the second holdout. Envoy has 8 years of traffic shaping primitives: per-route retries, circuit breakers, fault injection, header manipulation, Lua and Wasm filter chains. Cilium can enforce L7 allow/deny policy, but it does not implement configurable retry logic or circuit breaker state machines at the eBPF layer. If your platform engineering team has built golden path templates around Istio VirtualService and DestinationRule objects, replacing those requires rebuilding the traffic management layer, not just swapping the CNI.

| Capability | eBPF-Native Today | Still Needs Envoy |
|---|---|---|
| L7 allow/deny network policy | Cilium | — |
| Per-flow observability | Cilium + Hubble | — |
| Application profiling, SQL capture | Pixie | — |
| Node-level encryption | Cilium WireGuard | — |
| Per-service mTLS + SPIFFE certificates | — | Istio + Envoy |
| Configurable retries + circuit breakers | — | Envoy |
| Wasm / Lua filter chains | — | Envoy |
| Multi-cluster east-west routing | Partial (Cilium Cluster Mesh) | Istio + Envoy |

This works when: your primary need is observability and policy enforcement, and you can tolerate node-level rather than service-level encryption. It breaks when: your security team requires individual SPIFFE certificate rotation, or when your services rely on Envoy traffic shaping primitives that have no eBPF equivalent yet.

Migrating Off Sidecars Without Losing Coverage

The migration runs in four phases. Rushing any phase means a gap in observability during the transition.

Phase 1: Install Cilium as CNI. If you are running kube-proxy, enable Cilium in kube-proxy replacement mode. This is a rolling node replacement: cordon a node, drain it, reinstall with Cilium CNI, uncordon. No application changes. Cilium takes over iptables-based routing immediately. At this point you have eBPF networking, but Istio sidecars are still being injected.
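In Helm terms, Phase 1 might look like the following values fragment (a sketch; flag names have shifted across Cilium versions, and the API server endpoint shown is a hypothetical placeholder that Cilium needs once kube-proxy no longer programs service VIPs):

```yaml
# values.yaml fragment: Cilium as CNI with kube-proxy replacement
kubeProxyReplacement: true               # "strict" on older Cilium releases
k8sServiceHost: api.example.internal     # hypothetical API server endpoint
k8sServicePort: "6443"
```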

Phase 2: Enable Hubble. Hubble deploys as a DaemonSet. It connects to the Cilium agent already running on each node and starts exporting flows. Point Hubble Relay at your existing Prometheus/Grafana stack. At this point, Hubble flow data and Istio telemetry are both available. You can validate that Hubble coverage matches Istio coverage before disabling the sidecars.
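Enabling Hubble with Relay and Prometheus-format metrics is again a Helm values fragment (a sketch; the metric list here is illustrative):

```yaml
# values.yaml fragment: enable Hubble on an existing Cilium install
hubble:
  enabled: true
  relay:
    enabled: true      # aggregates flows across nodes for your existing stack
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - http           # exposed in Prometheus format for Grafana dashboards
```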

Phase 3: Install Pixie. Pixie’s edge module deploys as a DaemonSet. It loads eBPF uprobes for your language runtimes. Validate that Pixie is capturing the same request latency, error rate, and database queries you were getting from Jaeger traces.

Phase 4: Disable Istio injection namespace by namespace. Remove the istio-injection: enabled label from one namespace. Let it run for 72 hours and compare Hubble + Pixie coverage against the Istio-era dashboards. If coverage is equivalent, proceed to the next namespace. Roll back by re-adding the label and restarting pods.

Migration Phases: Cilium to eBPF-Native

The metric to track at each phase: flow coverage percentage. Measure the percentage of pod-to-pod connections visible in Hubble versus the same connections visible in Istio’s telemetry before migration. Target 100% match before proceeding. If you see gaps, check that Cilium’s eBPF programs loaded correctly on all nodes (cilium status will show missing programs).
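The coverage comparison itself is set arithmetic over (source, destination) pairs. A minimal sketch, assuming you have exported the flow pairs from both systems into Python sets (the pod names below are hypothetical):

```python
def flow_coverage(istio_flows: set, hubble_flows: set) -> float:
    """Percentage of Istio-visible pod-to-pod flows also visible in Hubble."""
    if not istio_flows:
        return 100.0
    return len(istio_flows & hubble_flows) / len(istio_flows) * 100

# Hypothetical (source, destination) pod pairs from each system's telemetry.
istio = {("frontend", "cart"), ("cart", "redis"), ("frontend", "auth")}
hubble = {("frontend", "cart"), ("cart", "redis")}

print(f"{flow_coverage(istio, hubble):.1f}% coverage")  # below 100%: do not proceed
```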

On a 100-pod cluster, completing this migration recovers approximately 50 vCPUs of idle compute. At AWS m5.4xlarge on-demand pricing, that is 1,660 USD per month returned to your application workloads. Clusters with well-tagged cost attribution will see this recovery reflected immediately in their per-namespace cost reports.

The sidecar model made sense when eBPF was not mature enough for production observability. That has changed. Cilium is a CNCF graduated project. Pixie runs at scale at dozens of companies. Hubble gives you the L7 visibility Kiali provides, without a proxy in the path. The question in 2026 is not whether eBPF can replace your mesh observability. It is whether you have scheduled the migration yet.

Written by Riya Mittal, Engineer at Zop.Dev
