Kubernetes v1.36: How Server-Side Sharded List and Watch Scales Your Controllers
As Kubernetes clusters grow beyond tens of thousands of nodes, controllers that watch high-cardinality resources like Pods face a scalability wall. Every replica of a horizontally scaled controller receives the full event stream from the API server, consuming CPU, memory, and network bandwidth—only to discard objects it isn't responsible for. Scaling out the controller only multiplies the cost. Kubernetes v1.36 introduces server-side sharded list and watch (KEP-5866) as an alpha feature, allowing the API server to filter events at the source so each replica receives only its assigned slice of the resource collection.
What is the scaling problem with watch-heavy controllers in large clusters?
In large Kubernetes clusters, controllers like kube-state-metrics watch resources such as Pods, which can number in the hundreds of thousands. Each replica of a horizontally scaled controller receives the full event stream from the API server, even though it is responsible for only a portion of those objects. Every replica therefore deserializes and processes every event, wasting CPU and memory on objects it must immediately discard. Network bandwidth also scales linearly with the number of replicas rather than with the data each replica actually needs. Worse, scaling out makes the problem bigger, not smaller: each added replica receives the same full stream, so the total load on the API server and the network grows with every replica.
Why doesn't client-side sharding solve the data volume problem?
Some controllers already implement client-side sharding, where each replica is assigned a portion of the keyspace and discards objects that don't belong to it. This works functionally, but it does not reduce the volume of data flowing from the API server. Every replica still receives the complete event stream, deserializes it, and then throws most of it away. Network bandwidth, CPU spent on deserialization, and memory for temporary storage all still scale with the number of replicas, not with shard size. In short, client-side sharding moves the filtering work into each replica without eliminating the wasted serialization on the server or the wasted transfer and decoding on every client, as the sketch below illustrates. Server-side sharding fixes this by moving the filtering upstream into the API server.
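For contrast, client-side shard filtering typically looks like the hypothetical helper below. The function name and signature are illustrative, not taken from any specific controller; the key point is that by the time this check runs, the API server has already serialized the object and the replica has already received and deserialized it.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownedByThisReplica is the kind of filter a client-side-sharded controller
// runs inside its event handlers. The network and CPU cost of delivering the
// object is already paid before this check can discard it.
func ownedByThisReplica(uid string, replica, totalReplicas int) bool {
	h := fnv.New64a()
	h.Write([]byte(uid))
	return h.Sum64()%uint64(totalReplicas) == uint64(replica)
}

func main() {
	// Replica 0 of 2 keeps roughly half the objects and drops the rest.
	fmt.Println(ownedByThisReplica("0f4d9e2a-1c3b-4d5e-8f6a-7b8c9d0e1f2a", 0, 2))
}
```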
What is server-side sharded list and watch in Kubernetes v1.36?
Kubernetes v1.36 introduces server-side sharded list and watch as an alpha feature (KEP-5866). It adds a shardSelector field to ListOptions. Clients select a shard by expressing a range over a deterministic 64-bit FNV-1a hash with the shardRange() function, for example shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000'). The API server computes the hash of the specified field (currently object.metadata.uid or object.metadata.namespace) for each object and returns only those whose hash falls within the half-open [start, end) interval. This applies to both list responses and watch event streams. The hash is deterministic across all API server instances, making the feature safe in multi-replica deployments.
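A minimal sketch of a sharded list, assuming a v1.36 cluster with the alpha feature gate enabled and a matching client-go where metav1.ListOptions carries the ShardSelector string field (the field name follows the opts.ShardSelector usage described in the informer section below):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Request only objects whose FNV-1a hash of object.metadata.uid falls in
	// the lower half of the 64-bit hash space; the API server filters out the
	// rest before sending anything over the wire.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		// ShardSelector is the alpha field added by KEP-5866 (assumed name).
		ShardSelector: "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d pods in this shard\n", len(pods.Items))
}
```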
How do controller replicas use sharded watches with informers?
Controllers typically use informers to list and watch resources. To shard the workload, each replica injects the shardSelector into the ListOptions used by its informer via WithTweakListOptions. For example, a 2-replica deployment splits the hash space in half: replica 0 uses shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000') and replica 1 uses shardRange(object.metadata.uid, '0x8000000000000000', '0x0000000000000000') (wrapping around). The official documentation shows how to create an informer factory with WithTweakListOptions that sets opts.ShardSelector; a sketch in the same spirit follows below. Each replica then receives events only for objects in its assigned range, drastically reducing per-replica data volume.
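The sketch below wires this into a standard shared informer factory. It assumes the same alpha ShardSelector field on metav1.ListOptions; the shardBounds helper is illustrative, and in practice the replica ordinal would come from something like a StatefulSet pod name.

```go
package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// shardBounds splits the 64-bit hash space into equal half-open ranges.
// For the last replica, end wraps to 0x0000000000000000 via unsigned
// overflow, matching the wrap-around convention described above.
func shardBounds(replica, totalReplicas int) (start, end uint64) {
	step := ^uint64(0)/uint64(totalReplicas) + 1
	start = uint64(replica) * step
	end = start + step
	return start, end
}

// newShardedFactory builds an informer factory whose list and watch
// requests carry this replica's shardSelector.
func newShardedFactory(client kubernetes.Interface, replica, totalReplicas int) informers.SharedInformerFactory {
	start, end := shardBounds(replica, totalReplicas)
	selector := fmt.Sprintf("shardRange(object.metadata.uid, '0x%016x', '0x%016x')", start, end)

	return informers.NewSharedInformerFactoryWithOptions(
		client,
		30*time.Minute, // resync period; tune for your controller
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.ShardSelector = selector // alpha field from KEP-5866 (assumed name)
		}),
	)
}
```

Because the ranges are half-open and cover the full hash space, every object lands in exactly one shard, so no two replicas process the same object and none is missed.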
What are the benefits of server-side sharding vs. client-side sharding?
Server-side sharding moves the filtering from the client to the API server, with several concrete benefits. Network bandwidth scales with shard size rather than with the number of replicas, because each replica receives only its slice of the data. CPU and memory on controller replicas drop, since they no longer deserialize and process irrelevant events. Wasted processing disappears: the API server sends only matching events. The API server also gains room to optimize, since it can filter objects before serializing and transmitting them instead of shipping the whole collection to every client. Horizontal scaling of controllers therefore stays efficient even in clusters with hundreds of thousands of objects, and because the hash function is deterministic, the feature works correctly across multiple API server instances. The back-of-envelope sketch below puts rough numbers on the bandwidth difference.
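All three constants here are assumptions chosen purely for illustration; the point is the shape of the scaling, not the exact figures.

```go
package main

import "fmt"

// Assumed numbers for a rough comparison of one full relist.
const (
	objects      = 500_000 // watched Pods in the cluster (assumed)
	replicas     = 10      // horizontally scaled controller replicas (assumed)
	bytesPerItem = 2_000   // rough serialized size of one Pod (assumed)
)

func main() {
	// Client-side sharding: the API server sends every object to every replica.
	var clientSide int64 = objects * replicas * bytesPerItem
	// Server-side sharding: the shards partition the collection, so the total
	// transfer is independent of the replica count.
	var serverSide int64 = objects * bytesPerItem

	fmt.Printf("client-side filtering: ~%d GB total\n", clientSide/1e9) // ~10 GB
	fmt.Printf("server-side sharding:  ~%d GB total\n", serverSide/1e9) // ~1 GB
}
```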