
When watching Kubernetes resources for changes, what exactly is happening under the hood? Does the HTTP connection suddenly change to a WSS (WebSocket) connection?

To solve a problem of too many requests to the kube-apiserver, I am rewriting some code into what I think is more of an operator pattern.

In our multi-tenant microservice architecture, all services use the same library to look up connection details for tenant-specific DBs. The connection details are stored in Secrets within the same namespace as the application. Every tenant DB has its own Secret.

So on every call, all Secrets with the correct label are read and parsed for the necessary DB connection details. We have around 400 services/pods...
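For illustration, the per-call path looks roughly like this in client-go terms (the namespace and label selector are placeholders, and the actual parsing depends on our library):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One LIST request against the kube-apiserver on every single lookup.
	secrets, err := client.CoreV1().Secrets("my-namespace").List(context.Background(), metav1.ListOptions{
		LabelSelector: "app.example.com/tenant-db=true", // placeholder label
	})
	if err != nil {
		panic(err)
	}
	for _, s := range secrets.Items {
		// Parse whatever keys the library expects, e.g. host, user, password.
		fmt.Printf("tenant secret %s has %d keys\n", s.Name, len(s.Data))
	}
}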

My idea: instead of reading all Secrets on every call, create a cache and update the cache every time a relevant Secret is changed, via a watcher.
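A rough sketch of that idea with a client-go shared informer (namespace, label, and resync interval are placeholders): one initial LIST, then a single long-lived watch keeps an in-memory cache of the labelled Secrets up to date, and lookups read from the cache instead of hitting the kube-apiserver.

package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch only the labelled Secrets in our namespace.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		30*time.Minute, // periodic resync of the local cache (placeholder)
		informers.WithNamespace("my-namespace"),
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.LabelSelector = "app.example.com/tenant-db=true" // placeholder label
		}),
	)

	secretInformer := factory.Core().V1().Secrets()
	secretInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			fmt.Println("secret added:", obj.(*corev1.Secret).Name)
		},
		UpdateFunc: func(_, newObj interface{}) {
			fmt.Println("secret updated:", newObj.(*corev1.Secret).Name)
		},
		DeleteFunc: func(obj interface{}) {
			// Deletes can arrive as DeletedFinalStateUnknown tombstones.
			if s, ok := obj.(*corev1.Secret); ok {
				fmt.Println("secret deleted:", s.Name)
			}
		},
	})

	ctx := context.Background()
	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())

	// Lookups now read from the in-memory cache kept fresh by the watch.
	cached, err := secretInformer.Lister().Secrets("my-namespace").List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("cached tenant secrets: %d\n", len(cached))
}

As far as I understand, the informer also handles re-listing and reconnecting for us if the watch drops, which is what most operators are built on.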

My concern: am I just replacing the HTTP requests with equally expensive WebSockets? As I understand it, I will now have an open WebSocket connection for every service/pod, which is still 400 open connections.

Would it be better to have a proxy service watch the Secrets (kube-apiserver requests) and have all services query that service for connection details (intranet requests, unrelated to the kube-apiserver)?
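For the proxy idea, I imagine something like this: one deployment keeps the informer cache from the sketch above and serves the connection details over plain HTTP inside the cluster, so the other ~400 pods never talk to the kube-apiserver directly. The endpoint path and response keys below are made up, and the map would be populated by the informer event handlers:

package main

import (
	"encoding/json"
	"net/http"
	"strings"
	"sync"

	corev1 "k8s.io/api/core/v1"
)

// Populated by the informer event handlers from the previous sketch;
// guarded by a mutex because handlers run concurrently with requests.
var (
	mu              sync.RWMutex
	secretsByTenant = map[string]*corev1.Secret{}
)

func main() {
	http.HandleFunc("/tenants/", func(w http.ResponseWriter, r *http.Request) {
		tenant := strings.TrimPrefix(r.URL.Path, "/tenants/")

		mu.RLock()
		s, ok := secretsByTenant[tenant]
		mu.RUnlock()
		if !ok {
			http.NotFound(w, r)
			return
		}
		// Return only the connection fields the client library needs (placeholder keys).
		_ = json.NewEncoder(w).Encode(map[string]string{
			"host": string(s.Data["host"]),
			"user": string(s.Data["user"]),
		})
	})
	_ = http.ListenAndServe(":8080", nil)
}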

Moritz Schmitz v. Hülst
  • I guess you can also add masters to your cluster and scale etcd. This way, requests to the apiserver will be spread across several hosts. – VAS Jan 15 '20 at 06:47
  • Good idea. Unfortunately we are on GKE. – Moritz Schmitz v. Hülst Jan 15 '20 at 06:48
  • In this case, the control plane should scale automatically. I guess there is some kind of proportion between the number of nodes and the number of masters. You could try switching to a regional cluster or using more lightweight nodes instead of a couple of big ones. https://cloud.google.com/blog/products/gcp/with-google-kubernetes-engine-regional-clusters-master-nodes-are-now-highly-available https://learnk8s.io/kubernetes-node-size – VAS Jan 15 '20 at 06:55
  • Further research shows that the GKE master can only be resized vertically at the moment, so a regional cluster is the only option to get three master nodes in GKE. Nevertheless, increasing the number of nodes may get you a more powerful master. https://stackoverflow.com/questions/50425198/how-can-i-increase-the-size-of-master-node-on-google-kubernetes-engine – VAS Jan 15 '20 at 07:17
  • Thanks. That is exactly our problem. We are using a zonal cluster (to save costs...). Now we are working on moving the cluster again to handle this situation. However, my post is also about the idea of limiting calls to the apiserver in general. Or should the apiserver be able to handle all these requests with ease? – Moritz Schmitz v. Hülst Jan 15 '20 at 07:21
  • This article may shed some light on this area: https://openai.com/blog/scaling-kubernetes-to-2500-nodes/ You may also want to try an independent solution for sharing connection details, like NFS/SMB mounts, a cluster FS like Ceph/GlusterFS, or even a separate etcd cluster for each namespace. – VAS Jan 15 '20 at 07:41
  • Cool stuff, many thanks! From the crowd who beat Dota 2 pro players ;-). I have an additional question though: how do I measure any of the solutions? In Stackdriver I can monitor most kube-apiserver requests except list/get. I haven't found a good way of measuring API requests, or general kube-apiserver load, yet. – Moritz Schmitz v. Hülst Jan 15 '20 at 09:27
  • I think you can also measure the API response delay from the kube-apiserver. I would do that for read and write requests, e.g. reading and writing ConfigMap content respectively. – VAS Jan 15 '20 at 09:34

1 Answer


From the sources:

// ServeHTTP serves a series of encoded events via HTTP with Transfer-Encoding: chunked
// or over a websocket connection.

Which protocol is used (chunked HTTP or WebSocket) pretty much depends on the client; both have their cost, which you'll have to compare against your current request frequency.
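A plain client-go watch, for instance, streams events over a long-lived chunked HTTP response rather than upgrading to a WebSocket; a minimal sketch (namespace and label selector are placeholders):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One long-lived streaming request: GET .../secrets?watch=true&labelSelector=...
	w, err := client.CoreV1().Secrets("my-namespace").Watch(context.Background(), metav1.ListOptions{
		LabelSelector: "app.example.com/tenant-db=true",
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		fmt.Println("event:", event.Type)
	}
}

Clients that request a WebSocket upgrade (some browser/JS clients do) get the WSS variant mentioned in the source comment instead.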

You may be better off with a proxy cache that either watches or polls at regular intervals, but that depends a lot on your application.

Markus Dresch