How to reduce CPU limits of kubernetes system resources?

Question

I'd like to keep the number of cores in my GKE cluster below 3. This becomes much more feasible if the CPU limits of the K8s replication controllers and pods are reduced from 100m to at most 50m. Otherwise, the K8s pods alone take 70% of one core.

I decided against increasing the CPU power of a node. This would be conceptually wrong in my opinion because the CPU limit is defined to be measured in cores. Instead, I did the following:

replacing limitranges/limits with a version with "50m" as default CPU limit (not necessary, but in my opinion cleaner)
patching all replication controllers in the kube-system namespace to use 50m for all containers
deleting their pods
replacing all non-rc pods in the kube-system namespace with versions that use 50m for all containers
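
For illustration, the per-controller change in the second step could be written as a strategic-merge patch along these lines. This is only a sketch: the container name is a hypothetical placeholder and has to match the name used in the controller's pod template. Patching the controller only changes the template, which is why the existing pods are deleted afterwards.

```yaml
# Hypothetical patch body for one replication controller in kube-system.
# "example-container" is a placeholder, not a real add-on container name.
spec:
  template:
    spec:
      containers:
      - name: example-container   # containers are matched by name
        resources:
          limits:
            cpu: 50m              # the reduced limit from the question
```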

This is a lot of work and probably fragile. Any further changes in upcoming versions of K8s, or changes in the GKE configuration, may break it.

So, is there a better way?

score 13 · Answer 1 · answered Mar 25 '19 at 03:23

I have found that one of the best ways to reduce system resource requests on a GKE cluster is to use a Vertical Pod Autoscaler (VPA).

Here are the VPA definitions I have used:

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: kube-dns-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: Deployment
    name: kube-dns
  updatePolicy:
    updateMode: "Auto"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: heapster-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: Deployment
    name: heapster-v1.6.0-beta.1
  updatePolicy:
    updateMode: "Initial"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: metadata-agent-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: DaemonSet
    name: metadata-agent
  updatePolicy:
    updateMode: "Initial"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: metrics-server-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: Deployment
    name: metrics-server-v0.3.1
  updatePolicy:
    updateMode: "Initial"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: fluentd-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: DaemonSet
    name: fluentd-gcp-v3.1.1
  updatePolicy:
    updateMode: "Initial"

---

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  namespace: kube-system
  name: kube-proxy-vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: DaemonSet
    name: kube-proxy
  updatePolicy:
    updateMode: "Initial"

Here is a screenshot of what it does to a kube-dns deployment.

Our kube-dns finally reserves less than 20% of our CPUs, thanks! But (of course) there seem to be high default RAM requests. Everything now requests 262 MiB, which makes this less usable for everything. — fiws, May 09 '19 at 13:48
I'm using a modified version on our cluster: https://github.com/ARISEChurch/autoscaler/tree/arise-tweaks It seems to function a lot better for our workloads, but your use case may vary. — Tim Smart, May 10 '19 at 19:32
Many thanks for your config @TimSmart, could you elaborate on the reason for using "Auto" in one case and "Initial" in the others? Further, have you encountered any problems with having VPAs on system resources, or have you updated your config? Thanks! — lucbas, Jun 01 '20 at 19:57
@fiws vpa-recommender has [cmd flag](https://github.com/kubernetes/autoscaler/blob/vpa-release-0.9/vertical-pod-autoscaler/pkg/recommender/logic/recommender.go#L28) `--pod-recommendation-min-memory-mb` set to 250 by default. I've added the following to `deploy/recommender-deployment.yaml` to `recommender` container: `args: ["--pod-recommendation-min-cpu-millicores=5", "--pod-recommendation-min-memory-mb=40", "--v=4", "--stderrthreshold=info", "--prometheus-address=http://prometheus.monitoring.svc"]` — Denis Isaev, Dec 26 '20 at 18:44
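
Laid out as YAML, the arguments from the comment above would look roughly like this inside `deploy/recommender-deployment.yaml`. This is a sketch showing only the relevant fragment; the flag values and the Prometheus address are the commenter's own and will likely differ in another cluster.

```yaml
# Fragment of the recommender Deployment's pod template (other fields omitted).
containers:
- name: recommender
  args:
  - --pod-recommendation-min-cpu-millicores=5
  - --pod-recommendation-min-memory-mb=40   # default is 250 according to the comment
  - --v=4
  - --stderrthreshold=info
  - --prometheus-address=http://prometheus.monitoring.svc   # commenter's cluster-specific address
```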

score 12 · Accepted Answer · answered Oct 28 '15 at 15:53

Changing the default Namespace's LimitRange spec.limits.defaultRequest.cpu should be a legitimate solution for changing the default for new Pods. Note that LimitRange objects are namespaced, so if you use extra Namespaces you probably want to think about what a sane default is for them.

As you point out, this will not affect existing objects or objects in the kube-system Namespace.
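
As a minimal sketch, a LimitRange that lowers the default CPU request could look like the following. The object name `limits` is taken from the "limitranges/limits" object mentioned in the question, and the 50m value is the one the question asks about; both are assumptions to adjust for your own cluster.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limits        # assumed name, as referenced in the question
  namespace: default  # LimitRange objects are namespaced
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 50m        # applied to new containers that do not set their own CPU request
```

Pods that already exist keep the requests they were created with.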

The objects in the kube-system Namespace were mostly sized empirically - based on observed values. Changing those might have detrimental effects, but maybe not if your cluster is very small.

We have an open issue (https://github.com/kubernetes/kubernetes/issues/13048) to adjust the kube-system requests based on total cluster size, but that is not implemented yet. We have another open issue (https://github.com/kubernetes/kubernetes/issues/13695) to perhaps use a lower QoS for some kube-system resources, but again - not implemented yet.

Of these, I think that #13048 is the right way to implement what you're asking for. For now, the answer to "is there a better way" is sadly "no". We chose defaults for medium-sized clusters - for very small clusters you probably need to do what you are doing.

Matheus Portillo · Answer 3 · 2021-12-22T16:51:02.253

As stated by @Tim Hockin, the default configurations of add-ons are appropriate for typical clusters, but they can be fine-tuned by changing the resource limit specification.

Before working on add-on resizing, remember that you can also disable add-ons you do not need. How to do this varies a little depending on the add-on, its version, the Kubernetes version, and the provider. Google has a page covering some options, and the same concepts can be applied with other providers too.

As for the solution to the issue linked in @Tim Hockin's answer, the first accepted way to do it is by using addon-resizer. It works out suitable limits and requests, patches the Deployment/Pod/DaemonSet, and recreates the associated pods to match the new limits, with less effort than doing all of it manually.

However, a more robust way to achieve that is to use the Vertical Pod Autoscaler, as described in @Tim Smart's answer. VPA accomplishes what addon-resizer does, but it has several benefits:

VPA is itself a custom resource definition provided by an add-on, so your configuration can be much more compact than with addon-resizer.
Because it is a custom resource definition, it is also much easier to keep the implementation up to date.
Some providers (such as Google) run the VPA components as control-plane processes instead of as deployments on your worker nodes. So even though addon-resizer is simpler, VPA uses none of your node resources, while addon-resizer would.

An updated template would be:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: <addon-name>-vpa
  namespace: kube-system
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       <addon-kind (Deployment/DaemonSet/Pod)>
    name:       <addon-name>
  updatePolicy:
    updateMode: "Auto"

It is important to check which add-ons are used in your current cluster, as they can vary a lot between providers (AWS, Google, etc.) and between Kubernetes versions.

Make sure the VPA add-on is installed in your cluster (most managed Kubernetes services offer it as an option you can simply enable).

The update policy can be Initial (only applies new limits when new pods are created), Recreate (evicts pods that are out of spec and applies the new limits to their replacements), Off (creates recommendations but doesn't apply them), or Auto (currently behaves like Recreate, but this can change in the future).
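
As a sketch of the Off mode, a VPA like the one below only computes recommendations and leaves running pods alone, which can be a low-risk way to see what VPA would do before letting it act. The target name `kube-dns` is borrowed from the examples in this thread and is an assumption about your cluster.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: kube-dns-vpa
  namespace: kube-system
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: kube-dns          # assumed add-on name; check what runs in your cluster
  updatePolicy:
    updateMode: "Off"       # recommend only, never evict or modify pods
```

The computed values should then show up in the object's status, for example via `kubectl describe vpa kube-dns-vpa -n kube-system`.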

The only differences from the example in @Tim Smart's answer are that the current API version is autoscaling.k8s.io/v1, the current API version of the targets is apps/v1, and that newer versions of some providers use FluentBit in place of Fluentd. His answer might be better suited to earlier Kubernetes versions.

If you are using Google Kubernetes Engine, for example, some of the add-ons with the "heaviest" requirements currently are:

fluentbit-gke (DaemonSet)
gke-metadata-server (DaemonSet)
kube-proxy (DaemonSet)
kube-dns (Deployment)
stackdriver-metadata-agent-cluster-level (Deployment)

By applying VPAs to them, my add-on resource requirements dropped from 1.6 to 0.4.

Looks good, but still can't get kube-system pods to reduce request on GKE. Any suggestions for starting points to analyze my issue? — mararn1618, Jan 06 '22 at 22:24
(1) kube-dns & fluentbit-gke: looks like I need to forcefully kill these pods. (2) kube-proxy: VPA cannot select the pod, since kube-proxy seems to run 'directly' on the node, not within a DaemonSet. — mararn1618, Jan 06 '22 at 23:24

David Dehghan · Answer 4 · 2018-03-28T07:24:49.030

By the way, in case you want to try this on Google Cloud (GCE): if you try to change the CPU limit of core services like kube-dns, you will get an error like this:

spec: Forbidden: pod updates may not change fields other than spec.containers[*].image, spec.initContainers[*].image, spec.activeDeadlineSeconds or spec.tolerations (only additions to existing tolerations

Tried on Kubernetes 1.8.7 and 1.9.4.

So at this time the minimum node you need to deploy is n1-standard-1. Even then, about 8% of your CPU is consumed almost constantly by Kubernetes itself as soon as you have several pods and Helm charts, even if you are not running any major load. I think there is a lot of polling going on, and to keep the cluster responsive it keeps refreshing some stats.

edited Mar 28 '18 at 07:24

answered Mar 19 '18 at 06:42


So I basically need at least a single n1-standard-1 solely to manage my kubernetes? – Snowball Mar 26 '18 at 18:19
yeah basically. :-( at least on google you don't pay for the master. on AWS you have to pay for the master node yourself. – David Dehghan Mar 28 '18 at 06:15
I think on google you pay for the master as well – at least if you are using google kubernetes engine – Snowball Mar 28 '18 at 06:34
No you don't. you only pay for your worker nodes. They made the master node free some time ago. – David Dehghan Mar 28 '18 at 07:13
You are right – thanks. Wasn't properly communicated on their pricing page but I found this Blog post: https://cloudplatform.googleblog.com/2017/11/Cutting-Cluster-Management-Fees-on-Google-Kubernetes-Engine.html Anyway, you are still limited to 110 pods per node with apparently no way to increase even if you have large nodes. So it still doesn't work for me :( – Snowball Mar 28 '18 at 07:25
I doubt kubernetes will scale to that many pods per node on any platform. you will have to have a really big instance. It is not efficient at managing resources on the small scale. A load that i was running on AWS micro instance converted to GKE runs now on a n1-standard-2. – David Dehghan Mar 28 '18 at 07:31
