Google Cloud Quota Miscalculation Preventing Kubernetes Pods from Scaling

Question

I am currently facing an issue with a Kubernetes configuration on my cluster running in Google Kubernetes Engine in Autopilot mode in the us-west1 region. The configuration requires 40 replicas, each with a CPU limit of 1000m. I have an Nginx load balancer with an external IP that distributes load to these pods, and its CPU limit is 250m.

However, when I attempt to deploy this configuration, only 26 pods are created, and the remaining 14 remain in Unschedulable status. On the cluster page, I see two warnings: "Can't scale up nodes" and "Pods unschedulable."

Upon checking the quota page, I discovered that Google is calculating my current usage incorrectly. Although I am using 26.25 CPUs, Google shows the current usage as 64. Additionally, while there are 27 pods in total, Google calculates it as 32.

Here is a screenshot from the quotas page:

This miscalculation by Google is preventing my pods from scaling, and I am unsure how to resolve this issue. Can anyone offer guidance on how to avoid this situation?

score 3 · Accepted Answer · answered Apr 06 '23 at 08:15

Even though Autopilot handles node management for you, behind the scenes it is still creating nodes which count against your CPU quota. While you only pay for the CPU/Memory requested by your pods, the nodes which are spun up behind the scenes actually use more CPU/Memory than that as they also run system pods which you don't pay for. Autopilot tends to provision smaller nodes to optimize for scale down without disrupting workloads.

In your case, Autopilot is provisioning nodes with 2 vCPUs each, and 32 of them have been provisioned. You can count the nodes with kubectl get nodes | wc -l (strictly, this prints the number of nodes + 1, because the output includes a header line; adding --no-headers gives the exact count). That comes to 64 vCPUs in use, which is why you are hitting the CPU quota. It also looks like your Autopilot cluster is a public cluster, so each of the 32 nodes is assigned a public IP address, and that is how you hit the in-use IP address quota. The 32 you see on the quota page is therefore a count of node IP addresses, not pods.
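The gap between the two sets of numbers can be reproduced with a little arithmetic. The sketch below uses only the figures quoted in this thread (26 running pods at 1000m, one load balancer at 250m, 32 nodes at 2 vCPUs each); the variable names are made up for illustration, and the node size is the one assumed in this answer, not something read from the cluster.

```python
# Figures from the question and the answer; names are illustrative only.
VCPUS_PER_NODE = 2       # node size assumed in the answer
NODE_COUNT = 32          # nodes Autopilot provisioned

running_app_pods = 26    # pods that were actually scheduled
app_cpu_per_pod = 1.0    # 1000m CPU limit per pod
lb_cpu = 0.25            # 250m CPU limit for the Nginx load balancer

# What the workload asks for (and what Autopilot bills for).
pod_cpu = running_app_pods * app_cpu_per_pod + lb_cpu

# What counts against the regional quotas: whole nodes, not pods.
quota_cpu = NODE_COUNT * VCPUS_PER_NODE
in_use_ips = NODE_COUNT  # one public IP per node in a public cluster

print(f"CPU requested by pods:   {pod_cpu}")     # 26.25
print(f"CPU counted by quota:    {quota_cpu}")   # 64
print(f"In-use IP addresses:     {in_use_ips}")  # 32
```

So the 64 and 32 on the quota page are consistent with node-level accounting rather than a miscalculation.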

To avoid the in-use IP address quota, you should create a private Autopilot cluster. Unfortunately, the best way to do this would be to create a brand new Autopilot cluster. If you are unable to create a new cluster, then you'll need to request a quota increase for in-use IP addresses (64 should probably be enough). But I'd highly recommend creating a new private cluster if at all possible.

To resolve the CPU quota issue, I'd recommend requesting double what you expect your total requests/limits to be, rounded up to the next power of 2; in your case I'd suggest something like 128 vCPUs. Make sure that your total CPU quota (the one in your image) and your E2 CPU quota are both set accordingly (your E2 default quota is probably fine).
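As a rough sketch of that rule of thumb: the helper below doubles the expected CPU total and rounds up to the next power of 2. Rounding up, rather than to the nearest power, is an assumption, chosen because it is the reading that yields the suggested 128 for this workload. The function name suggested_cpu_quota is hypothetical, and this is only a heuristic, not a Google Cloud formula.

```python
import math


def suggested_cpu_quota(expected_cpus: float) -> int:
    """Double the expected CPU total, then round up to a power of 2."""
    target = math.ceil(2 * expected_cpus)
    quota = 1
    while quota < target:
        quota *= 2
    return quota


# 40 replicas at 1000m plus one load balancer at 250m.
expected = 40 * 1.0 + 0.25
print(suggested_cpu_quota(expected))  # 128
```

Doubling leaves headroom for the system pods and per-node overhead described above, which are not part of the pod requests.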

The explanation is reasonable, thank you. I think, on the quota page, it would be more accurate to see the resources used by my application, not the autopilot. So I think I should have abstracted from the use of autopilot. Even if I have to see the total, how much resources I use and how much resources autopilot uses should be given separately. — Süleyman Gezsat, Apr 08 '23 at 14:30
