
I've gone through the Azure Cats&Dogs tutorial described here and I am getting an error in the final step, where the apps are launched in AKS. Kubernetes is reporting that I have insufficient pods, but I'm not sure why. I ran through this same tutorial a few weeks ago without problems.

$ kubectl apply -f azure-vote-all-in-one-redis.yaml
deployment.apps/azure-vote-back created
service/azure-vote-back created
deployment.apps/azure-vote-front created
service/azure-vote-front created

$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
azure-vote-back-655476c7f7-mntrt    0/1     Pending   0          6s
azure-vote-front-7c7d7f6778-mvflj   0/1     Pending   0          6s

$ kubectl get events
LAST SEEN   TYPE      REASON                 KIND         MESSAGE
3m36s       Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
84s         Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
70s         Warning   FailedScheduling       Pod          skip schedule deleting pod: default/azure-vote-back-655476c7f7-l5j28
9s          Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
53m         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-back-655476c7f7-kjld6
99s         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-back-655476c7f7-l5j28
24s         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-back-655476c7f7-mntrt
53m         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-back-655476c7f7 to 1
99s         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-back-655476c7f7 to 1
24s         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-back-655476c7f7 to 1
9s          Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
3m36s       Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
53m         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-front-7c7d7f6778-rmbqb
24s         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-front-7c7d7f6778-mvflj
53m         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-front-7c7d7f6778 to 1
53m         Normal    EnsuringLoadBalancer   Service      Ensuring load balancer
52m         Normal    EnsuredLoadBalancer    Service      Ensured load balancer
46s         Normal    DeletingLoadBalancer   Service      Deleting load balancer
24s         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-front-7c7d7f6778 to 1

$ kubectl get nodes
NAME                       STATUS   ROLES   AGE    VERSION
aks-nodepool1-27217108-0   Ready    agent   7d4h   v1.9.9

The only thing I can think of that has changed is that I now have other (larger) clusters running as well; in fact, the main reason I went through this Cats&Dogs tutorial again was that I hit this same problem today with my other clusters. Is this a resource limit issue with my Azure account?

Update 10-20/3:15 PST: Notice how these three clusters all show that they use the same nodepool, even though they were created in different resource groups. Also note how the "get-credentials" call for gem2-cluster reports an error. I did have an earlier cluster called gem2-cluster, which I deleted and recreated using the same name (in fact, I deleted the whole resource group). What's the correct process for doing this?

$ az aks get-credentials --name gem1-cluster --resource-group gem1-rg
Merged "gem1-cluster" as current context in /home/psteele/.kube/config

$ kubectl get nodes -n gem1
NAME                       STATUS   ROLES   AGE     VERSION
aks-nodepool1-27217108-0   Ready    agent   3h26m   v1.9.11

$ az aks get-credentials --name gem2-cluster --resource-group gem2-rg
A different object named gem2-cluster already exists in clusters

$ az aks get-credentials --name gem3-cluster --resource-group gem3-rg
Merged "gem3-cluster" as current context in /home/psteele/.kube/config

$ kubectl get nodes -n gem1
NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-14202150-0   Ready    agent   26m   v1.9.11

$ kubectl get nodes -n gem2
NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-14202150-0   Ready    agent   26m   v1.9.11

$ kubectl get nodes -n gem3
NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-14202150-0   Ready    agent   26m   v1.9.11
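
A likely fix for the gem2-cluster "already exists" error above (a sketch, assuming the stale kubeconfig entries left over from the deleted cluster are the cause) is to remove the old context, cluster, and user entries, or simply overwrite them:

$ # Remove the stale entries the deleted cluster left behind
$ # (the user entry name follows AKS's clusterUser_<rg>_<cluster> convention;
$ #  check `kubectl config view` if yours differs)
$ kubectl config delete-context gem2-cluster
$ kubectl config delete-cluster gem2-cluster
$ kubectl config unset users.clusterUser_gem2-rg_gem2-cluster

$ # Or let the CLI replace them in one step
$ az aks get-credentials --name gem2-cluster --resource-group gem2-rg --overwrite-existing
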
  • paste output from `kubectl get nodes`? – 4c74356b41 Oct 19 '18 at 18:29
  • Done. I see that the age here shows 7 days. That doesn't look right. – user3280383 Oct 19 '18 at 19:40
  • I say it doesn't look right because I just created the cluster today and I'd expect that age column to reflect that. – user3280383 Oct 19 '18 at 19:59
  • The error suggests the node is offline. Can you check the portal to see if it's actually running? – 4c74356b41 Oct 19 '18 at 20:33
  • I believe the issue is my confusion about the role that 'az aks get-credentials' plays. I forgot to run that command when I created my Cats&Dogs cluster, so operations were targeted at a cluster I had created previously. I did in fact have two large clusters running, and it does seem I've exhausted my resources. – user3280383 Oct 20 '18 at 21:58
  • So the question is, is there a way to create separate clusters with their own nodepools? – user3280383 Oct 20 '18 at 22:15
  • this is a separate question, but yes, you can do that. just create a new cluster with az aks create (or any other convenient way) – 4c74356b41 Oct 21 '18 at 06:48
  • Yes, that is a separate question. I consider the original answered. Thanks for the help. – user3280383 Oct 21 '18 at 13:24

3 Answers

What is your max-pods set to? This is a normal error when you've reached the limit of pods per node.

You can check your current maximum number of pods per node with:

$ kubectl get nodes -o yaml | grep pods
  pods: "30"
  pods: "30"

And your current pod count with:

$ kubectl get pods --all-namespaces | grep Running | wc -l
  18
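
For a per-node view (a sketch; the node name comes from the question's output, substitute your own), `kubectl describe node` shows the pod capacity alongside how many pods are already scheduled there:

$ # "pods:" appears under both Capacity and Allocatable;
$ # "Non-terminated Pods" counts what is currently scheduled on the node
$ kubectl describe node aks-nodepool1-27217108-0 | grep -E "pods:|Non-terminated"
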
  • Worth checking without the `grep Running` filter, as this one caught me out. I had about 700 pods in a pending state due to cron job image pull failures. Thx. – quintindk Jan 17 '19 at 06:18
  • Is the max number of pods per node something that we can configure? – Stephen Mar 16 '19 at 21:54
  • @Stephen Yes, but currently it's only configurable during creation of the AKS cluster, and I think only via the command line. Look at the argument "--max-pods" under "az aks create -h". – Lipsum Apr 02 '19 at 08:41
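
For reference, a sketch of setting the limit at creation time, per the comment above (the resource group, cluster name, and the value 50 are only examples):

$ # --max-pods can only be chosen when the cluster (or node pool) is created
$ az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-count 1 \
    --max-pods 50 \
    --generate-ssh-keys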

I hit this because I exceeded the max pods. I found out how many pods I could handle by running:

$ kubectl get nodes -o json | jq -r .items[].status.allocatable.pods | paste -sd+ - | bc
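
If jq isn't available, roughly the same information can be pulled with kubectl alone (a sketch using custom columns):

$ # Allocatable pod count per node, no jq required
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods
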
  • `kubectl get nodes -o json` and locating `items[].status.allocatable.pods` worked. It was 4, and 4 pods were already running in the system namespace. – inaitgaJ Sep 06 '19 at 07:03

Check to make sure you are not hitting core limits for your subscription.

az vm list-usage --location "<location>" -o table

If you are, you can request more quota: https://learn.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request
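
To spot exhausted quotas quickly, the output can be filtered with a JMESPath query (a sketch; the location is a placeholder, and this assumes currentValue and limit are returned as numbers):

$ # Show only the quotas that are fully used up
$ az vm list-usage --location "eastus" --query "[?currentValue >= limit]" -o table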
