
I am trying to deploy an application on an EKS cluster running version 1.23. When I applied the manifests, my deployment's pod got stuck in the Pending state. Describing the pod shows the error below.

Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Normal   NotTriggerScaleUp  2m55s               cluster-autoscaler  pod didn't trigger scale-up: 2 node(s) had volume node affinity conflict
  Warning  FailedScheduling   57s (x3 over 3m3s)  default-scheduler   0/15 nodes are available: 1 node(s) were unschedulable, 14 node(s) had volume node affinity conflict.
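
As far as I understand, this means the scheduler cannot find a node in the same zone the volume is pinned to. Both sides of that comparison can be listed directly (a quick check, assuming the standard topology labels; older clusters may still carry the failure-domain.beta.* keys):

# zone each PV is pinned to (zone label set by the provisioner)
kubectl get pv -L topology.kubernetes.io/zone

# zones the nodes are actually in
kubectl get nodes -L topology.kubernetes.io/zone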

I also followed Kubernetes Pod Warning: 1 node(s) had volume node affinity conflict, but still no luck.

The PVC file is:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ankit-eks-discovery-pvc
  namespace: ankit-eks
spec:
  storageClassName: ankit-eks-discovery
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1024M

And the storage class used for that is:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ankit-eks-discovery
  namespace: ankit-eks
#volumeBindingMode: WaitForFirstConsumer
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  fsType: ext4
  type: gp2
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-east-1a
    - us-east-1b
    - us-east-1c
    - us-east-1d
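
With the volumeBindingMode line commented out, the class falls back to the default Immediate mode, so the volume is provisioned (and pinned to a zone) before the pod is scheduled. For reference, the delayed-binding variant of the same class would look like this (just a sketch; note that StorageClass is cluster-scoped, so the namespace field above is ignored anyway):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ankit-eks-discovery
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer  # defer provisioning until a pod using the claim is scheduled
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  fsType: ext4
  type: gp2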

The PV description is:

Name:              pvc-c9f6d0d3-0348-4ff4-8d9f-e01af1996e60
Labels:            topology.kubernetes.io/region=us-east-1
                   topology.kubernetes.io/zone=us-east-1a
Annotations:       pv.kubernetes.io/migrated-to: ebs.csi.aws.com
                   pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
                   volume.kubernetes.io/provisioner-deletion-secret-name:
                   volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      ankit-eks-discovery
Status:            Bound
Claim:             ankit-eks/ankit-eks-discovery-pvc
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          1Gi
Node Affinity:
  Required Terms:
    Term 0:        topology.kubernetes.io/zone in [us-east-1a]
                   topology.kubernetes.io/region in [us-east-1]
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   vol-0eb1d80b2882356b2
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>
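
The Node Affinity section shows the volume is pinned to us-east-1a, so only a schedulable node in that zone can mount it. Whether such a node exists can be checked with (assuming the standard zone label):

kubectl get nodes -l topology.kubernetes.io/zone=us-east-1a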

I tried deleting and recreating the deployment, the PVC, and the storage class, but no luck. I also checked the labels on my nodes; they are correct.

[ankit@ankit]$ kubectl describe no ip-10-211-26-94.ec2.internal
Name:               ip-10-211-26-94.ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5d.large
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=ON_DEMAND
                    eks.amazonaws.com/nodegroup=ankit-nodes
                    eks.amazonaws.com/nodegroup-image=ami-0eb3216fe26784e21
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1b
                    k8s.io/cloud-provider-aws=b69ac44d98ef071c695017c202bde456
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-211-26-94.ec2.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5d.large
                    topology.ebs.csi.aws.com/zone=us-east-1b
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1b
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.211.26.94
                    csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-068a872a874b02642"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
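
For completeness, the zone label of every node can be dumped in one line instead of describing nodes one by one (the dots in the label key are escaped for kubectl's JSONPath):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}{end}'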

I don't know what I am doing wrong. Can anyone help here?

Ankit Soni
  • It looks like your configuration allows the StorageClass to provision EBS volumes in all AZs, while the error message implies that you do not have nodes (or capacity) in all of them. You could align the two and only allow EBS in the AZs where there are nodes. – Augunrik Feb 20 '23 at 10:56
    Are you trying to share this PVC between multiple Pods? A `ReadWriteOnce` PVC can only be attached to one node, and if that node is full, it would produce errors similar to what you're seeing. (I'd rearchitect your application to avoid directly sharing files if possible; better still is avoiding persistent local files entirely.) – David Maze Feb 20 '23 at 12:08
  • @DavidMaze The PVCs are not shared between multiple pods; only one deployment uses this PVC. How can we check whether the node the PVC is attached to is full? And what needs to be done if we want to move to a different node? Thanks. – Ankit Soni Feb 21 '23 at 06:15
  • `kubectl describe node` will tell you how much of the node's capacity is used. But more generally, you can hit this problem whenever more than one Pod tries to use the same (ReadWriteOnce) PVC; I'd suggest avoiding a PVC in a Deployment, and instead using a StatefulSet if you do need persistent state per Pod. – David Maze Feb 21 '23 at 10:43
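
For reference, a minimal sketch of the StatefulSet pattern suggested in the last comment: each replica gets its own PVC from a volumeClaimTemplate, provisioned where the pod lands. All names below are placeholders, not taken from the question:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: discovery              # hypothetical name
  namespace: ankit-eks
spec:
  serviceName: discovery       # assumes a matching headless Service exists
  replicas: 1
  selector:
    matchLabels:
      app: discovery
  template:
    metadata:
      labels:
        app: discovery
    spec:
      containers:
      - name: app
        image: my-app:latest   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:        # one PVC per pod instead of a shared claim
  - metadata:
      name: data
    spec:
      storageClassName: ankit-eks-discovery
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi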

0 Answers