I have a GKE private cluster in Autopilot mode running GKE 1.23, described below. I am trying to install an application from a vendor's Helm chart. Following their instructions, I use a script like this:
#! /bin/bash
helm repo add safesoftware https://safesoftware.github.io/helm-charts/
helm repo update
tag="2021.2"
version="safesoftware/fmeserver-$tag"
helm upgrade --install \
  fmeserver \
  $version \
  --set fmeserver.image.tag=$tag \
  --set deployment.hostname="REDACTED" \
  --set deployment.useHostnameIngress=true \
  --set deployment.tlsSecretName="my-ssl-cert" \
  --namespace ingress-nginx --create-namespace \
  #--set resources.core.requests.cpu="500m" \
  #--set resources.queue.requests.cpu="500m" \
However, I get errors from the GKE Warden!
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "ingress-nginx" chart repository
...Successfully got an update from the "safesoftware" chart repository
Update Complete. ⎈Happy Helming!⎈
W1201 10:25:08.117532 29886 warnings.go:70] Autopilot increased resource requests for Deployment ingress-nginx/engine-standard-group to meet requirements. See http://g.co/gke/autopilot-resources.
W1201 10:25:08.201656 29886 warnings.go:70] Autopilot increased resource requests for StatefulSet ingress-nginx/fmeserver-postgresql to meet requirements. See http://g.co/gke/autopilot-resources.
W1201 10:25:08.304755 29886 warnings.go:70] Autopilot increased resource requests for StatefulSet ingress-nginx/core to meet requirements. See http://g.co/gke/autopilot-resources.
W1201 10:25:08.392965 29886 warnings.go:70] Autopilot increased resource requests for StatefulSet ingress-nginx/queue to meet requirements. See http://g.co/gke/autopilot-resources.
W1201 10:25:08.480421 29886 warnings.go:70] Autopilot increased resource requests for StatefulSet ingress-nginx/websocket to meet requirements. See http://g.co/gke/autopilot-resources.
Error: UPGRADE FAILED: cannot patch "core" with kind StatefulSet: admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more policies: {"[denied by autogke-pod-limit-constraints]":["workload 'core' cpu requests '{{400 -3} {\u003cnil\u003e} DecimalSI}' is lower than the Autopilot minimum required of '{{500 -3} {\u003cnil\u003e} 500m DecimalSI}' for using pod anti affinity. Requested by user: 'REDACTED', groups: 'system:authenticated'."]} && cannot patch "queue" with kind StatefulSet: admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more policies: {"[denied by autogke-pod-limit-constraints]":["workload 'queue' cpu requests '{{250 -3} {\u003cnil\u003e} DecimalSI}' is lower than the Autopilot minimum required of '{{500 -3} {\u003cnil\u003e} 500m DecimalSI}' for using pod anti affinity. Requested by user: 'REDACTED', groups: 'system:authenticated'."]}
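To see what CPU requests the chart actually renders (before Autopilot mutates or rejects them), I can render it locally with helm template; this is just a diagnostic sketch using the same flags as my script:
helm template fmeserver safesoftware/fmeserver-2021.2 \
  --set fmeserver.image.tag="2021.2" \
  --set deployment.hostname="REDACTED" \
  --set deployment.useHostnameIngress=true \
  --set deployment.tlsSecretName="my-ssl-cert" \
  --namespace ingress-nginx \
  | grep -B 2 -A 6 'resources:'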
So I modified the CPU requests in the resource spec for the pods causing the issues; one way is to uncomment the last two lines of the script (an equivalent values-file form is sketched after them):
--set resources.core.requests.cpu="500m" \
--set resources.queue.requests.cpu="500m" \
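The same overrides can also be put in a values file instead of --set flags (autopilot-values.yaml is just a name I picked; the nesting mirrors the --set paths above):
cat > autopilot-values.yaml <<'EOF'
resources:
  core:
    requests:
      cpu: "500m"
  queue:
    requests:
      cpu: "500m"
EOF
and then add -f autopilot-values.yaml to the helm upgrade command above.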
With those CPU requests set, the chart installs or upgrades successfully, but then the pods are PodUnschedulable with reason Cannot schedule pods: Insufficient cpu. Depending on the exact changes to the chart, I sometimes also see Cannot schedule pods: node(s) had volume node affinity conflict.
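To dig into the volume node affinity conflict, I compare the zone each PersistentVolume is pinned to with the zones that currently have nodes; a rough sketch of the read-only commands I use:
kubectl get pvc -n ingress-nginx
kubectl get pv -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name,AFFINITY:.spec.nodeAffinity.required.nodeSelectorTerms
kubectl get nodes -L topology.kubernetes.io/zone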
I can't see how to increase either the number of pods or the size of each (e2-medium) node in Autopilot mode, nor can I find a way to remove those guards. I have checked the quotas and can't see any quota issue. I can install other workloads, including ingress-nginx.
I am not sure what the issue is, and I am not an expert with Helm or Kubernetes.
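This is roughly how I have been inspecting what machine types and allocatable CPU Autopilot actually provisioned (read-only commands, nothing here changes the cluster):
kubectl get nodes -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone   # machine type and zone per node
kubectl describe nodes | grep -A 8 'Allocatable:'                                      # allocatable CPU/memory per node
kubectl top nodes                                                                      # current usage (GKE provides metrics-server)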
For reference, the cluster can be described as:
addonsConfig:
cloudRunConfig:
disabled: true
loadBalancerType: LOAD_BALANCER_TYPE_EXTERNAL
configConnectorConfig: {}
dnsCacheConfig:
enabled: true
gcePersistentDiskCsiDriverConfig:
enabled: true
gcpFilestoreCsiDriverConfig:
enabled: true
gkeBackupAgentConfig: {}
horizontalPodAutoscaling: {}
httpLoadBalancing: {}
kubernetesDashboard:
disabled: true
networkPolicyConfig:
disabled: true
autopilot:
enabled: true
autoscaling:
autoprovisioningNodePoolDefaults:
imageType: COS_CONTAINERD
management:
autoRepair: true
autoUpgrade: true
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
- https://www.googleapis.com/auth/service.management.readonly
- https://www.googleapis.com/auth/servicecontrol
- https://www.googleapis.com/auth/trace.append
serviceAccount: default
upgradeSettings:
maxSurge: 1
strategy: SURGE
autoscalingProfile: OPTIMIZE_UTILIZATION
enableNodeAutoprovisioning: true
resourceLimits:
- maximum: '1000000000'
resourceType: cpu
- maximum: '1000000000'
resourceType: memory
- maximum: '1000000000'
resourceType: nvidia-tesla-t4
- maximum: '1000000000'
resourceType: nvidia-tesla-a100
binaryAuthorization: {}
clusterIpv4Cidr: 10.102.0.0/21
createTime: '2022-11-30T04:47:19+00:00'
currentMasterVersion: 1.23.12-gke.100
currentNodeCount: 7
currentNodeVersion: 1.23.12-gke.100
databaseEncryption:
state: DECRYPTED
defaultMaxPodsConstraint:
maxPodsPerNode: '110'
endpoint: REDACTED
id: REDACTED
initialClusterVersion: 1.23.12-gke.100
initialNodeCount: 1
instanceGroupUrls: REDACTED
ipAllocationPolicy:
clusterIpv4Cidr: 10.102.0.0/21
clusterIpv4CidrBlock: 10.102.0.0/21
clusterSecondaryRangeName: pods
servicesIpv4Cidr: 10.103.0.0/24
servicesIpv4CidrBlock: 10.103.0.0/24
servicesSecondaryRangeName: services
stackType: IPV4
useIpAliases: true
labelFingerprint: '05525394'
legacyAbac: {}
location: europe-west3
locations:
- europe-west3-c
- europe-west3-a
- europe-west3-b
loggingConfig:
componentConfig:
enableComponents:
- SYSTEM_COMPONENTS
- WORKLOADS
loggingService: logging.googleapis.com/kubernetes
maintenancePolicy:
resourceVersion: 93731cbd
window:
dailyMaintenanceWindow:
duration: PT4H0M0S
startTime: 03:00
masterAuth:
masterAuthorizedNetworksConfig:
cidrBlocks:
enabled: true
monitoringConfig:
componentConfig:
enableComponents:
- SYSTEM_COMPONENTS
monitoringService: monitoring.googleapis.com/kubernetes
name: gis-cluster-uat
network: geo-nw-uat
networkConfig:
nodeConfig:
diskSizeGb: 100
diskType: pd-standard
imageType: COS_CONTAINERD
machineType: e2-medium
metadata:
disable-legacy-endpoints: 'true'
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
- https://www.googleapis.com/auth/service.management.readonly
- https://www.googleapis.com/auth/servicecontrol
- https://www.googleapis.com/auth/trace.append
serviceAccount: default
shieldedInstanceConfig:
enableIntegrityMonitoring: true
enableSecureBoot: true
workloadMetadataConfig:
mode: GKE_METADATA
nodePoolAutoConfig: {}
nodePoolDefaults:
nodeConfigDefaults:
loggingConfig:
variantConfig:
variant: DEFAULT
nodePools:
- autoscaling:
autoprovisioned: true
enabled: true
maxNodeCount: 1000
config:
diskSizeGb: 100
diskType: pd-standard
imageType: COS_CONTAINERD
machineType: e2-medium
metadata:
disable-legacy-endpoints: 'true'
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
- https://www.googleapis.com/auth/service.management.readonly
- https://www.googleapis.com/auth/servicecontrol
- https://www.googleapis.com/auth/trace.append
serviceAccount: default
shieldedInstanceConfig:
enableIntegrityMonitoring: true
enableSecureBoot: true
workloadMetadataConfig:
mode: GKE_METADATA
initialNodeCount: 1
instanceGroupUrls:
locations:
management:
autoRepair: true
autoUpgrade: true
maxPodsConstraint:
maxPodsPerNode: '32'
name: default-pool
networkConfig:
podIpv4CidrBlock: 10.102.0.0/21
podRange: pods
podIpv4CidrSize: 26
selfLink: REDACTED
status: RUNNING
upgradeSettings:
maxSurge: 1
strategy: SURGE
version: 1.23.12-gke.100
- autoscaling:
autoprovisioned: true
enabled: true
maxNodeCount: 1000
config:
diskSizeGb: 100
diskType: pd-standard
imageType: COS_CONTAINERD
machineType: e2-standard-2
metadata:
disable-legacy-endpoints: 'true'
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
- https://www.googleapis.com/auth/service.management.readonly
- https://www.googleapis.com/auth/servicecontrol
- https://www.googleapis.com/auth/trace.append
reservationAffinity:
consumeReservationType: NO_RESERVATION
serviceAccount: default
shieldedInstanceConfig:
enableIntegrityMonitoring: true
enableSecureBoot: true
workloadMetadataConfig:
mode: GKE_METADATA
instanceGroupUrls:
locations:
management:
autoRepair: true
autoUpgrade: true
maxPodsConstraint:
maxPodsPerNode: '32'
name: nap-1rrw9gqf
networkConfig:
podIpv4CidrBlock: 10.102.0.0/21
podRange: pods
podIpv4CidrSize: 26
selfLink: REDACTED
status: RUNNING
upgradeSettings:
maxSurge: 1
strategy: SURGE
version: 1.23.12-gke.100
notificationConfig:
pubsub: {}
privateClusterConfig:
enablePrivateNodes: true
masterGlobalAccessConfig:
enabled: true
masterIpv4CidrBlock: 192.168.0.0/28
peeringName: gke-nf69df7b6242412e9932-582a-f600-peer
privateEndpoint: 192.168.0.2
publicEndpoint: REDACTED
releaseChannel:
channel: REGULAR
resourceLabels:
environment: uat
selfLink: REDACTED
servicesIpv4Cidr: 10.103.0.0/24
shieldedNodes:
enabled: true
status: RUNNING
subnetwork: redacted
verticalPodAutoscaling:
enabled: true
workloadIdentityConfig:
workloadPool: REDACTED
zone: europe-west3
EDIT: Adding pod describe logs.
kubectl describe pod core -n ingress-nginx
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 6m49s (x86815 over 3d22h) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Warning BackOff 110s (x13994 over 3d23h) kubelet Back-off restarting failed container
kubectl describe pod queue -n ingress-nginx
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 9m29s (x18130 over 2d14h) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match pod affinity rules, 3 node(s) had volume node affinity conflict
Normal NotTriggerScaleUp 4m28s (x24992 over 2d14h) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 node(s) had volume node affinity conflict, 2 node(s) didn't match pod affinity rules
Warning FailedScheduling 3m33s (x3385 over 2d14h) gke.io/optimize-utilization-scheduler 0/7 nodes are available: 1 node(s) had volume node affinity conflict, 6 Insufficient cpu.
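If it helps, these are the kinds of commands I run to dig further into the failing readiness probe (core-0 is the first pod of the core StatefulSet in my cluster; adjust the name, and add -c <container> if the pod has more than one container):
kubectl logs core-0 -n ingress-nginx --previous                       # logs from the last crashed/restarted container
kubectl get events -n ingress-nginx --sort-by=.lastTimestamp | tail -n 20
kubectl get statefulset core -n ingress-nginx -o jsonpath='{.spec.template.spec.containers[*].resources}'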