
I am struggling with a volume attach error. I have a regional persistent disk in the same GCP project as my regional GKE cluster. The regional cluster is in europe-west2, with nodes in europe-west2-a, -b and -c; the regional disk is replicated across zones europe-west2-b and -c.

I have an nfs-server Deployment manifest which refers to the disk via gcePersistentDisk.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  labels:
    app.kubernetes.io/managed-by: Helm
  name: nfs-server
  namespace: namespace
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      serviceAccountName: nfs-server 
      containers:
      - image: gcr.io/google_containers/volume-nfs:0.8
        imagePullPolicy: IfNotPresent
        name: nfs-server
        ports:
        - containerPort: 2049
          name: nfs
          protocol: TCP
        - containerPort: 20048
          name: mountd
          protocol: TCP
        - containerPort: 111
          name: rpcbind
          protocol: TCP
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: nfs-pvc
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - gcePersistentDisk:
          fsType: ext4
          pdName: my-regional-disk-name
        name: nfs-pvc
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.gke.io/zone
                operator: In
                values:
                - europe-west2-b
                - europe-west2-c

and my PV/PVC:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 200Gi
  nfs:
    path: /
    server: nfs-server.namespace.svc.cluster.local
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  name: nfs-pvc
  namespace: namespace
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: ""
  volumeMode: Filesystem
  volumeName: nfs-pv

When I apply my deployment manifest above I get the following error:

'rpc error: code = Unavailable desc = ControllerPublish not permitted on node "projects/ap-mc-qa-xxx-xxxx/zones/europe-west2-a/instances/node-instance-id" due to backoff condition'

The volume attachment tells me this:

Attach Error: Message:  rpc error: code = NotFound desc = ControllerPublishVolume could not find volume with ID projects/UNSPECIFIED/zones/UNSPECIFIED/disks/my-regional-disk-name: googleapi: Error 0: , notFound

These manifests worked fine when they were deployed against a zonal cluster/disk. I've checked things like making sure the cluster service account has the necessary permissions. The disk is currently not in use.

What am I missing???

Alan

2 Answers


I think we should focus on the type of Nodes that make up your Kubernetes cluster.

Regional persistent disks are restricted from being used with memory-optimized machines or compute-optimized machines.

Consider using a non-regional persistent disk storage class if a regional persistent disk is not a hard requirement. If it is a hard requirement, consider scheduling strategies such as taints and tolerations to ensure that the Pods that need the regional PD are scheduled onto a node pool that is not made up of optimized machines, as sketched below.

https://cloud.google.com/kubernetes-engine/docs/troubleshooting#error_400_cannot_attach_repd_to_an_optimized_vm
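
For example, one (untested) way to do that is to taint the optimized node pool and give a toleration only to the workloads that are allowed to run there; the node pool label value, taint key/value, Pod name and image below are illustrative, not taken from the question:

# Taint the optimized node pool so that only Pods which explicitly tolerate
# the taint can be scheduled there (the label value "optimized-pool" and the
# taint key/value are assumptions):
#
#   kubectl taint nodes -l cloud.google.com/gke-nodepool=optimized-pool \
#     dedicated=optimized:NoSchedule
#
# Pods that are allowed to run on the optimized machines then carry a matching
# toleration; Pods that need the regional PD (such as nfs-server) simply omit
# it and land on the other node pools.
apiVersion: v1
kind: Pod
metadata:
  name: optimized-workload          # illustrative Pod, not from the question
spec:
  tolerations:
  - key: dedicated
    operator: Equal
    value: optimized
    effect: NoSchedule
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image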

glv
  • The node pool machines are e2-standard, which should be fine, I believe. – Alan Apr 06 '23 at 15:13
  • Mmm, try having a look at all these other restrictions: https://cloud.google.com/compute/docs/disks#restrictions_2 – glv Apr 06 '23 at 15:36
  • I think I possibly found the answer - https://kubernetes.io/docs/concepts/storage/volumes/#regional-persistent-disks The Regional persistent disks feature allows the creation of persistent disks that are available in two zones within the same region. In order to use this feature, the volume must be provisioned as a PersistentVolume; referencing the volume directly from a pod is not supported. – Alan Apr 06 '23 at 15:53

So the reason the above won't work is that the regional persistent disk feature allows the creation of persistent disks that are available in two zones within the same region. In order to use that feature, the volume must be provisioned as a PersistentVolume; referencing the volume directly from a Pod is not supported. Something like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteMany
  gcePersistentDisk:
    pdName: my-regional-disk-name
    fsType: ext4

Now I'm trying to figure out how to re-configure the NFS server to use the regional disk.
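
A rough sketch of what that re-configuration might look like: a claim bound to the nfs-pv PersistentVolume above, with the Deployment's volume pointing at the claim instead of directly at the disk (untested; the claim name nfs-server-disk is illustrative, not from the question):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-server-disk             # illustrative name, not from the question
  namespace: namespace
spec:
  accessModes:
  - ReadWriteMany                   # matches the access mode declared on the PV above
  resources:
    requests:
      storage: 200Gi
  storageClassName: ""              # empty string: bind to the pre-created PV, no dynamic provisioning
  volumeName: nfs-pv                # the gcePersistentDisk-backed PV from this answer
---
# In the nfs-server Deployment, the gcePersistentDisk volume is then replaced
# with a reference to the claim:
#
#   volumes:
#   - name: nfs-pvc
#     persistentVolumeClaim:
#       claimName: nfs-server-disk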

Alan