
I noticed that Velero can only back up AKS PVCs if those PVCs are backed by Azure disks, not Azure file shares. To handle this I tried to use restic to back up the file shares themselves, but it gives me a strange log:

This is what my Deployment currently looks like:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    backup.velero.io/backup-volumes: grafana-data
    deployment.kubernetes.io/revision: "17"

And the log of my backup:

time="2020-05-26T13:51:54Z" level=info msg="Adding pvc grafana-data to additionalItems" backup=velero/grafana-test-volume cmd=/velero logSource="pkg/backup/pod_action.go:67" pluginName=velero
time="2020-05-26T13:51:54Z" level=info msg="Backing up item" backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:169" name=grafana-data namespace=grafana resource=persistentvolumeclaims
time="2020-05-26T13:51:54Z" level=info msg="Executing custom action" backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:330" name=grafana-data namespace=grafana resource=persistentvolumeclaims
time="2020-05-26T13:51:54Z" level=info msg="Skipping item because it's already been backed up." backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:163" name=grafana-data namespace=grafana resource=persistentvolumeclaims

As you can see, somehow it did not back up the grafana-data volume: it claims the item is already in the backup, which it actually is not.
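For reference, the standard Velero CLI can show exactly which items and pod volumes made it into a backup (the backup name `grafana-test-volume` below is taken from the log above):

```shell
# List everything Velero included in the backup; with --details this also
# shows the per-volume restic status, if any restic backups were triggered
velero backup describe grafana-test-volume --details

# Search the backup log for entries mentioning the volume
velero backup logs grafana-test-volume | grep grafana-data
```

If the `--details` output lists no restic volumes at all, restic was never invoked for this backup.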

My azurefile StorageClass holds these contents:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1beta1","kind":"StorageClass","metadata":{"annotations":{},"labels":{"kubernetes.io/cluster-service":"true"},"name":"azurefile"},"parameters":{"skuName":"Standard_LRS"},"provisioner":"kubernetes.io/azure-file"}
  creationTimestamp: "2020-05-18T15:18:18Z"
  labels:
    kubernetes.io/cluster-service: "true"
  name: azurefile
  resourceVersion: "1421202"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/azurefile
  uid: e3cc4e52-c647-412a-bfad-81ab6eb222b1
mountOptions:
- nouser_xattr
parameters:
  skuName: Standard_LRS
provisioner: kubernetes.io/azure-file
reclaimPolicy: Delete
volumeBindingMode: Immediate

As you can see, I have already patched the StorageClass to include the nouser_xattr mount option, as was suggested earlier.
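To double-check that the option is actually in effect, something like the following can be used. Note that `mountOptions` on a StorageClass only apply when a volume is mounted, so existing pods need to be restarted before the option takes effect (the `grafana` deployment name below is an assumption based on the namespace in the logs):

```shell
# Confirm the StorageClass now carries the mount option
kubectl get storageclass azurefile -o jsonpath='{.mountOptions}'

# Restart the workload so its volume is remounted with the new options
# (deployment name is hypothetical; substitute your own)
kubectl -n grafana rollout restart deployment grafana
```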

When I check the restic pod logs, I see the following:

E0524 10:22:08.908190       1 reflector.go:156] github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117: Failed to list *v1.PodVolumeBackup: Get https://10.0.0.1:443/apis/velero.io/v1/namespaces/velero/podvolumebackups?limit=500&resourceVersion=1212830: dial tcp 10.0.0.1:443: i/o timeout
I0524 10:22:08.909577       1 trace.go:116] Trace[1946538740]: "Reflector ListAndWatch" name:github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117 (started: 2020-05-24 10:21:38.908988405 +0000 UTC m=+487217.942875118) (total time: 30.000554209s):
Trace[1946538740]: [30.000554209s] [30.000554209s] END

When I list the PodVolumeBackup resources, I see the output below. I don't know what is expected here, though:

kubectl -n velero get podvolumebackups -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

To summarize, I installed Velero like this:

velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.0.1 \
  --bucket $BLOB_CONTAINER \
  --secret-file ./credentials-velero \
  --backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT_ID \
  --snapshot-location-config apiTimeout=5m,resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP \
  --use-restic \
  --wait

The end result is the Deployment described below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    backup.velero.io/backup-volumes: app-upload
    deployment.kubernetes.io/revision: "18"
  creationTimestamp: "2020-05-18T16:55:38Z"
  generation: 10
  labels:
    app: app
    velero.io/backup-name: mekompas-tenant-production-20200518020012
    velero.io/restore-name: mekompas-tenant-production-20200518020012-20200518185536
  name: app
  namespace: mekompas-tenant-production
  resourceVersion: "427893"
  selfLink: /apis/extensions/v1beta1/namespaces/mekompas-tenant-production/deployments/app
  uid: c1961ec3-b7b1-4f81-9aae-b609fa3d31fc
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2020-05-18T20:24:19+02:00"
      creationTimestamp: null
      labels:
        app: app
    spec:
      containers:
      - image: nginx:1.17-alpine
        imagePullPolicy: IfNotPresent
        name: app-nginx
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/html
          name: app-files
        - mountPath: /etc/nginx/conf.d
          name: nginx-vhost
      - env:
        - name: CONF_DB_HOST
          value: db.mekompas-tenant-production
        - name: CONF_DB
          value: mekompas
        - name: CONF_DB_USER
          value: mekompas
        - name: CONF_DB_PASS
          valueFrom:
            secretKeyRef:
              key: DATABASE_PASSWORD
              name: secret
        - name: CONF_EMAIL_FROM_ADDRESS
          value: noreply@mekompas.nl
        - name: CONF_EMAIL_FROM_NAME
          value: mekompas
        - name: CONF_EMAIL_REPLYTO_ADDRESS
          value: slc@mekompas.nl
        - name: CONF_UPLOAD_PATH
          value: /uploads
        - name: CONF_SMTP_HOST
          value: smtp.sendgrid.net
        - name: CONF_SMTP_PORT
          value: "587"
        - name: CONF_SMTP_USER
          value: apikey
        - name: CONF_SMTP_PASSWORD
          valueFrom:
            secretKeyRef:
              key: MAIL_PASSWORD
              name: secret
        image: me.azurecr.io/mekompas/php-fpm-alpine:1.12.0
        imagePullPolicy: Always
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - cp -r /app/. /var/www/html && chmod -R 777 /var/www/html/templates_c
                && chmod -R 777 /var/www/html/core/lib/htmlpurifier-4.9.3/library/HTMLPurifier/DefinitionCache
        name: app-php
        ports:
        - containerPort: 9000
          name: upstream-php
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/html
          name: app-files
        - mountPath: /uploads
          name: app-upload
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: registrypullsecret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: app-upload
        persistentVolumeClaim:
          claimName: upload
      - emptyDir: {}
        name: app-files
      - configMap:
          defaultMode: 420
          name: nginx-vhost
        name: nginx-vhost
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-05-18T18:12:20Z"
    lastUpdateTime: "2020-05-18T18:12:20Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-05-18T16:55:38Z"
    lastUpdateTime: "2020-05-20T16:03:48Z"
    message: ReplicaSet "app-688699c5fb" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 10
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Best, Pim

Dirkos

1 Answer


Have you added nouser_xattr to your StorageClass mountOptions list?

This requirement is documented in GitHub issue 1800.

Also mentioned on the restic integration page (check under the Azure section), where they provide this snippet to patch your StorageClass resource:

kubectl patch storageclass/<YOUR_AZURE_FILE_STORAGE_CLASS_NAME> \
  --type json \
  --patch '[{"op":"add","path":"/mountOptions/-","value":"nouser_xattr"}]'

If you have no existing mountOptions list, you can try:

kubectl patch storageclass azurefile \
  --type merge \
  --patch '{"mountOptions": ["nouser_xattr"]}'

Ensure the pod template of the Deployment resource includes the annotation backup.velero.io/backup-volumes. Annotations on Deployment resources will propagate to ReplicaSet resources, but not to Pod resources.

Specifically, in your example the annotation backup.velero.io/backup-volumes: app-upload should be a child of spec.template.metadata.annotations, rather than a child of metadata.annotations.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    # *** move velero annotation from here ***
  labels:
    app: app
  name: app
  namespace: mekompas-tenant-production
spec:
  template:
    metadata:
      annotations:
        # *** velero annotation goes here in order to end up on the pod ***
        backup.velero.io/backup-volumes: app-upload
      labels:
        app: app
    spec:
      containers:
      - image: nginx:1.17-alpine
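Rather than editing the YAML by hand, the annotation can also be moved onto the pod template with a single patch (deployment name and namespace are taken from the question):

```shell
# Add the backup annotation to the pod template, not the Deployment metadata
kubectl -n mekompas-tenant-production patch deployment app \
  --type merge \
  --patch '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"app-upload"}}}}}'
```

Because this changes the pod template, it triggers a rolling restart; once the new pod is up, the next Velero backup should create a PodVolumeBackup resource for app-upload. The stale annotation under metadata.annotations is harmless and can be removed separately.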
bpdohall
  • This does not seem to work. See the main post for details – Dirkos May 21 '20 at 16:05
  • can you edit the storageclass to add the mountOption? I've included another patch command that should work. – bpdohall May 21 '20 at 19:48
  • Also this patch command does not seem to work, see main thread – Dirkos May 22 '20 at 06:41
  • It seems that azurefile is a default storageclass that you cant update so i need to make a custom one. However since there is already data in azurefile, what happens if i update the PVC to another storage class? Will the data remain or will it recreate? – Dirkos May 22 '20 at 12:37
  • I've updated the second patch command to use JSON in the patch instead of yaml, it should run without error now. – bpdohall May 22 '20 at 13:54
  • one last question, what will happen with the data that is attached to the storageclass when i patch it? – Dirkos May 23 '20 at 16:56
  • I would expect an in place update with no data loss. You could confirm by making a clone of the azurefile storageclass resource and test the behavior of a pod having a PVC of the new storageclass. – bpdohall May 23 '20 at 17:55
  • The update worked fine however there is still no result for the backup of the file share in this case. So the same as main thread mentions – Dirkos May 25 '20 at 10:10
  • @Dirkos - The creator of this ticket: https://github.com/vmware-tanzu/velero/issues/887 indicated they had success, but they had to use these mountOptions: https://learn.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv#mount-options Could you post the config of the "PodVolumeBackup" resource that your restic backup is based on? – bpdohall May 25 '20 at 13:19
  • Also if you have logs from the pod where `restic backup` runs, that might add some details. – bpdohall May 25 '20 at 13:22
  • I added the restic pod logs and it indicates timeouts though however i have no clue if this is relevant since it says it is skipped since it is already in a backup itself. – Dirkos May 25 '20 at 13:52
  • @Dirkos - the restic logs you posted indicate the `podvolumebackups` endpoint is not responding. Do you have logs from the first run on 2020-05-20? If not, please update your question to include MRE (https://stackoverflow.com/help/minimal-reproducible-example). You're probably better to start from the beginning in a different namespace so you can post the contents of all the resources involved (deployments, pods, PVCs, etc). – bpdohall May 26 '20 at 12:22
  • i added some more info. I just created a new backup and the logs indicate the same however i notice that the podvolumebackup thing is empty. I have no idea what is expected here though – Dirkos May 26 '20 at 14:06
  • Can you review this document: https://velero.io/docs/v1.3.2/restic/, specifically the paragraph starting with "The main Velero backup process checks each pod". Based on your YAML, it looks like you might have annotated the deployment, but not the pod itself? – bpdohall May 26 '20 at 14:19
  • @Dirkos - Please include the output of `kubectl get deployment ` so we can review whether the pod template includes the required annotation. – bpdohall May 27 '20 at 15:47
  • Added the output of the deployment itself – Dirkos May 28 '20 at 10:54
  • @Dirkos - I've updated my answer to show where the annotation needs to go in order for `PodVolumeBackup` resources to be created by Velero. – bpdohall May 28 '20 at 13:47
  • I accepted this answer! Thanks for the support here – Dirkos Jun 03 '20 at 09:07