
Hello, we are using Velero in our DR planning and are working on a cross-region backup/restore strategy. We take backups of workloads, PVs and PVCs, and we are facing issues while restoring a backup taken in us-east-2 to a second region (us-west-2).

The installation goes through without any issues on both clusters using the command below:

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.4.0 \
    --bucket velerobucket \
    --backup-location-config region=us-east-2 \
    --snapshot-location-config region=us-east-2 \
    --secret-file secret-file
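
(I used the identical command on the us-west-2 cluster as well. I did wonder whether the snapshot location on the restore cluster should instead point at us-west-2 while the backup location keeps the bucket's region, something along these lines, but I'm not sure that alone is enough:)

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.4.0 \
    --bucket velerobucket \
    --backup-location-config region=us-east-2 \
    --snapshot-location-config region=us-west-2 \
    --secret-file secret-file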

The backup creation also goes through without any error:

velero backup create zookeeperbkp --include-namespaces zookeeper --snapshot-volumes

When doing the restore on the us-west-2 cluster from us-east-2, the restore completes successfully without any errors in the Velero restore logs, but the zookeeper pods go into Pending state:

velero restore create  --from-backup zookeeperbkp

kubectl get pods -n zookeeper
NAME          READY   STATUS    RESTARTS   AGE
zookeeper-0   0/2     Pending   0          3m24s
zookeeper-1   0/2     Pending   0          3m24s
zookeeper-2   0/2     Pending   0          3m24s

Describing the pods shows:

0/1 nodes are available: 1 node(s) had volume node affinity conflict.

Describing the PVs shows that Velero is trying to create them in us-east-2: the zone labels are for us-east-2, whereas they should be for us-west-2 (the restore cluster).
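
For reference, this is roughly how I inspected it (using one of the restored PVs):

kubectl describe pod zookeeper-0 -n zookeeper   # shows the "volume node affinity conflict" event
kubectl get pv pvc-261b9803-8e55-4880-bb31-b29ca3a6c323 -o yaml   # zone labels / nodeAffinity still reference us-east-2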

After all this I read more about the limitations of Velero when restoring PVs and PVCs across regions. There are issues where people have modified the Velero JSON files in S3: https://github.com/vmware-tanzu/velero/issues/1624

I tried doing the same, by modifying the Velero volume snapshot JSON file in S3:

aws s3 cp s3://velerobkpxyz/backups/zookeeper/ ./ --recursive
gunzip zookeeper-volumesnapshots.json.gz
sed -i "s/us-east-2/us-west-2/g" zookeeper-volumesnapshots.json
gzip zookeeper-volumesnapshots.json
aws s3 cp zookeeper-volumesnapshots.json.gz s3://velerobkp/backups/zookeeper/zookeeper-volumesnapshots.json.gz

Similarly, I made the change for zookeeper.tar.gz:

mkdir zookeeper-temp
tar xzf zookeeper.tar.gz -C zookeeper-temp/
cd zookeeper-temp/
find . -name '*.json' -exec sed -i 's/us-east-2/us-west-2/g' {} \;
tar czf ../zookeeper.tar.gz *
cd ..
aws s3 cp zookeeper.tar.gz s3://velerobkp/backups/zookeeper/

After this, describing the backup shows the correct region names for the PVs:

velero backup describe zookeeper --details

Name:         zookeeper9
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.5-eks-bc4871b
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21+

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  zookeeper
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-03-30 20:37:53 +0530 IST
Completed:  2022-03-30 20:37:57 +0530 IST

Expiration:  2022-04-29 20:37:53 +0530 IST

Total items to be backed up:  52
Items backed up:              52

Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - servicemonitors.monitoring.coreos.com
  apps/v1/ControllerRevision:
    - zookeeper/zookeeper-596cddb599
    - zookeeper/zookeeper-5977bdccb6
    - zookeeper/zookeeper-5cd569cbf9
    - zookeeper/zookeeper-6585c9bc89
    - zookeeper/zookeeper-6bf55cfd99
    - zookeeper/zookeeper-856646d9f6
    - zookeeper/zookeeper-8cdd5f46
    - zookeeper/zookeeper-ccf87988c
  apps/v1/StatefulSet:
    - zookeeper/zookeeper
  discovery.k8s.io/v1/EndpointSlice:
    - zookeeper/zookeeper-headless-2tnx5
    - zookeeper/zookeeper-mzdlc
  monitoring.coreos.com/v1/ServiceMonitor:
    - zookeeper/zookeeper-exporter
  policy/v1/PodDisruptionBudget:
    - zookeeper/zookeeper
  v1/ConfigMap:
    - zookeeper/kube-root-ca.crt
    - zookeeper/zookeeper
  v1/Endpoints:
    - zookeeper/zookeeper
    - zookeeper/zookeeper-headless
  v1/Namespace:
    - zookeeper
  v1/PersistentVolume:
    - pvc-261b9803-8e55-4880-bb31-b29ca3a6c323
    - pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db
    - pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e
    - pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7
    - pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e
    - pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e
  v1/PersistentVolumeClaim:
    - zookeeper/data-zookeeper-0
    - zookeeper/data-zookeeper-1
    - zookeeper/data-zookeeper-2
    - zookeeper/data-zookeeper-3
    - zookeeper/data-zookeeper-4
    - zookeeper/data-zookeeper-5
  v1/Pod:
    - zookeeper/zookeeper-0
    - zookeeper/zookeeper-1
    - zookeeper/zookeeper-2
    - zookeeper/zookeeper-3
    - zookeeper/zookeeper-4
    - zookeeper/zookeeper-5
  v1/Secret:
    - zookeeper/default-token-kcl4m
    - zookeeper/sh.helm.release.v1.zookeeper.v1
    - zookeeper/sh.helm.release.v1.zookeeper.v10
    - zookeeper/sh.helm.release.v1.zookeeper.v11
    - zookeeper/sh.helm.release.v1.zookeeper.v12
    - zookeeper/sh.helm.release.v1.zookeeper.v13
    - zookeeper/sh.helm.release.v1.zookeeper.v4
    - zookeeper/sh.helm.release.v1.zookeeper.v5
    - zookeeper/sh.helm.release.v1.zookeeper.v6
    - zookeeper/sh.helm.release.v1.zookeeper.v7
    - zookeeper/sh.helm.release.v1.zookeeper.v8
    - zookeeper/sh.helm.release.v1.zookeeper.v9
  v1/Service:
    - zookeeper/zookeeper
    - zookeeper/zookeeper-headless
  v1/ServiceAccount:
    - zookeeper/default

Velero-Native Snapshots:
  pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-0f81f2f62e476584a
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-0c689771f3dbfa361
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A>
  pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e:
    Snapshot ID:        snap-068c63f1bb31af3cc
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db:
    Snapshot ID:        snap-050e2e51eac92bd74
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A>
  pvc-261b9803-8e55-4880-bb31-b29ca3a6c323:
    Snapshot ID:        snap-08e45396c99e7aac3
    Type:               gp2
    Availability Zone:  us-west-2b
    IOPS:               <N/A>
  pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7:
    Snapshot ID:        snap-07ad93657b0bdc1a6
    Type:               gp2
    Availability Zone:  us-west-2a
    IOPS:               <N/A> 

But when trying to restore, it fails:

velero restore create --from-backup zookeeper

velero restore describe zookeeper9-20220331145320
Name:         zookeeper9-20220331145320
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       PartiallyFailed (run 'velero restore logs zookeeper9-20220331145320' for more information)
Total items to be restored:  52
Items restored:              52

Started:    2022-03-31 14:53:24 +0530 IST
Completed:  2022-03-31 14:53:36 +0530 IST

Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    zookeeper:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.

Errors:
  Velero:     <none>
  Cluster:  error executing PVAction for persistentvolumes/pvc-261b9803-8e55-4880-bb31-b29ca3a6c323: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 2b5ae55c-dfd5-4c52-8494-105e46bce78b
    error executing PVAction for persistentvolumes/pvc-89cfd5b9-65da-4fd1-a095-83d21d1d21db: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: ed91b698-d3b9-450f-b7b4-a3869cbae6ae
    error executing PVAction for persistentvolumes/pvc-9e027e4c-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 2b493106-84c6-4210-9663-4d00f47c06de
    error executing PVAction for persistentvolumes/pvc-a835d78d-9dfd-41f7-92bd-7f2e752dbeb7: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: 387c6c27-6b18-4bc6-9bb8-3ed152cb49d1
    error executing PVAction for persistentvolumes/pvc-c0e454f7-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2a' does not exist.
  status code: 400, request id: 7d7d2931-e7d9-4bc5-8cb1-20e3b2849fe2
    error executing PVAction for persistentvolumes/pvc-ee6aad46-cc9e-11ea-9ce3-061b42a2865e: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.
  status code: 400, request id: 75648031-97ca-4e2a-a079-8f6618902b2a
  Namespaces: <none>

Backup:  zookeeper9

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Preserve Service NodePorts:  auto

It complains with:

 Cluster:  error executing PVAction for persistentvolumes/pvc-261b9803-8e55-4880-bb31-b29ca3a6c323: rpc error: code = Unknown desc = InvalidZone.NotFound: The zone 'us-west-2b' does not exist.

status code: 400, request id: 2b5ae55c-dfd5-4c52-8494-105e46bce78b

I am not sure why this is happening; is there anything that I have missed?

This makes me think that some action is also required on the snapshots themselves, because the backup references snapshot IDs that live in the source region and are not available in the destination region.
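
For example, I suspect each snapshot would first have to be copied to the destination region manually, roughly along these lines (using one of the snapshot IDs from the backup details above), and the new snapshot IDs would then also have to be patched into the backup JSON:

aws ec2 copy-snapshot \
    --source-region us-east-2 \
    --source-snapshot-id snap-0f81f2f62e476584a \
    --region us-west-2 \
    --description "zookeeper PV snapshot for cross-region restore"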


1 Answer

Since there is currently no native support for this in Velero, I had to solve it through a workaround.

Thanks go to the PRs submitted by jglick for this feature in the Velero repo and the Velero plugin for AWS.

After building images from those two repos, I was able to copy PVs and PVCs to a different region altogether.

As stated above, this is not an official solution, and since it is not merged I would suggest refraining from using it directly in production without thorough testing. This is also suggested by the contributor of the PR.

Please go through this PR for the discussion and steps: https://github.com/vmware-tanzu/velero-plugin-for-aws/pull/90

Step 1: Build the images from these two PR branches: https://github.com/jglick/velero/tree/concurrent-snapshot

https://github.com/jglick/velero-plugin-for-aws/tree/x-region

The steps below are for AWS ECR; you can replace it with the corresponding registry of your choice.

1. Create repositories for velero and velero-plugin-for-aws

ex: aws ecr create-repository --repository-name testing/velero --region $region || echo already exists

Create repository for velero-plugin-for-aws

ex: aws ecr create-repository --repository-name testing/velero-plugin-for-aws --region $region || echo already exists

**2. Create the container image for velero**

command

make -C /path/to/velero REGISTRY=$registry/testing VERSION=testing container

ex: make -C . REGISTRY=123456789.dkr.ecr.us-west-2.amazonaws.com/testing VERSION=0.1 container

3. Create container for velero-plugin-for-aws

command

docker build -t $registry/testing/velero-plugin-for-aws /path/to/patched/velero-plugin-for-aws

ex: docker build -t 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero-plugin-for-aws velero-plugin-for-aws

**4. Now log in to AWS ECR in the region where you wish to push the images**

command

aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $registry

ex: aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-west-2.amazonaws.com

5. Push the velero and velero-plugin-for-aws images to the repository

command

docker push $registry/testing/velero

docker push $registry/testing/velero-plugin-for-aws

ex: docker push 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero:main

docker push 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero-plugin-for-aws

Now your images are pushed to the repository and can be used for backups and restores in any region you wish.

Now install Velero in the region where you want to take backups and in the region where you want to restore.

Create a values file with your current region and alt_region, so that when backups happen in the current region, the volumes of stateful workloads with PVs are also copied to the alternate region you specify.

Here is an example where we have set up us-east-2 as the source region and us-west-2 as the alternate region.

ex:

cat /tmp/velero-us-east-2.yaml
image:
  repository: 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero
  tag: main
initContainers:
- name: velero-plugin-for-aws
  image: 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero-plugin-for-aws:latest
  volumeMounts:
  - mountPath: /target
    name: plugins
configuration:
  provider: aws
  backupStorageLocation:
    bucket: <your-bucket-name>
    config:
      region: us-east-2
  volumeSnapshotLocation:
    config:
      region: us-east-2
      altRegion: us-west-2
  extraEnvVars:
    AWS_CLUSTER_NAME: <your-EKS-Cluster-name>
    VELERO_AWS_AZ_OVERRIDE: us-east-2a
serviceAccount:
  server:
    create: true
    name: velero
credentials:
  useSecret: true
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=<velero-user-creds>
      aws_secret_access_key=<velero-user-creds>

So in this case the snapshots will be copied to the us-west-2 region when backups are taken in us-east-2.
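
One way to sanity-check the copy after a backup completes is to list your snapshots in the alternate region (the query here is just illustrative):

aws ec2 describe-snapshots --region us-west-2 --owner-ids self \
    --query 'Snapshots[].{Id:SnapshotId,State:State,Started:StartTime}' --output table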

Install Velero in the us-east-2 source region using Helm:

helm install velero vmware-tanzu/velero --version 2.24.0 --namespace velero --create-namespace -f /tmp/velero-us-east-2.yaml

Similarly, this needs to be configured in the region where we need to restore the backup, in our case us-west-2.

Ex:

cat /tmp/velero-restore-us-west-2.yaml

image:
  repository: 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero
  tag: main
initContainers:
- name: velero-plugin-for-aws
  image: 123456789.dkr.ecr.us-west-2.amazonaws.com/testing/velero-plugin-for-aws:latest
  volumeMounts:
  - mountPath: /target
    name: plugins
configuration:
  provider: aws
  backupStorageLocation:
    bucket: velerobkptest
    config:
      region: us-east-2
  volumeSnapshotLocation:
    config:
      region: us-west-2
      altRegion: us-west-2
  extraEnvVars:
    AWS_CLUSTER_NAME: <your-cluster-name in current region>
    VELERO_AWS_AZ_OVERRIDE: us-west-2a
serviceAccount:
  server:
    create: true
    name: velero
credentials:
  useSecret: true
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=<velero-user-creds>
      aws_secret_access_key=<velero-user-creds>

Do a Helm install: helm install velero vmware-tanzu/velero --version 2.24.0 --namespace velero --create-namespace -f /tmp/velero-restore-us-west-2.yaml

Now check whether the Velero backup location is properly configured:

velero get backup-location
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default   aws        velerobkptest   Available   2023-02-20 00:21:44 +0530 IST   ReadWrite     true
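
It is also worth confirming that the volume snapshot location picked up the region/altRegion settings from the values file; assuming the default Velero CRDs, one way to check is:

kubectl get volumesnapshotlocations -n velero -o yaml   # spec.config should show the region/altRegion you set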

**Once both clusters are set up, we can run our backup and restore commands on us-east-2 and us-west-2 accordingly**

velero backup create zookeeper-z --include-namespaces zookeeper

**Check the status using:**

velero describe backup zookeeper-z --details

Restoring to us-west-2 from the us-east-2 region:

velero restore create --from-backup zookeeper-z

**The restore should be successful and the pods should be running, attached to their desired volumes**

kubectl get pods -n zookeeper
NAME          READY   STATUS    RESTARTS   AGE
zookeeper-0   2/2     Running   0          17h
zookeeper-1   2/2     Running   1          17h
zookeeper-2   2/2     Running   1          17h
zookeeper-3   2/2     Running   1          17h
zookeeper-4   2/2     Running   0          17h
zookeeper-5   2/2     Running   0          17h
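
You can also double-check that the restored PVs landed in the destination region's availability zones (the exact zone label key depends on your Kubernetes version, but the values should now be us-west-2a/us-west-2b):

kubectl get pv --show-labels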

We have assumed you have already run all the steps to create the IAM user for Velero and the S3 buckets.

See the velero-plugin-for-aws README for the Velero IAM user and S3 bucket configuration.
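
For example, assuming the standard setup from the plugin README, the bucket and IAM user would have been created with something like:

aws s3api create-bucket --bucket velerobkptest --region us-east-2 \
    --create-bucket-configuration LocationConstraint=us-east-2
aws iam create-user --user-name velero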

  • Hi @Shane Warne, when restoring the backup in a different region, should the BSL point at the backup S3 bucket (where all the backup files are uploaded)? Also, should the BSL be in an Available state to be able to restore in that region? – TechGirl Apr 20 '23 at 15:38
  • Yes, you should point it at the bucket where your backups are stored; please go through the steps above, where I have configured Velero using Helm in the region where we need to restore. – Shane Warne Apr 20 '23 at 16:58