
I'm new here, so forgive me if this comes out wrong. I've been using Couchbase for 10+ years on real hardware, and I've been working on establishing CB in Kubernetes, which seems to be working just fine. I'm also using the Couchbase Autonomous Operator. It works great; no complaints with normal functioning thus far.

However, I've been working through performing a Velero backup and restore of both the cluster and the CB Operator. I thought I finally had it working last week, but a recent attempt to restore from a Velero backup once again resulted in messages like this in the CBO's logs:

{"level":"info","ts":1680529171.8283288,"logger":"cluster","msg":"Reconcile completed","cluster":"default/cb-dev"}                                       
{"level":"info","ts":1680529172.0289326,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0002"}
{"level":"info","ts":1680529172.0289645,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0003"}
{"level":"info","ts":1680529172.0289707,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0001"} 
{"level":"info","ts":1680529172.0289757,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0004"}

I've tried to find out what this really means. I have some suspicions, but I don't know how to resolve it.

Of note in the messages above: 'cb-dev-0000' never appears in the recurring list. These messages appear every few seconds in the couchbase-operator pod logs.

Additionally, if I delete the pods one at a time, they are recreated by K8s or CBO (I'm not sure which), and each one then disappears from the repeating list. Once I've done that with all of them, this issue stops (see the sketch below).
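For reference, the manual workaround amounts to something like this (pod names taken from the logs above), deleting the orphaned pods one at a time and letting them be recreated:

kubectl delete pod cb-dev-0001 -n default
kubectl delete pod cb-dev-0002 -n default
kubectl delete pod cb-dev-0003 -n default
kubectl delete pod cb-dev-0004 -n default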

Any ideas, questions, or comments on this would be greatly appreciated.

This is all just for testing at this point; nothing here is for production. I'm just trying to validate that Velero can indeed back up both the Couchbase Operator and the Couchbase Cluster, and subsequently restore them from the Schedule backup below.

I am using the default Couchbase Operator install, version 2.4.0.

I am using a very basic, functional Couchbase Server cluster installation YAML.

I used the Velero Schedule backup below, then restored from that backup, expecting that both the Couchbase Cluster and the Couchbase Operator would restore without any issues.

But what happens is that I get a functional CB Cluster, and a CBO that constantly logs messages like this:

{"level":"info","ts":1680529171.8283288,"logger":"cluster","msg":"Reconcile completed","cluster":"default/cb-dev"}                                       
{"level":"info","ts":1680529172.0289326,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0002"}

This might be important: I never see 'cb-dev-0000' listed in these messages, though the pod does exist. I reiterate that the restored CB Cluster is functioning 'normally' as near as I can tell; the CB Operator is the only thing reporting these errors.

kubectl apply -f schedule.yaml

Where schedule.yaml contains this:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: dev-everything-schedule
  namespace: velero
spec:
  schedule: 0 * * * *
  template:
    metadata:
      labels:
        velero.io/schedule-name: dev-everything-schedule
    storageLocation: default
    includeClusterResources: true
    includedNamespaces:
    - kube-public
    - kube-system
    - istio-system
    - velero
    - default
    - cert-manager
    - kube-node-lease
    excludedResources:
    includedResources:
    - authorizationpolicies.security.istio.io
    - backuprepositories.velero.io
    - backupstoragelocations.velero.io
    - backups.velero.io
    - certificaterequests.cert-manager.io
    - certificates.cert-manager.io
    - cert-manager-webhook
    - challenges.acme.cert-manager.io
    - clusterissuers.cert-manager.io
    - clusterrolebindings.rbac.authorization.k8s.io
    - clusterroles.rbac.authorization.k8s.io
    - configmaps
    - controllerrevisions
    - couchbaseautoscalers.couchbase.com
    - couchbasebackuprestores.couchbase.com
    - couchbasebackups.couchbase.com
    - couchbasebuckets.couchbase.com
    - couchbaseclusteroauths
    - couchbaseclusters.couchbase.com
    - couchbasecollectiongroups.couchbase.com
    - couchbasecollections.couchbase.com
    - couchbaseephemeralbuckets.couchbase.com
    - couchbaseevents
    - couchbasegroups.couchbase.com
    - couchbasememcachedbuckets.couchbase.com
    - couchbasemigrationreplications.couchbase.com
    - couchbasereplications.couchbase.com
    - couchbaserolebindings.couchbase.com
    - couchbasescopegroups.couchbase.com
    - couchbasescopes.couchbase.com
    - couchbaseusers.couchbase.com
    - cronjobs
    - csidrivers
    - csistoragecapacities
    - customresourcedefinitions.apiextensions.k8s.io
    - daemonsets
    - deletebackuprequests
    - deletebackuprequests.velero.io
    - deployments
    - destinationrules.networking.istio.io
    - downloadrequests.velero.io
    - endpoints
    - endpointslices
    - eniconfigs.crd.k8s.amazonaws.com
    - envoyfilters.networking.istio.io
    - events
    - gateways
    - gateways.networking.istio.io
    - horizontalpodautoscalers
    - ingressclassparams.elbv2.k8s.aws
    - ingresses
    - issuers.cert-manager.io
    - istiooperators.install.istio.io
    - item_istiooperators
    - item_wasmplugins
    - jobs
    - leases
    - limitranges
    - namespaces
    - networkpolicies
    - orders.acme.cert-manager.io
    - peerauthentications.security.istio.io
    - persistentvolumeclaims
    - persistentvolumes
    - poddisruptionbudgets
    - pods
    - podtemplates
    - podvolumebackups.velero.io
    - podvolumerestores.velero.io
    - priorityclasses.scheduling.k8s.io
    - proxyconfigs.networking.istio.io
    - replicasets
    - replicationcontrollers
    - requestauthentications.security.istio.io
    - resourcequotas
    - restores.velero.io
    - rolebindings.rbac.authorization.k8s.io
    - roles.rbac.authorization.k8s.io
    - schedules.velero.io
    - secrets
    - securitygrouppolicies.vpcresources.k8s.aws
    - serverstatusrequests.velero.io
    - serviceaccounts
    - serviceentries
    - serviceentries.networking.istio.io
    - services
    - sidecars.networking.istio.io
    - statefulsets
    - targetgroupbindings.elbv2.k8s.aws
    - telemetries.telemetry.istio.io
    - telemetry
    - validatingwebhookconfiguration.admissionregistration.k8s.io
    - virtualservices.networking.istio.io
    - volumesnapshotlocations.velero.io
    - wasmplugins.extensions.istio.io
    - workloadentries.networking.istio.io
    - workloadgroups.networking.istio.io
    ttl: 12h
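Before testing a restore, I sanity-check that the schedule exists and that backups are completing, using the standard Velero CLI, something like:

velero schedule get -n velero
velero backup get -n velero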

I kubectl delete the cluster and the Operator, and subsequently restore them from the Velero backup using something like this:

velero restore create dev-everything-schedule-20230331160030 --from-backup dev-everything-schedule-20230331160030

It restores the cluster and CBO, and that's when I start seeing the messages above in the couchbase-operator pod's logs.
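The repeating messages are easiest to watch by tailing the Operator's logs; assuming the default deployment name of couchbase-operator, something like:

kubectl logs deploy/couchbase-operator -n default -f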

UPDATE:

Digging into the JSON files of the Velero backup under pods/namespaces/default/, and comparing cb-dev-0000.json with cb-dev-0001.json, I just spotted a major difference that probably relates to this issue:

{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        ...
        "name": "cb-dev-0000",
        "namespace": "default",
        "ownerReferences": [
            {
                "apiVersion": "couchbase.com/v2",
                "blockOwnerDeletion": true,
                "controller": true,
                "kind": "CouchbaseCluster",
                "name": "cb-dev",
                "uid": "xxxxxxx-xxxx-xxxx-xxxx-xxxxxx"
            }
        ],
        "resourceVersion": "xxxxxxx",
        "uid": "xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"
    }
    ...
}

And now the same thing for cb-dev-0001 (one of the pods being logged constantly by CBO):

{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        ...
        "name": "cb-dev-0001",
        "namespace": "default",
        "resourceVersion": "xxxxxxx",
        "uid": "xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"
    }
    ...
}

ownerReferences is missing from the Velero backup for cb-dev-0001, 0002, 0003, 0004. Now I think I'm onto something.

I don't know why Velero would find this and store it in the backup for ONE pod but not the others. But that's a clue, I think...
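In case anyone wants to dig the same way: I pulled the backup contents down with the Velero CLI and checked which pod JSON files contain ownerReferences at all (the tarball is named after the backup; the paths below are where the pod JSON sat in my extracted archive, yours may differ):

velero backup download dev-everything-schedule-20230331160030
tar -xzf dev-everything-schedule-20230331160030-data.tar.gz
grep -l ownerReferences pods/namespaces/default/cb-dev-*.json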

Still hunting...

UPDATE 2:

I've confirmed that Velero is storing the backup of the Couchbase objects in its JSON files correctly every time (from what I've seen so far).

However, the Velero restore is almost randomly not setting metadata.ownerReferences in the restored Couchbase pods. Sometimes it's set only on the Couchbase Services and the cb-dev-0000 pod. Sometimes it's not set on any of them. Sometimes I've seen it (in the past) set on all of them (correctly?).
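A quick way to check which restored pods actually got their owner set back is a shell loop over the pod names from above:

for p in cb-dev-0000 cb-dev-0001 cb-dev-0002 cb-dev-0003 cb-dev-0004; do
  echo "$p: $(kubectl get pod "$p" -n default -o jsonpath='{.metadata.ownerReferences[*].kind}')"
done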

So it's still a mystery, but that's where I am so far. I've seen other people mention on various chats/forums that they've experienced similar issues with Velero.

I'm secretly hoping I'll find a missing argument or annotation where I can specifically force ownerReferences to be restored for certain objects. But I haven't seen that yet...

2 Answers


As Sathya S. noted, it appears that Velero doesn't (reliably) restore metadata.ownerReferences from its backups.

I will add that SOMETIMES it does, and that's what throws me. It almost seems like there's a pattern when it does, at least in my case: if cb-dev-0000 has it, then the Services will too, but the remaining CB pods won't. Otherwise all of them 'might' have it set, or none of them. At least in the example I've set up here.

Couchbase notes in their docs NOT to include 'pods' and 'services' in the Velero backup. This had stuck in my mind, but I kind of didn't trust it.

It turns out THAT seems to be VITAL for Velero to properly restore my Couchbase cluster and avoid the "Pod ignored, no owner" issue seen in the Couchbase Operator logs.

Once I removed 'pods' and 'services' from my scheduled backup and it created a backup, I kubectl deleted my Couchbase cluster. Then I ran velero restore create --from-backup and voilà, the cluster came up. I'll also note that the indexes and bucket documents I'd created were restored as well.
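Roughly, the sequence I've been repeating looks like this (the manual backup name here is one I chose; use whatever velero backup get shows for your schedule):

velero backup create dev-everything-manual --from-schedule dev-everything-schedule -n velero
kubectl delete couchbasecluster cb-dev -n default
velero restore create --from-backup dev-everything-manual -n velero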

Most importantly for this issue, metadata.ownerReferences were all set up properly. I've done this several times now before answering, and this seems to be the key: don't include pods and services in the backup.

"You may have noticed that neither pods nor services were backed up. This is because the Operator will be able to recreate them from the cluster ConfigMap, metadata attached to the persistent volume claims, and the CouchbaseCluster resource itself. Likewise the deployment will be able to recreate the Operator pod." ~ https://docs.couchbase.com/operator/current/tutorial-velero-backup.html#creating-a-velero-backup

Ultimately, all I had to do was remove pods and services from my schedule backup's includedResources YAML and delete/apply the schedule.
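Concretely, the only change to the schedule.yaml above was deleting the '- pods' and '- services' entries from includedResources, then:

kubectl delete -f schedule.yaml
kubectl apply -f schedule.yaml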


Restore of ownerReferences is not supported by Velero.