I'm new here, so please forgive me if this comes out wrong. I've been using Couchbase for 10+ years on real hardware, and I've been working on getting CB running in Kubernetes, which seems to be working just fine. I'm also using the Couchbase Autonomous Operator (CBO). It works great, and I have no complaints about normal operation so far.
However, I've been working on Velero backup and restore of both the Cluster and the CB Operator. I thought I finally had it working last week, but a recent attempt to restore from a Velero backup once again produced messages like this in the CBO's logs:
{"level":"info","ts":1680529171.8283288,"logger":"cluster","msg":"Reconcile completed","cluster":"default/cb-dev"}
{"level":"info","ts":1680529172.0289326,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0002"}
{"level":"info","ts":1680529172.0289645,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0003"}
{"level":"info","ts":1680529172.0289707,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0001"}
{"level":"info","ts":1680529172.0289757,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0004"}
I've tried to find out what this really means. I have some suspicions, but I don't know how to resolve it.
Of note: 'cb-dev-0000' never appears in the recurring list of messages above. These messages appear every few seconds in the couchbase-operator pod's logs.
Additionally, if I delete the pods one at a time, each is recreated by K8s or the CBO (not really sure which) and then disappears from the repeating list. Once I've done that with all of them, the issue stops. (A rough sketch of how I check this is below.)
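For reference, this is roughly how I check which pods still have an owner set and how I nudge them; the names match my cluster, so adjust to yours:

# Show which pods still carry an ownerReference back to the CouchbaseCluster
# (in my case cb-dev-0000 usually has one after a restore; the rest show <none>)
kubectl get pods cb-dev-0000 cb-dev-0001 cb-dev-0002 cb-dev-0003 cb-dev-0004 -n default \
  -o custom-columns='NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].name'

# Deleting a pod gets it recreated with the ownerReference set, and it drops
# out of the repeating "Pod ignored, no owner" list
kubectl delete pod cb-dev-0001 -n default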
Any ideas, questions, or comments on this would be greatly appreciated.
This is all just for testing at this point; nothing here is for production. I'm just trying to validate that Velero can indeed back up both the Couchbase Operator and the Couchbase Cluster and subsequently restore them from the Schedule backup below.
I am using the default Couchbase Operator install, version 2.4.0.
I am using a very basic, functional Couchbase Server cluster installation YAML.
I create backups with the Velero Schedule below, then restore from one of those backups, expecting that both the Couchbase Cluster and the Couchbase Operator will restore without any issues.
But what happens is that I get a functional CB Cluster and a CBO that constantly logs messages like this:
{"level":"info","ts":1680529171.8283288,"logger":"cluster","msg":"Reconcile completed","cluster":"default/cb-dev"}
{"level":"info","ts":1680529172.0289326,"logger":"cluster","msg":"Pod ignored, no owner","cluster":"default/cb-dev","name":"cb-dev-0002"}
This might be important, I don't know: I never see 'cb-dev-0000' listed in these messages, though the pod does exist. I'll reiterate that the restored CB Cluster is functioning 'normally' as near as I can tell, and the CB Operator is the only thing reporting these kinds of errors.
I create the schedule with:
kubectl apply -f schedule.yaml
Where schedule.yaml contains this:
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: dev-everything-schedule
  namespace: velero
spec:
  schedule: 0 * * * *
  template:
    metadata:
      labels:
        velero.io/schedule-name: dev-everything-schedule
    storageLocation: default
    includeClusterResources: true
    includedNamespaces:
      - kube-public
      - kube-system
      - istio-system
      - velero
      - default
      - cert-manager
      - kube-node-lease
    excludedResources:
    includedResources:
      - authorizationpolicies.security.istio.io
      - backuprepositories.velero.io
      - backupstoragelocations.velero.io
      - backups.velero.io
      - certificaterequests.cert-manager.io
      - certificates.cert-manager.io
      - cert-manager-webhook
      - challenges.acme.cert-manager.io
      - clusterissuers.cert-manager.io
      - clusterrolebindings.rbac.authorization.k8s.io
      - clusterroles.rbac.authorization.k8s.io
      - configmaps
      - controllerrevisions
      - couchbaseautoscalers.couchbase.com
      - couchbasebackuprestores.couchbase.com
      - couchbasebackups.couchbase.com
      - couchbasebuckets.couchbase.com
      - couchbaseclusteroauths
      - couchbaseclusters.couchbase.com
      - couchbasecollectiongroups.couchbase.com
      - couchbasecollections.couchbase.com
      - couchbaseephemeralbuckets.couchbase.com
      - couchbaseevents
      - couchbasegroups.couchbase.com
      - couchbasememcachedbuckets.couchbase.com
      - couchbasemigrationreplications.couchbase.com
      - couchbasereplications.couchbase.com
      - couchbaserolebindings.couchbase.com
      - couchbasescopegroups.couchbase.com
      - couchbasescopes.couchbase.com
      - couchbaseusers.couchbase.com
      - cronjobs
      - csidrivers
      - csistoragecapacities
      - customresourcedefinitions.apiextensions.k8s.io
      - daemonsets
      - deletebackuprequests
      - deletebackuprequests.velero.io
      - deployments
      - destinationrules.networking.istio.io
      - downloadrequests.velero.io
      - endpoints
      - endpointslices
      - eniconfigs.crd.k8s.amazonaws.com
      - envoyfilters.networking.istio.io
      - events
      - gateways
      - gateways.networking.istio.io
      - horizontalpodautoscalers
      - ingressclassparams.elbv2.k8s.aws
      - ingresses
      - issuers.cert-manager.io
      - istiooperators.install.istio.io
      - item_istiooperators
      - item_wasmplugins
      - jobs
      - leases
      - limitranges
      - namespaces
      - networkpolicies
      - orders.acme.cert-manager.io
      - peerauthentications.security.istio.io
      - persistentvolumeclaims
      - persistentvolumes
      - poddisruptionbudgets
      - pods
      - podtemplates
      - podvolumebackups.velero.io
      - podvolumerestores.velero.io
      - priorityclasses.scheduling.k8s.io
      - proxyconfigs.networking.istio.io
      - replicasets
      - replicationcontrollers
      - requestauthentications.security.istio.io
      - resourcequotas
      - restores.velero.io
      - rolebindings.rbac.authorization.k8s.io
      - roles.rbac.authorization.k8s.io
      - schedules.velero.io
      - secrets
      - securitygrouppolicies.vpcresources.k8s.aws
      - serverstatusrequests.velero.io
      - serviceaccounts
      - serviceentries
      - serviceentries.networking.istio.io
      - services
      - sidecars.networking.istio.io
      - statefulsets
      - targetgroupbindings.elbv2.k8s.aws
      - telemetries.telemetry.istio.io
      - telemetry
      - validatingwebhookconfiguration.admissionregistration.k8s.io
      - virtualservices.networking.istio.io
      - volumesnapshotlocations.velero.io
      - wasmplugins.extensions.istio.io
      - workloadentries.networking.istio.io
      - workloadgroups.networking.istio.io
    ttl: 12h
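For completeness, I confirm the schedule exists and has actually produced backups before I test a restore (standard Velero CLI commands):

# List the schedule and the backups it has created so far
velero schedule get
velero backup get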
I then kubectl delete the cluster and the operator, and subsequently restore them from the Velero backup using something like this:
velero restore create dev-everything-schedule-20230331160030 --from-backup dev-everything-schedule-20230331160030
It restores the cluster and the CBO, and that's when I start seeing those messages in the couchbase-operator pod's logs.
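For what it's worth, this is how I've been checking the restore itself for warnings or errors about those pods (standard Velero CLI; the restore name matches the one above):

# Look for warnings, errors, or skipped resources reported by the restore
velero restore describe dev-everything-schedule-20230331160030 --details
velero restore logs dev-everything-schedule-20230331160030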
UPDATE:
Digging into the JSON files of the Velero backup under pods/namespaces/default/cb-dev-0000.json and comparing that with cb-dev-0001.json, I just spotted a major difference that probably relates to this issue:
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    ...
    "name": "cb-dev-0000",
    "namespace": "default",
    "ownerReferences": [
      {
        "apiVersion": "couchbase.com/v2",
        "blockOwnerDeletion": true,
        "controller": true,
        "kind": "CouchbaseCluster",
        "name": "cb-dev",
        "uid": "xxxxxxx-xxxx-xxxx-xxxx-xxxxxx"
      }
    ],
    "resourceVersion": "xxxxxxx",
    "uid": "xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"
  },
  ...
}
And now the same thing for cb-dev-0001 (one of the pods being logged constantly by the CBO):
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    ...
    "name": "cb-dev-0001",
    "namespace": "default",
    "resourceVersion": "xxxxxxx",
    "uid": "xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"
  },
  ...
}
ownerReferences is missing from the Velero backup for cb-dev-0001, 0002, 0003, 0004. Now I think I'm onto something.
I don't know why Velero would find this and store it in the backup for ONE POD vs all of them. But that's a clue I think...
Still hunting...
UPDATE 2:
I've confirmed that Velero is storing the Couchbase objects, including the pod ownerReferences, in its backup JSON files correctly every time (from what I've seen so far; a sketch of how I checked is below).
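For reference, this is roughly how I checked the backup contents; the extracted directory layout may differ slightly by Velero version, so treat the paths as approximate:

# Pull the backup tarball locally and inspect the stored pod manifests
velero backup download dev-everything-schedule-20230331160030
tar -xzf dev-everything-schedule-20230331160030-data.tar.gz
# Compare what was actually stored for each pod
jq '.metadata.ownerReferences' resources/pods/namespaces/default/cb-dev-0000.json
jq '.metadata.ownerReferences' resources/pods/namespaces/default/cb-dev-0001.json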
However, the velero restore is almost randomly not setting metadata.ownerReferences on the restored Couchbase pods. Sometimes it's only set on the Couchbase Services and the cb-dev-0000 pod. Sometimes it's not set on any of them. Sometimes (in the past) I've seen it set on all of them (correctly?).
So it's still a mystery, but that's where I am so far. I've seen other people mention on various chats/forums that they've experienced similar issues with Velero.
I'm secretly hoping I'll find a missing argument or annotation where I can specifically force ownerReferences to be restored for certain objects, but I haven't seen that yet...
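In the meantime, the only other workaround I can think of (besides deleting the pods one at a time) would be to manually re-attach the ownerReference after a restore. This is just a sketch I haven't fully vetted; the UID has to come from the restored CouchbaseCluster object, since the old UID from the backup no longer exists:

# Grab the UID of the restored CouchbaseCluster (the ownerReference must point at this exact UID)
CLUSTER_UID=$(kubectl get couchbasecluster cb-dev -n default -o jsonpath='{.metadata.uid}')

# Re-attach the ownerReference on a pod that lost it during the restore
kubectl patch pod cb-dev-0001 -n default --type=merge -p "{
  \"metadata\": {
    \"ownerReferences\": [{
      \"apiVersion\": \"couchbase.com/v2\",
      \"kind\": \"CouchbaseCluster\",
      \"name\": \"cb-dev\",
      \"uid\": \"${CLUSTER_UID}\",
      \"controller\": true,
      \"blockOwnerDeletion\": true
    }]
  }
}"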