I created a Kubernetes cluster with 1 master and 2 workers using kubeadm 1.20 and backed up etcd. I then destroyed the master on purpose to test how to get the cluster back to a running state.

Kubernetes version: 1.20
Installation method: kubeadm
Host OS: Windows 10 Pro
Guest OS: Ubuntu 18 on VirtualBox 6
CNI and version: weave-net
CRI and version: Docker 19

I'm partially successful in that the secret that I created before destroying master is visible after etcd restore, so that part seems to work.

However, the CoreDNS pods are not authorized to make requests to the API server, according to their logs:

[INFO] plugin/ready: Still waiting on: "kubernetes"
E1229 21:42:25.892580       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
E1229 21:42:29.680620       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
E1229 21:42:39.492521       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Unauthorized

I'm guessing it has something to do with service account tokens, so there must be a step I'm missing that lets pods authenticate to the API server after the etcd database has been replaced.

What am I missing?

Oliver
  • How is the question related to programming? It might be better suited for [serverfault](https://serverfault.com/). – Turing85 Dec 29 '20 at 22:01
  • @Turing85 I agree it is a gray area but consider this: 1) in kubernetes most tasks use "declarative" programming; 2) if you look at the questions that have similar tags to mine, my question seems to fit; 3) stackoverflow's kubernetes tag has about *30* times the number of watchers and questions as that same tag in serverfault. – Oliver Dec 29 '20 at 22:09
  • I believe that SO has more watchers. I am also not sure whether this question is on-topic for serverfault (hence the *might*). I am concerned about the answer quality. Developers normally do not have in-depth knowledge of operations tasks (worst-case scenario: dangerous half-knowledge), especially not if they are as complex as disaster recovery. – Turing85 Dec 29 '20 at 22:11
  • What's the difference from, say, https://stackoverflow.com/questions/51370870/how-to-backup-etcd-on-a-kubernetes-cluster-created-with-kubeadm-rpc-error-cod? or https://stackoverflow.com/questions/34486213/kubernetes-autoscaling-not-recognizing-heapster? or https://stackoverflow.com/questions/51631714/kube-proxy-or-elb-delaying-packets-of-http-requests? – Oliver Dec 30 '20 at 19:28

1 Answer

If you only backed up the contents of etcd, then kubeadm would have generated a new key pair (sa.key/sa.pub) for signing the ServiceAccount JWTs, along with its other certificates. Old tokens would no longer verify. As this is not generally done during routine maintenance, I don't think the SA controller knows to reissue the tokens. If you delete all the underlying secrets, it should reissue them though.
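
As a rough illustration of "delete all the underlying secrets", here is a minimal Python sketch. It assumes `kubectl` is on the PATH and already pointed at the restored cluster, and that the stale tokens are the secrets of type `kubernetes.io/service-account-token`. The function names `stale_token_delete_commands` and `main` are made up for this example; by default it only prints the commands it would run.

```python
import json
import subprocess

# Secret type used for ServiceAccount token secrets.
SA_TOKEN_TYPE = "kubernetes.io/service-account-token"


def stale_token_delete_commands(secret_list):
    """Build one `kubectl delete secret` argv per ServiceAccount token secret.

    `secret_list` is the parsed JSON from
    `kubectl get secrets --all-namespaces -o json`.
    """
    commands = []
    for item in secret_list.get("items", []):
        if item.get("type") != SA_TOKEN_TYPE:
            continue  # leave Opaque, TLS, and other secrets alone
        meta = item["metadata"]
        commands.append(
            ["kubectl", "delete", "secret", meta["name"], "-n", meta["namespace"]]
        )
    return commands


def main(dry_run=True):
    """Print the delete commands; run them only when dry_run is False."""
    raw = subprocess.run(
        ["kubectl", "get", "secrets", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for cmd in stale_token_delete_commands(json.loads(raw)):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# To try it against a live cluster (hypothetical usage):
# main(dry_run=True)
```

Pods that are already running keep the token they mounted at startup, so anything that talks to the API server (CoreDNS, for instance) would presumably still need to be recreated after the secrets are reissued.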

coderanger
  • That worked, thanks! Note I had to restart the pods, as the mounted secret does not disappear even after deleting it from the namespace. Good news is that only pods that really need access to k8s api are affected. I also found out from https://elastisys.com/backup-kubernetes-how-and-why/ that you're supposed to save the certs and use them at restore, which will prevent this problem from occurring in the first place. – Oliver Dec 30 '20 at 19:48
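
To sketch the "save the certs" advice from the comment above: the snippet below copies a PKI directory next to an etcd snapshot. It assumes the kubeadm default location `/etc/kubernetes/pki` and a hypothetical backup directory; `backup_pki` is a name invented for this example, not part of kubeadm or of the linked article.

```python
import shutil
from pathlib import Path


def backup_pki(pki_dir, backup_dir):
    """Copy the whole PKI directory to <backup_dir>/pki.

    Returns the relative paths of the copied files, sorted.
    """
    src = Path(pki_dir)
    dest = Path(backup_dir) / "pki"
    # copytree raises if dest already exists, so an older backup is never
    # silently overwritten.
    shutil.copytree(src, dest)
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())


# Hypothetical usage on the master, run as root before disaster strikes:
# backup_pki("/etc/kubernetes/pki", "/var/backups/k8s-2020-12-29")
```

On restore, the idea would be to put these files back under `/etc/kubernetes/pki` before running `kubeadm init`, so that the existing keys (including `sa.key` and `sa.pub`) are reused and the tokens stored in etcd still verify.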