I have a Rancher (RKE2) cluster on which I want to restore a previous etcd snapshot. I followed the official documentation, but it doesn't work for me: the restore process gets stuck in an endless loop. However, I can see that a directory named etcd-old-* is being created.
Etcd-old:
root@rke2-server-01:~# ll /var/lib/rancher/rke2/server/db/etcd-old-1652344994/
total 20
drwx------ 3 etcd users 4096 May 12 10:20 ./
drwxr-x--- 6 root root 4096 May 12 10:55 ../
-rw------- 1 etcd users 1093 May 12 10:20 config
drwx------ 4 etcd users 4096 May 12 10:20 member/
-rw------- 1 etcd users 23 May 12 10:20 name
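The numeric suffix in etcd-old-1652344994 appears to be a Unix timestamp: 1652344994 is 2022-05-12 08:43:14 UTC, i.e. 10:43:14 at the +0200 offset used in the log. It presumably marks the moment a reset attempt moved the previous data directory aside, so one such directory per attempt would be expected. The sketch below lists these directories with their suffix decoded; the function name list_etcd_old_dirs is hypothetical, and the default path is the one from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list etcd-old-* directories with their suffix decoded
# as a UTC time (assumes the suffix is a Unix timestamp).
list_etcd_old_dirs() {
  local db_dir="${1:-/var/lib/rancher/rke2/server/db}"
  local dir ts when
  for dir in "$db_dir"/etcd-old-*; do
    [[ -d "$dir" ]] || continue          # skip the literal pattern if nothing matched
    ts="${dir##*-}"                      # text after the last '-'
    [[ "$ts" =~ ^[0-9]+$ ]] || continue  # ignore names without a numeric suffix
    # GNU date accepts -d @SECONDS; BSD/macOS date uses -r SECONDS instead.
    when="$(date -u -d "@$ts" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
            date -u -r "$ts" '+%Y-%m-%dT%H:%M:%SZ')"
    printf '%s\t%s\n' "$when" "$dir"
  done
}

# Example (as root on the server node):
#   list_etcd_old_dirs
```

Bash expands the glob in sorted order, so as long as the timestamps have the same number of digits the oldest directory is printed first.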
Command used:
rke2 server --cluster-reset --cluster-reset-restore-path=/opt/etcd-snapshot-rke2-server-01-1652220000
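For comparison, the restore procedure in the RKE2 documentation is, roughly: stop rke2-server on the server nodes, run the command above on the first server, and once it has finished, start rke2-server again normally (for the other servers, the docs additionally have you remove /var/lib/rancher/rke2/server/db before starting them). A minimal sketch of the single-node part, assuming a systemd-managed install; restore_snapshot, run and DRY_RUN are hypothetical names, and DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented restore steps for ONE server node.
# DRY_RUN=1 prints each command instead of executing it.

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_snapshot() {
  local snapshot="${1:-}"
  if [[ -z "$snapshot" ]]; then
    echo "usage: restore_snapshot SNAPSHOT_PATH" >&2
    return 2
  fi
  if [[ "${DRY_RUN:-0}" != "1" && ! -f "$snapshot" ]]; then
    echo "snapshot not found: $snapshot" >&2
    return 1
  fi
  # 1. Stop the systemd service so it does not run alongside the reset.
  run systemctl stop rke2-server || return
  # 2. Reset etcd membership and restore the snapshot. This runs in the
  #    foreground and is expected to exit on its own once the restore completes.
  run rke2 server --cluster-reset --cluster-reset-restore-path="$snapshot" || return
  # 3. Start the service again normally, without --cluster-reset.
  run systemctl start rke2-server
}
```

Usage: DRY_RUN=1 restore_snapshot /opt/etcd-snapshot-rke2-server-01-1652220000 prints the three steps without executing them; without DRY_RUN, each step only runs if the previous one succeeded.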
Log file:
INFO[1230] Failed to test data store connection: context deadline exceeded
INFO[1233] Cluster-Http-Server 2022/05/12 11:14:27 http: TLS handshake error from 127.0.0.1:39416: remote error: tls: bad certificate
INFO[1233] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:29.919+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1237] Failed to set etcd role label: an error on the server ("") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)
INFO[1238] Cluster-Http-Server 2022/05/12 11:14:32 http: TLS handshake error from 127.0.0.1:39482: remote error: tls: bad certificate
INFO[1238] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[1243] Cluster-Http-Server 2022/05/12 11:14:37 http: TLS handshake error from 127.0.0.1:39554: remote error: tls: bad certificate
INFO[1243] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:39.920+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
{"level":"warn","ts":"2022-05-12T11:14:40.194+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1245] Failed to test data store connection: context deadline exceeded
INFO[1247] Failed to set etcd role label: an error on the server ("") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)
INFO[1248] Cluster-Http-Server 2022/05/12 11:14:42 http: TLS handshake error from 127.0.0.1:39630: remote error: tls: bad certificate
INFO[1248] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[1253] Cluster-Http-Server 2022/05/12 11:14:47 http: TLS handshake error from 127.0.0.1:39706: remote error: tls: bad certificate
INFO[1253] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:49.921+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1257] Failed to set etcd role label: an error on the server ("") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)
INFO[1258] Cluster-Http-Server 2022/05/12 11:14:52 http: TLS handshake error from 127.0.0.1:39786: remote error: tls: bad certificate
INFO[1258] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:55.195+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1260] Failed to test data store connection: context deadline exceeded
INFO[1263] Cluster-Http-Server 2022/05/12 11:14:57 http: TLS handshake error from 127.0.0.1:39830: remote error: tls: bad certificate
INFO[1263] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:59.922+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1267] Failed to set etcd role label: an error on the server ("") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)
INFO[1268] Cluster-Http-Server 2022/05/12 11:15:02 http: TLS handshake error from 127.0.0.1:39882: remote error: tls: bad certificate
INFO[1268] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[1273] Cluster-Http-Server 2022/05/12 11:15:07 http: TLS handshake error from 127.0.0.1:39928: remote error: tls: bad certificate
INFO[1273] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:15:09.923+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
{"level":"warn","ts":"2022-05-12T11:15:10.196+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1275] Failed to test data store connection: context deadline exceeded
Certs:
root@rke2-server-01:~# ll /var/lib/rancher/rke2/server/tls/
client-admin.crt client-ca.key client-kubelet.key client-rke2-controller.crt etcd/ service.key
client-admin.key client-controller.crt client-kube-proxy.crt client-rke2-controller.key request-header-ca.crt serving-kube-apiserver.crt
client-auth-proxy.crt client-controller.key client-kube-proxy.key client-scheduler.crt request-header-ca.key serving-kube-apiserver.key
client-auth-proxy.key client-kube-apiserver.crt client-rke2-cloud-controller.crt client-scheduler.key server-ca.crt serving-kubelet.key
client-ca.crt client-kube-apiserver.key client-rke2-cloud-controller.key dynamic-cert.json server-ca.key temporary-certs/
root@rke2-server-01:~# ll /var/lib/rancher/rke2/server/tls/etcd/
client.crt peer-ca.crt peer-server-client.crt server-ca.crt server-client.crt
client.key peer-ca.key peer-server-client.key server-ca.key server-client.key
Netstat:
root@rke2-server-01:~# netstat -l -n -v -p -t
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address            Foreign Address  State   PID/Program name
tcp        0      0 127.0.0.1:6443           0.0.0.0:*        LISTEN  4158636/rke2 server
tcp        0      0 127.0.0.1:6444           0.0.0.0:*        LISTEN  4158636/rke2 server
tcp        0      0 127.0.0.1:10256          0.0.0.0:*        LISTEN  4155254/kube-proxy
tcp        0      0 10.42.0.0:53             0.0.0.0:*        LISTEN  3582490/named
tcp        0      0 10.98.110.143:53         0.0.0.0:*        LISTEN  3582490/named
tcp        0      0 127.0.0.1:53             0.0.0.0:*        LISTEN  3582490/named
tcp        0      0 127.0.0.53:53            0.0.0.0:*        LISTEN  738/systemd-resolve
tcp        0      0 0.0.0.0:22               0.0.0.0:*        LISTEN  788/sshd: /usr/sbin
tcp        0      0 127.0.0.1:953            0.0.0.0:*        LISTEN  3582490/named
tcp        0      0 127.0.0.1:10010          0.0.0.0:*        LISTEN  4159324/containerd
tcp        0      0 127.0.0.1:10248          0.0.0.0:*        LISTEN  4159345/kubelet
tcp        0      0 127.0.0.1:10249          0.0.0.0:*        LISTEN  4155254/kube-proxy
tcp6       0      0 fe80::ecee:eeff:feee:53  :::*             LISTEN  3582490/named
tcp6       0      0 fe80::250:56ff:fe95::53  :::*             LISTEN  3582490/named
tcp6       0      0 ::1:53                   :::*             LISTEN  3582490/named
tcp6       0      0 :::22                    :::*             LISTEN  788/sshd: /usr/sbin
tcp6       0      0 ::1:953                  :::*             LISTEN  3582490/named
tcp6       0      0 :::9345                  :::*             LISTEN  4158636/rke2 server
tcp6       0      0 :::10250                 :::*             LISTEN  4159345/kubelet
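The repeated "dial tcp 127.0.0.1:2379: connect: connection refused" errors in the log match this output: nothing is listening on 2379, etcd's default client port (nor on 2380, its default peer port), while the rke2 server process itself listens on 6443, 6444 and 9345. A small sketch to repeat that check from netstat or ss output; check_etcd_ports is a hypothetical helper, and the port numbers are the etcd defaults, assumed unchanged here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netstat -ltn` / `ss -ltn` output on stdin and report
# whether anything listens on etcd's default client (2379) and peer (2380) ports.
# Returns 0 only if both ports are listening.
check_etcd_ports() {
  local listing port status=0
  listing="$(cat)"
  for port in 2379 2380; do
    # The port follows ':' (Linux) or '.' (BSD netstat) and is followed by whitespace,
    # so e.g. :12379 does not count as 2379.
    if grep -Eq "[.:]${port}[[:space:]]" <<<"$listing"; then
      echo "port ${port}: listening"
    else
      echo "port ${port}: NOT listening"
      status=1
    fi
  done
  return "$status"
}
```

Usage: netstat -l -n -t | check_etcd_ports, or ss -ltn | check_etcd_ports. A nonzero exit status means at least one of the two etcd ports has no listener.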
If I start rke2-server.service instead, the node starts without any problems.
Any help is welcome.