
I have a Rancher (RKE2) cluster on which I want to restore a previous etcd snapshot. I followed the official description, but it doesn't work for me: the process gets stuck in an infinite loop. I do, however, see a directory named etcd-old-* being created.

Etcd-old:

root@rke2-server-01:~# ll /var/lib/rancher/rke2/server/db/etcd-old-1652344994/
total 20
drwx------ 3 etcd users 4096 May 12 10:20 ./
drwxr-x--- 6 root root  4096 May 12 10:55 ../
-rw------- 1 etcd users 1093 May 12 10:20 config
drwx------ 4 etcd users 4096 May 12 10:20 member/
-rw------- 1 etcd users   23 May 12 10:20 name

Command used:

rke2 server --cluster-reset --cluster-reset-restore-path=/opt/etcd-snapshot-rke2-server-01-1652220000

Log file:

INFO[1230] Failed to test data store connection: context deadline exceeded
INFO[1233] Cluster-Http-Server 2022/05/12 11:14:27 http: TLS handshake error from 127.0.0.1:39416: remote error: tls: bad certificate
INFO[1233] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:29.919+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1237] Failed to set etcd role label: an error on the server ("") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)
INFO[1238] Cluster-Http-Server 2022/05/12 11:14:32 http: TLS handshake error from 127.0.0.1:39482: remote error: tls: bad certificate
INFO[1238] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[1243] Cluster-Http-Server 2022/05/12 11:14:37 http: TLS handshake error from 127.0.0.1:39554: remote error: tls: bad certificate
INFO[1243] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:39.920+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
{"level":"warn","ts":"2022-05-12T11:14:40.194+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1245] Failed to test data store connection: context deadline exceeded
INFO[1247] Failed to set etcd role label: an error on the server ("") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)
INFO[1248] Cluster-Http-Server 2022/05/12 11:14:42 http: TLS handshake error from 127.0.0.1:39630: remote error: tls: bad certificate
INFO[1248] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[1253] Cluster-Http-Server 2022/05/12 11:14:47 http: TLS handshake error from 127.0.0.1:39706: remote error: tls: bad certificate
INFO[1253] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:49.921+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1257] Failed to set etcd role label: an error on the server ("") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)
INFO[1258] Cluster-Http-Server 2022/05/12 11:14:52 http: TLS handshake error from 127.0.0.1:39786: remote error: tls: bad certificate
INFO[1258] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:55.195+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1260] Failed to test data store connection: context deadline exceeded
INFO[1263] Cluster-Http-Server 2022/05/12 11:14:57 http: TLS handshake error from 127.0.0.1:39830: remote error: tls: bad certificate
INFO[1263] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:14:59.922+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1267] Failed to set etcd role label: an error on the server ("") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)
INFO[1268] Cluster-Http-Server 2022/05/12 11:15:02 http: TLS handshake error from 127.0.0.1:39882: remote error: tls: bad certificate
INFO[1268] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[1273] Cluster-Http-Server 2022/05/12 11:15:07 http: TLS handshake error from 127.0.0.1:39928: remote error: tls: bad certificate
INFO[1273] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2022-05-12T11:15:09.923+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
{"level":"warn","ts":"2022-05-12T11:15:10.196+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[1275] Failed to test data store connection: context deadline exceeded

Certs:

root@rke2-server-01:~# ll /var/lib/rancher/rke2/server/tls/
client-admin.crt                  client-ca.key                     client-kubelet.key                client-rke2-controller.crt        etcd/                             service.key
client-admin.key                  client-controller.crt             client-kube-proxy.crt             client-rke2-controller.key        request-header-ca.crt             serving-kube-apiserver.crt
client-auth-proxy.crt             client-controller.key             client-kube-proxy.key             client-scheduler.crt              request-header-ca.key             serving-kube-apiserver.key
client-auth-proxy.key             client-kube-apiserver.crt         client-rke2-cloud-controller.crt  client-scheduler.key              server-ca.crt                     serving-kubelet.key
client-ca.crt                     client-kube-apiserver.key         client-rke2-cloud-controller.key  dynamic-cert.json                 server-ca.key                     temporary-certs/
root@rke2-server-01:~# ll /var/lib/rancher/rke2/server/tls/etcd/
client.crt              peer-ca.crt             peer-server-client.crt  server-ca.crt           server-client.crt
client.key              peer-ca.key             peer-server-client.key  server-ca.key           server-client.key

Netstat:

root@rke2-server-01:~#  netstat -l -n -v -p -t
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:6443          0.0.0.0:*               LISTEN      4158636/rke2 server
tcp        0      0 127.0.0.1:6444          0.0.0.0:*               LISTEN      4158636/rke2 server
tcp        0      0 127.0.0.1:10256         0.0.0.0:*               LISTEN      4155254/kube-proxy
tcp        0      0 10.42.0.0:53            0.0.0.0:*               LISTEN      3582490/named
tcp        0      0 10.98.110.143:53        0.0.0.0:*               LISTEN      3582490/named
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      3582490/named
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      738/systemd-resolve
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      788/sshd: /usr/sbin
tcp        0      0 127.0.0.1:953           0.0.0.0:*               LISTEN      3582490/named
tcp        0      0 127.0.0.1:10010         0.0.0.0:*               LISTEN      4159324/containerd
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      4159345/kubelet
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      4155254/kube-proxy
tcp6       0      0 fe80::ecee:eeff:feee:53 :::*                    LISTEN      3582490/named
tcp6       0      0 fe80::250:56ff:fe95::53 :::*                    LISTEN      3582490/named
tcp6       0      0 ::1:53                  :::*                    LISTEN      3582490/named
tcp6       0      0 :::22                   :::*                    LISTEN      788/sshd: /usr/sbin
tcp6       0      0 ::1:953                 :::*                    LISTEN      3582490/named
tcp6       0      0 :::9345                 :::*                    LISTEN      4158636/rke2 server
tcp6       0      0 :::10250                :::*                    LISTEN      4159345/kubelet
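
Note that nothing in the netstat output above is listening on 2379, which matches the "connection refused" lines for 127.0.0.1:2379 in the log. A minimal sketch for checking this from a script, assuming `netstat -l -n -t` style output on stdin; `port_listening` is a hypothetical helper name of my own, not part of RKE2:

```shell
# Hypothetical helper: exits 0 if the given TCP port shows up as a LISTEN
# socket in `netstat -l -n -t` style output read from stdin, 1 otherwise.
port_listening() {
    awk -v port="$1" '
        # Column 4 is the local address, column 6 the socket state.
        $6 == "LISTEN" {
            n = split($4, parts, ":")      # port is the last ":"-separated field
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

It could be used as `netstat -l -n -t | port_listening 2379 || echo "etcd client port 2379 is not listening"`.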

If I start rke2-server.service normally, the node starts without a problem.

Any help is welcome.
