k3s - Metrics server doesn't work for worker nodes

Question

I deployed a k3s cluster on two Raspberry Pi 4 boards, one as the master and the other as a worker, using the install script k3s provides with the following options:

For the master node (192.168.1.113 is the master node's IP):

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -

For the agent node:

curl -sfL https://get.k3s.io | \
                K3S_URL=https://192.168.1.113:6443 \
                K3S_TOKEN=<master-token> \
                INSTALL_K3S_EXEC='agent' sh -

Everything seems to work, but kubectl top nodes returns the following:

NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%     
k3s-master    137m         3%     1285Mi          33%         
k3s-node-01   <unknown>                           <unknown>               <unknown>               <unknown>

I also tried to deploy the Kubernetes dashboard as described in the docs, but it fails because it can't reach the metrics server and gets a timeout error:

"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"

and I see a lot of errors in the pod logs:

2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.

Logs from the metrics-server pod:

elet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:03:24.767949       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host

I wasn't able to reproduce this behaviour (on Ubuntu VMs rather than Raspberry Pi); after some time passed, the worker node got metrics as well. I see that your install commands differ slightly from what the [documentation says](https://rancher.com/docs/k3s/latest/en/quick-start/#install-script). You can also try restarting the metrics server with `k3s kubectl rollout restart deploy metrics-server -n kube-system` and checking the logs in the `metrics-server` pod. — moonkotte, Sep 17 '21 at 12:31
Thanks, I added logs from the metrics-server pod. It seems like it's looking at the wrong IP for the node? — Assaf Sapir, Sep 17 '21 at 14:07
Something is set up incorrectly in the network. Can you ping your worker node by hostname? Check whether `/etc/hosts` has an entry for it, or add one with the correct IP. — moonkotte, Sep 17 '21 at 14:21
After fixing `/etc/hosts` and restarting the rollout, I still see the same errors of `dial tcp: i/o timeout` — Assaf Sapir, Sep 17 '21 at 15:35
Well, this is a different error. This time it looks like the hosts can see each other on the network. 1 - Can you ping/curl the other host from the system? Does it work? 2 - Check `sudo netstat -tulpn` on the worker node: is it listening on 10250? 3 - Are there any firewalls on the hosts? If so, disable them for a test. — moonkotte, Sep 20 '21 at 08:06
I re-provisioned the cluster and everything seems to be working now. Part of the problem was ntp not working so I had cert issues. — Assaf Sapir, Sep 21 '21 at 11:11

score 1 · Accepted Answer · answered Sep 22 '21 at 13:52

Moving this out of comments for better visibility.

After creating a small cluster, I wasn't able to reproduce this behaviour: metrics-server worked fine for both nodes, and kubectl top nodes showed metrics for both of them (though it took some time to start collecting the metrics).

That leaves troubleshooting why it doesn't work in your setup. Checking the metrics-server logs is the most efficient way to figure this out:

$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system
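
The pod name above is specific to my cluster. As a minimal sketch, assuming the deployment is named metrics-server in the kube-system namespace (as in the rollout restart command from the comments), the same log can be read through the deployment and filtered down to the scrape errors. The function name is hypothetical:

```shell
# Hypothetical helper: print only the kubelet scrape errors from the
# metrics-server log. Assumes a deployment named "metrics-server" in the
# kube-system namespace.
metrics_server_errors() {
    kubectl logs deploy/metrics-server -n kube-system --tail=200 \
        | grep 'unable to fetch metrics' || true
}
```

Calling metrics_server_errors prints nothing when the last 200 log lines contain no scrape errors.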

The next steps depend on what the logs show. For instance, in the comments above:

First it was no route to host, which points to a network problem, here the node's hostname not resolving to a reachable address.
Then it was i/o timeout, which means a route exists but the service did not respond. This can happen when a firewall blocks certain ports or sources, when kubelet (which listens on port 10250) is not running, or, as it turned out for the OP, when an ntp problem breaks certificates and connections.
Errors may be different in other cases. What matters is finding the error and troubleshooting further based on it.
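
The mapping from error to next step can be sketched as a small shell function. This only illustrates the two cases from this thread; the function name and the wording of the hints are my own, not part of k3s or metrics-server:

```shell
# Hypothetical helper: map a metrics-server error line to the next thing to check.
# Covers only the two errors seen in this thread.
suggest_check() {
    case "$1" in
        *"no route to host"*)
            echo "network: check that the node name resolves to the right IP (ping, /etc/hosts)" ;;
        *"i/o timeout"*)
            echo "service: check firewall, kubelet on port 10250, and time sync (ntp)" ;;
        *)
            echo "unknown: read the full error and troubleshoot from there" ;;
    esac
}

# Example with an error taken from the logs in the question.
suggest_check "dial tcp 192.168.1.106:10250: connect: no route to host"
```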
