RKE2 + Cilium leads to DNS issues

Question

Hi, I have a small RKE2 cluster running on Rocky Linux 8.8 VMs under Hyper-V on Windows Server 2022 Datacenter. I am attempting to replace kube-proxy with Cilium, following this guide.

I have ensured that KubeProxyReplacement is set to Strict on the Cilium chart, and everything works, but only while the cluster has a single node. Shortly after I join a second node, I can no longer reliably resolve my nginx service using curl or nslookup from a pod in the cluster. Sometimes it resolves (after about 10 seconds); other times curl exits with error 6: Couldn't resolve host.

There are no kube-proxy pods (and no kube-proxy DaemonSet), and I do not even see an attempt to query the coredns pod. I checked that resolv.conf in the pod where I issue the curl command has the correct nameserver (the IP of the coredns service). The expected behavior is that this resolution should always be near-instantaneous within the cluster.
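For anyone wanting to repeat the resolv.conf check, here is a minimal Python sketch that parses the file's contents and lists the nameservers, search domains, and options. The function name `parse_resolv_conf` and the sample contents are illustrative only; in particular, 10.43.0.10 is an assumed coredns service IP, not a value taken from my cluster.

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains, options) parsed from resolv.conf text."""
    nameservers, search, options = [], [], []
    for raw in text.splitlines():
        # Drop comments, which may start with '#' or ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        key, values = parts[0], parts[1:]
        if key == "nameserver" and values:
            nameservers.append(values[0])
        elif key == "search":
            # A later 'search' line replaces an earlier one.
            search = values
        elif key == "options":
            options.extend(values)
    return nameservers, search, options


# Hypothetical contents of /etc/resolv.conf inside a pod (assumed values).
SAMPLE = """\
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
"""

if __name__ == "__main__":
    ns, search, opts = parse_resolv_conf(SAMPLE)
    print("nameservers:", ns)
    print("search:", search)
    print("options:", opts)
```

The nameserver printed here should match the IP of the coredns service. If it does, the pod's resolver configuration is probably not the problem.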

Furthermore, if I use kubectl rollout to restart the coredns deployment, queries return quickly for a short period of time, but then become sporadic again. Running nslookup within a pod typically ends in a timeout against the coredns service IP, even though it prints the first answer right away and then appears to keep searching.

What am I doing wrong, or where can I look for more information? Thanks!

UPDATE

I enabled the log plugin in coredns and am not seeing any requests/queries. So there appears to be an issue where the coredns service is not even being queried when I initiate an HTTP request.

UPDATE

This was a very strange issue. RKE2 deploys coredns as a Deployment and does not mark the control plane node as unschedulable, so a coredns Pod can be scheduled there. When I queried the Pod IP of the coredns Pod on the non-control-plane node, it returned immediately, but when I queried the one on the control plane node, it timed out. I cordoned the control plane node and restarted the coredns deployment, and things are now good.
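The per-Pod comparison described above can be reproduced roughly as follows. This is only a sketch, not the exact commands I ran: it hand-builds a DNS A query and sends it over UDP straight to each coredns Pod IP, bypassing the service IP, and reports how long each Pod takes to answer. The helper names (`build_query`, `probe`, `compare`) are made up for illustration, and the Pod IPs and service name passed to `compare` would need to be replaced with real values from the cluster.

```python
import socket
import struct
import time


def build_query(name, txid=0x1234):
    """Build a minimal DNS query packet for an A record (recursion desired)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server_ip, name, timeout=2.0, port=53):
    """Return seconds until server_ip answers the query, or None on timeout/error."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server_ip, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        if data[:2] != query[:2]:  # reply must echo our transaction ID
            return None
        return time.monotonic() - start


def compare(server_ips, name):
    """Print the response time of each DNS server for the same name."""
    for ip in server_ips:
        elapsed = probe(ip, name)
        status = "timed out" if elapsed is None else f"{elapsed * 1000:.1f} ms"
        print(f"{ip}: {status}")
```

Run from inside a pod, something like `compare(["<coredns-pod-ip-1>", "<coredns-pod-ip-2>"], "nginx.default.svc.cluster.local")` (placeholder IPs, assumed service name) should show whether one coredns Pod answers immediately while the other times out, which is the pattern I saw between the two nodes.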

Did you try to trace the DNS request with Hubble or tcpdump to see where it is dropped? Did you check if things work when using kube-proxy? — pchaigno, Aug 23 '23 at 22:23
