
I have a Kubernetes cluster on an on-premise server, and I also have a server on Naver Cloud; let's call it server A. I want to join server A to my Kubernetes cluster. The server joins normally, but the kube-proxy and kube-flannel pods spawned from their DaemonSets are constantly in CrashLoopBackOff status.

Here is the log from kube-proxy:

I0405 03:13:48.566285       1 node.go:163] Successfully retrieved node IP: 10.1.0.2
I0405 03:13:48.566382       1 server_others.go:109] "Detected node IP" address="10.1.0.2"
I0405 03:13:48.566420       1 server_others.go:535] "Using iptables proxy"
I0405 03:13:48.616989       1 server_others.go:176] "Using iptables Proxier"
I0405 03:13:48.617021       1 server_others.go:183] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0405 03:13:48.617040       1 server_others.go:184] "Creating dualStackProxier for iptables"
I0405 03:13:48.617063       1 server_others.go:465] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0405 03:13:48.617093       1 proxier.go:242] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
I0405 03:13:48.617420       1 server.go:655] "Version info" version="v1.26.0"
I0405 03:13:48.617435       1 server.go:657] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0405 03:13:48.618790       1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072

There is no log from kube-flannel; the kube-flannel pods fail in their init container named install-cni-plugin. When I try kubectl -n kube-flannel logs kube-flannel-ds-d2l4q -c install-cni-plugin it returns:

unable to retrieve container logs for docker://47e4c8c580474b384b128c8e4d74297a0e891b5f227c6313146908b06ee7b376
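
In case it helps, I can also try pulling the logs another way and attach the output here; since the container ID is prefixed docker:// I assume that node uses the Docker runtime, so something like (<container-id> being a placeholder):

kubectl -n kube-flannel logs kube-flannel-ds-d2l4q -c install-cni-plugin --previous
# or directly on the node
sudo docker ps -a | grep install-cni-plugin
sudo docker logs <container-id>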

Beyond that I have no other clues I can think of; please tell me if I need to attach more info.

Please help, I've been stuck for so long T.T

More info:

kubectl get nodes

NAME                      STATUS     ROLES           AGE   VERSION
accio-randi-ed05937533    Ready      <none>          8d    v1.26.3
accio-test-1-b3fb4331ee   NotReady   <none>          89m   v1.26.3
master                    Ready      control-plane   48d   v1.26.1

kubectl -n kube-system get pods

NAME                             READY   STATUS             RESTARTS         AGE
coredns-787d4945fb-rms6t         1/1     Running            0                30d
coredns-787d4945fb-t6g8s         1/1     Running            0                33d
etcd-master                      1/1     Running            168 (36d ago)    48d
kube-apiserver-master            1/1     Running            158 (36d ago)    48d
kube-controller-manager-master   1/1     Running            27 (6d17h ago)   48d
kube-proxy-2r8tn                 1/1     Running            6 (36d ago)      48d
kube-proxy-f997t                 0/1     CrashLoopBackOff   39 (90s ago)     87m
kube-proxy-wc9x5                 1/1     Running            0                8d
kube-scheduler-master            1/1     Running            27 (6d17h ago)   48d

kubectl -n kube-system get events

LAST SEEN   TYPE      REASON             OBJECT                               MESSAGE
42s         Warning   DNSConfigForming   pod/coredns-787d4945fb-rms6t         Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.8.0.1 192.168.18.1 fe80::1%3
54s         Warning   DNSConfigForming   pod/coredns-787d4945fb-t6g8s         Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.8.0.1 192.168.18.1 fe80::1%3
3m10s       Warning   DNSConfigForming   pod/etcd-master                      Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.8.0.1 192.168.18.1 fe80::1%3
2m48s       Warning   DNSConfigForming   pod/kube-apiserver-master            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.8.0.1 192.168.18.1 fe80::1%3
3m33s       Warning   DNSConfigForming   pod/kube-controller-manager-master   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.8.0.1 192.168.18.1 fe80::1%3
3m7s        Warning   DNSConfigForming   pod/kube-proxy-2r8tn                 Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.8.0.1 192.168.18.1 fe80::1%3
15s         Normal    SandboxChanged     pod/kube-proxy-f997t                 Pod sandbox changed, it will be killed and re-created.
5m15s       Warning   BackOff            pod/kube-proxy-f997t                 Back-off restarting failed container kube-proxy in pod kube-proxy-f997t_kube-system(7652a1c4-9517-4a8a-a736-1f746f36c7ab)
3m30s       Warning   DNSConfigForming   pod/kube-scheduler-master            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.8.0.1 192.168.18.1 fe80::1%3

kubectl -n kube-flannel get pods

NAME                    READY   STATUS                  RESTARTS      AGE
kube-flannel-ds-2xgbw   1/1     Running                 0             8d
kube-flannel-ds-htgts   0/1     Init:CrashLoopBackOff   0 (2s ago)    88m
kube-flannel-ds-sznbq   1/1     Running                 6 (36d ago)   48d

kubectl -n kube-flannel get events

LAST SEEN   TYPE      REASON             OBJECT                      MESSAGE
100s        Normal    SandboxChanged     pod/kube-flannel-ds-htgts   Pod sandbox changed, it will be killed and re-created.
26m         Normal    Pulled             pod/kube-flannel-ds-htgts   Container image "docker.io/flannel/flannel-cni-plugin:v1.1.2" already present on machine
46m         Warning   BackOff            pod/kube-flannel-ds-htgts   Back-off restarting failed container install-cni-plugin in pod kube-flannel-ds-htgts_kube-flannel(4f602997-5502-4dcf-8fca-23eba01325dd)
5m          Warning   DNSConfigForming   pod/kube-flannel-ds-sznbq   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.8.0.1 192.168.18.1 fe80::1%3

kubectl -n kube-flannel describe pod kube-flannel-ds-htgts

Name:                 kube-flannel-ds-htgts
Namespace:            kube-flannel
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      flannel
Node:                 accio-test-1-b3fb4331ee/10.1.0.2
Start Time:           Thu, 06 Apr 2023 09:25:12 +0900
Labels:               app=flannel
                      controller-revision-hash=6b7b59d784
                      k8s-app=flannel
                      pod-template-generation=1
                      tier=node
Annotations:          <none>
Status:               Pending
IP:                   10.1.0.2
IPs:
  IP:           10.1.0.2
Controlled By:  DaemonSet/kube-flannel-ds
Init Containers:
  install-cni-plugin:
    Container ID:  docker://0fed30cc41f305203bf5d6fb7668f92f449a65f722faf1360e61231e9107ef66
    Image:         docker.io/flannel/flannel-cni-plugin:v1.1.2
    Image ID:      docker-pullable://flannel/flannel-cni-plugin@sha256:bf4b62b131666d040f35a327d906ee5a3418280b68a88d9b9c7e828057210443
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /flannel
      /opt/cni/bin/flannel
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 06 Apr 2023 15:11:34 +0900
      Finished:     Thu, 06 Apr 2023 15:11:34 +0900
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/cni/bin from cni-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gbk6z (ro)
  install-cni:
    Container ID:
    Image:         docker.io/flannel/flannel:v0.21.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /etc/kube-flannel/cni-conf.json
      /etc/cni/net.d/10-flannel.conflist
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gbk6z (ro)
Containers:
  kube-flannel:
    Container ID:
    Image:         docker.io/flannel/flannel:v0.21.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
      --iface=accio-k8s-net
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:                 kube-flannel-ds-htgts (v1:metadata.name)
      POD_NAMESPACE:            kube-flannel (v1:metadata.namespace)
      KUBERNETES_SERVICE_HOST:  10.1.0.1
      KUBERNETES_SERVICE_PORT:  6443
      EVENT_QUEUE_DEPTH:        5000
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run/flannel from run (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gbk6z (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /run/flannel
    HostPathType:
  cni-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  kube-api-access-gbk6z:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 :NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason          Age                      From     Message
  ----     ------          ----                     ----     -------
  Warning  BackOff         31m (x8482 over 5h46m)   kubelet  Back-off restarting failed container install-cni-plugin in pod kube-flannel-ds-htgts_kube-flannel(4f602997-5502-4dcf-8fca-23eba01325dd)
  Normal   Created         21m (x8783 over 5h46m)   kubelet  Created container install-cni-plugin
  Normal   Pulled          11m (x9051 over 5h46m)   kubelet  Container image "docker.io/flannel/flannel-cni-plugin:v1.1.2" already present on machine
  Normal   SandboxChanged  81s (x18656 over 5h46m)  kubelet  Pod sandbox changed, it will be killed and re-created.
  • Can you run these commands **kubectl get pods** and **kubectl get events** and update the question? Also try a quick workaround if a cluster is needed fast: manually set the parameter with **sudo sysctl net/netfilter/nf_conntrack_max=131072** before creating the Kind cluster. – Veera Nagireddy Apr 05 '23 at 06:03
  • Hello @Rahandi Noor Pasha, feel free to update the status of the question. Let me know if the answer below helps to resolve your issue. I am happy to help if you have any further queries. – Veera Nagireddy Apr 12 '23 at 04:21

2 Answers


I had a similar issue on one of my nodes due to the container runtime being incorrectly configured. Please check the containerd configuration at /etc/containerd/config.toml, which specifies daemon-level options. The default configuration can be generated by running

containerd config default > /etc/containerd/config.toml

To use the systemd cgroup driver in /etc/containerd/config.toml with runc, set

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
 [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

If the cgroup driver is incorrect, pods on that node can end up stuck in CrashLoopBackOff.
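
For reference, a minimal sketch of the full sequence on the affected node might look like this (assuming containerd is the runtime and is managed by systemd; adjust paths for your distro):

# regenerate the default config, then switch runc to the systemd cgroup driver
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# restart the runtime and kubelet so the change takes effect
sudo systemctl restart containerd
sudo systemctl restart kubelet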

Mark Woon
hyunfei

Depending on where the crash happens, which might be right at startup or later during execution of your application, you may not always see logs.

If logs are not showing, the pod may be missing a requested resource, such as a Secret or a volume.

You can get complete details about the resources and related events by running the commands below, which should help you understand and resolve the issue quickly.

A. kubectl get events

B. kubectl describe pod <pod_name>

C. kubectl get pods
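
For the failing pods in this question, the namespaced forms would be something like:

kubectl -n kube-system get events --sort-by=.lastTimestamp
kubectl -n kube-flannel describe pod kube-flannel-ds-htgts
kubectl -n kube-flannel logs kube-flannel-ds-htgts -c install-cni-plugin --previous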

Check the possible reasons below for CrashLoopBackOff in your case:

  1. It seems the kernel doesn't allow setting some conntrack fields from a non-init netns. By default kube-proxy tries to set them, hence fails, and the pods end up in CrashLoopBackOff. You can configure kube-proxy in kubeadm not to try to set these values.

    a. Delete your local cluster first

    b. Set sudo sysctl net/netfilter/nf_conntrack_max=131072

    c. Start a new local cluster again

  2. Pods cannot always communicate with ClusterIPs. Check the kube-proxy iptables setting masqueradeAll, which is false by default; it might have been set to true by mistake (a quick check is shown after this list). Refer to GitHub issue #2849 for details.

  3. Check whether the subnet your cluster was initialised with differs from the subnet in the flannel manifest YAML; if so, change the subnet in the flannel configuration to match the one the cluster was init-ed with (see the commands after this list). You can also refer to Edgar Huynh's response in the post Kube flannel in CrashLoopBackOff status, which may help resolve your issue.
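
For points 2 and 3, a rough way to check both (assuming the default kube-proxy and kube-flannel-cfg ConfigMap names) is:

# masqueradeAll should normally be false
kubectl -n kube-system get cm kube-proxy -o yaml | grep masqueradeAll
# Pod CIDR the cluster was initialised with
kubectl cluster-info dump | grep -m1 cluster-cidr
# Network that flannel is configured to use; the two should match
kubectl -n kube-flannel get cm kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'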

EDIT:

Based on the events you provided:

Check whether you have too many nameservers in /etc/resolv.conf on the node (the limit applies there, not in the ClusterDNS config). To resolve the DNSConfigForming warning, refer to the official Kubernetes documentation on DNS known issues, the post K8s coredns and flannel nameserver limit exceeded, and the Kubernetes community forum discussion Why does etcd fail with Debian/bullseye kernel?, which may help resolve your issue.
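
As a quick check on the affected node (the exact path is an assumption; with systemd-resolved the upstream list usually lives in /run/systemd/resolve/resolv.conf):

# the kubelet passes the node's resolv.conf to pods; more than 3 nameservers triggers the DNSConfigForming warning
cat /etc/resolv.conf
cat /run/systemd/resolve/resolv.conf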

Veera Nagireddy