
I have created a Kubernetes cluster on RHEL 7 with Kubernetes packages GitVersion:"v1.8.1". I'm trying to deploy WordPress on my custom cluster, but pod creation is always stuck in the ContainerCreating state.

[phani@k8s-master]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                                        READY     STATUS              RESTARTS   AGE
default       wordpress-766d75457d-zlvdn                                  0/1       ContainerCreating   0          11m
kube-system   etcd-k8s-master                                             1/1       Running             0          1h
kube-system   kube-apiserver-k8s-master                                   1/1       Running             0          1h
kube-system   kube-controller-manager-k8s-master                          1/1       Running             0          1h
kube-system   kube-dns-545bc4bfd4-bb8js                                   3/3       Running             0          1h
kube-system   kube-proxy-bf4zr                                            1/1       Running             0          1h
kube-system   kube-proxy-d7zvg                                            1/1       Running             0          34m
kube-system   kube-scheduler-k8s-master                                   1/1       Running             0          1h
kube-system   weave-net-92zf9                                             2/2       Running             0          34m
kube-system   weave-net-sh7qk                                             2/2       Running             0          1h

Docker version: 1.13.1

Pod status from the describe command:
      Normal   Scheduled               18m                default-scheduler                           Successfully assigned wordpress-766d75457d-zlvdn to worker1
      Normal   SuccessfulMountVolume   18m                kubelet, worker1                            MountVolume.SetUp succeeded for volume "default-token-tmpcm"
      Warning  DNSSearchForming        18m                kubelet, worker1                            Search Line limits were exceeded, some dns names have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local 
      Warning  FailedCreatePodSandBox  14m                kubelet, worker1                            Failed create pod sandbox.
      Warning  FailedSync              25s (x8 over 14m)  kubelet, worker1                            Error syncing pod
      Normal   SandboxChanged          24s (x8 over 14m)  kubelet, worker1                            Pod sandbox changed, it will be killed and re-created.

From the kubelet log I observed the below error on the worker:

error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

But the kubelet is stable; no problems are seen on the worker.
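
For reference, this is roughly how the two drivers can be compared on the worker (the kubelet flag may also live in a drop-in file rather than on the command line):

# cgroup driver Docker is using
docker info 2>/dev/null | grep -i 'cgroup driver'

# cgroup driver the kubelet was started with (if passed as a flag)
ps aux | grep -v grep | grep kubelet | grep -o 'cgroup-driver=[a-z]*'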

How do I solve this problem?

I checked for a CNI failure, but I couldn't find anything.

~]# ls /opt/cni/bin
bridge  cnitool  dhcp  flannel  host-local  ipvlan  loopback  macvlan  noop  ptp  tuning  weave-ipam  weave-net  weave-plugin-2.3.0

In the journal logs the below messages appear repeatedly. It seems like the scheduler is trying to create the container all the time.

Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421184   14339 remote_runtime.go:115] StopPodSandbox "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421212   14339 kuberuntime_manager.go:780] Failed to stop sandbox {"docker" "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304"}
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421247   14339 kuberuntime_manager.go:580] killPodWithSyncResult failed: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421262   14339 pod_workers.go:182] Error syncing pod 7f1c6bf1-6af3-11e8-856b-fa163e3d1891 ("wordpress-766d75457d-spdrb_default(7f1c6bf1-6af3-11e8-856b-fa163e3d1891)"), skipping: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
phanikumar ch
  • The dup of [kubelet failed with kubelet cgroup driver: “cgroupfs” is different from docker cgroup driver: “systemd”](https://stackoverflow.com/questions/45708175/kubelet-failed-with-kubelet-cgroup-driver-cgroupfs-is-different-from-docker-c) – Kun Li Jun 08 '18 at 04:17

5 Answers

2

Failed create pod sandbox.

... is almost always a CNI failure; I would check on the node that all the weave containers are happy, and that /opt/cni/bin is present (or its weave equivalent).

You may have to check both the journalctl -u kubelet.service output and the docker logs of any running containers to discover the full scope of the error on the node.
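
For example, something along these lines on the affected node should surface most of it (pod and container names will of course differ):

# are the weave pods healthy?
kubectl -n kube-system get pods -o wide | grep weave

# CNI binaries and config present on the node?
ls /opt/cni/bin
ls /etc/cni/net.d

# kubelet and container-level logs
journalctl -u kubelet.service --no-pager | tail -n 50
docker ps -a | head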

mdaniel
  • I checked the cni failure, I couldn't find anything. `~]# ls /opt/cni/bin bridge cnitool dhcp flannel host-local ipvlan loopback macvlan noop ptp tuning weave-ipam weave-net weave-plugin-2.3.0` – phanikumar ch Jun 08 '18 at 09:25
  • most likely you have more than one CNI configured, flannel and weave, check my answer. – elia Jun 08 '18 at 09:35
  • @Matthew I have checked the weave containers and observed `connection shutting down due to error: cannot connect to ourself` and `[allocator]: Delete: no addresses for a66610849be48456a7ac0e823a45b639239a9a0ba6e26d305006a3fe5edc080f`. I couldn't figure out what the problem is, but the pod is repeatedly crashing and trying to create a new container. – phanikumar ch Jun 09 '18 at 21:50
1

It seems to work after removing $KUBELET_NETWORK_ARGS from /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.

I removed $KUBELET_NETWORK_ARGS and restarted the worker node, and then the pods got deployed successfully.
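
For illustration only (the exact ExecStart line varies between kubeadm versions), the change amounts to dropping that variable from the kubelet drop-in and restarting the service:

# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (excerpt, illustrative)
# before: ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS ...
# after:  ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS ...

sudo systemctl daemon-reload
sudo systemctl restart kubelet

Keep in mind that $KUBELET_NETWORK_ARGS is what points the kubelet at the CNI plugin, so this is more of a workaround than a fix.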

phanikumar ch
  • this happens to be the solution on 1.11 as well but location of files is different: `$KUBELET_NETWORK_ARGS` flags are now in `/var/lib/kubelet/kubeadm-flags.env` – Const Aug 08 '18 at 12:23
0

As Matthew said, it's most likely a CNI failure.

First, find the node this pod is running on:

kubectl get po wordpress-766d75457d-zlvdn -o wide 

Next, on the node where the pod is located, check /etc/cni/net.d. If you have more than one .conf file there, you can delete one and restart the node.
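
For example (the flannel file name below is only illustrative; the weave one is what a default weave install creates):

# on the affected node
ls /etc/cni/net.d
# e.g.  10-flannel.conflist  10-weave.conf   <- two configs means two CNIs are fighting

# keep only the CNI you actually use, then restart the node (or just the kubelet)
sudo rm /etc/cni/net.d/10-flannel.conflist
sudo systemctl restart kubelet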

Source: https://github.com/kubernetes/kubeadm/issues/578

Note that this is just one of the possible solutions.

elia
  • I have only one .conf file in /etc/cni/net.d `cat /etc/cni/net.d/10-weave.conf { "name": "weave", "type": "weave-net", "hairpinMode": true }` – phanikumar ch Jun 08 '18 at 09:41
  • sometimes this can also be caused by the kubelet, try restarting the node – elia Jun 08 '18 at 09:43
  • I have tried restarting the node as well, but it didn't work. However, I observed the below error related to weave-cni: `weave-cni: unable to release IP address: Delete http://127.0.0.1:6784/ip/7ca5d651d2469e687d22fc2cd782ea3c7c19e8f6b2114738770d940a1be0884a: dial tcp 127.0.0.1:6784: connect: connection refused` Similar errors are present more than once – phanikumar ch Jun 08 '18 at 14:48
  • @phanikumarch, could you please share more details? Where did you find this log entry? How did you find the right log file and its location? Can this be fixed by the comment from https://github.com/kubernetes/kubeadm/issues/578#issuecomment-519159618 ? – HX_unbanned Aug 07 '19 at 15:57
0

While hopefully it's no one else's problem, for me, this happened when part of my filesystem was full.

I had pods stuck in ContainerCreating only on one node in my cluster. I also had a bunch of pods which I expected to shut down, but hadn't. Someone recommended running

sudo systemctl status kubelet -l

which showed me a bunch of lines like

Jun 18 23:19:56 worker01 kubelet[1718]: E0618 23:19:56.461378 1718 kuberuntime_manager.go:647] createPodSandbox for pod "REDACTED(2c681b9c-cf5b-11eb-9c79-52540077cc53)" failed: mkdir /var/log/pods/2c681b9c-cf5b-11eb-9c79-52540077cc53: no space left on device

I confirmed that I was out of space with

$ df -h
Filesystem                    Size  Used Avail Use% Mounted on
devtmpfs                      189G     0  189G   0% /dev
tmpfs                         189G     0  189G   0% /sys/fs/cgroup
/dev/mapper/vg01-root          20G  7.0G   14G  35% /
/dev/mapper/vg01-tmp          4.0G   34M  4.0G   1% /tmp
/dev/mapper/vg01-home         4.0G   72M  4.0G   2% /home
/dev/mapper/vg01-varlog        10G   10G   20K 100% /var/log
/dev/mapper/vg01-varlogaudit  2.0G   68M  2.0G   4% /var/log/audit

I just had to clear out that dir (and did some manual cleanup on all the pending pods and pods that were stuck running).
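
Roughly, the cleanup looked like this (paths are from the df output above; adjust to whatever is actually full):

# find what is eating /var/log
sudo du -xh /var/log | sort -h | tail -n 20

# shrink the systemd journal if that is the culprit
sudo journalctl --vacuum-size=200M

# logs of pods that no longer exist can usually go too (check before deleting!)
sudo ls /var/log/pods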

jeremysprofile
0

In my case, my containers were stuck in the ContainerCreating state, but the detailed message was a bit different: Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded.

But it was also related to a CNI failure. My flannel pod was operating abnormally, and I needed to restart it and rejoin the node to the cluster.
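
Roughly what that looked like (the label and node names are placeholders for my setup):

# restart the misbehaving flannel pod; the DaemonSet recreates it
kubectl -n kube-system delete pod -l app=flannel

# if that is not enough, drain the node, reset it, and rejoin it
kubectl drain <node-name> --ignore-daemonsets
# on the node: kubeadm reset, then kubeadm join with a fresh token
kubectl uncordon <node-name>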

Hope this can help someone.

cointreau