I set up a single-node Kubernetes cluster using kubeadm on Ubuntu 16.04 LTS, with flannel as the pod network.
Most of the time everything works well, but every couple of days the cluster gets into a state where it can't start new pods: the pods are stuck in the "Pending" state, and when I kubectl describe pod one of those pods, I get error messages like these:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned dex-1939802596-zt1r3 to superserver-03
1m 2s 21 {kubelet superserver-03} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "somepod-1939802596-zt1r3_somenamespace" with SetupNetworkError: "Failed to setup network for pod \"somepod-1939802596-zt1r3_somenamespace(167f8345-faeb-11e6-94f3-0cc47a9a5cf2)\" using network plugins \"cni\": no IP addresses available in network: cbr0; Skipping pod"
I've found this stackoverflow question and the workaround suggested there. It does help to recover (it takes several minutes, though), but the problem comes back after a while...
I've also encountered this open issue, and recovered using the workaround suggested there, but again, the problem comes back. Also, it's not exactly my case, and that issue was closed once a workaround was found rather than a fix... :\
Technical details:
kubeadm version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.0-alpha.0.2074+a092d8e0f95f52", GitCommit:"a092d8e0f95f5200f7ae2cba45c75ab42da36537", GitTreeState:"clean", BuildDate:"2016-12-13T17:03:18Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:34:56Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Started the cluster with these commands:
kubeadm init --pod-network-cidr 10.244.0.0/16 --api-advertise-addresses 192.168.1.200
kubectl taint nodes --all dedicated-
kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
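For reference, these are the sanity checks I run right after bringing the cluster up; the paths below are from my setup and may differ on other installs:
# flannel's subnet lease for this node
cat /run/flannel/subnet.env
# the CNI network config for flannel (wherever your install writes it; mine is here)
cat /etc/cni/net.d/10-flannel.conf
# the bridge CNI creates for pods; its address should sit inside the node's podCIDR
ip addr show cni0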
Some syslog entries that may be relevant (I get many of these):
Feb 23 11:07:49 server-03 kernel: [ 155.480669] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Feb 23 11:07:49 server-03 dockerd[1414]: time="2017-02-23T11:07:49.735590817+02:00" level=warning msg="Couldn't run auplink before unmount /var/lib/docker/aufs/mnt/89bb7abdb946d858e175d80d6e1d2fdce0262af8c7afa9c6ad9d776f1f5028c4-init: exec: \"auplink\": executable file not found in $PATH"
Feb 23 11:07:49 server-03 kernel: [ 155.496599] aufs au_opts_verify:1597:dockerd[24704]: dirperm1 breaks the protection by the permission bits on the lower branch
Feb 23 11:07:49 server-03 systemd-udevd[29313]: Could not generate persistent MAC address for vethd4d85eac: No such file or directory
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.756976 1228 cni.go:255] Error adding network: no IP addresses available in network: cbr0
Feb 23 11:07:49 server-03 kernel: [ 155.514994] IPv6: eth0: IPv6 duplicate address fe80::835:deff:fe4f:c74d detected!
Feb 23 11:07:49 server-03 kernel: [ 155.515380] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Feb 23 11:07:49 server-03 kernel: [ 155.515588] device vethd4d85eac entered promiscuous mode
Feb 23 11:07:49 server-03 kernel: [ 155.515643] cni0: port 34(vethd4d85eac) entered forwarding state
Feb 23 11:07:49 server-03 kernel: [ 155.515663] cni0: port 34(vethd4d85eac) entered forwarding state
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.757001 1228 cni.go:209] Error while adding to cni network: no IP addresses available in network: cbr0
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.757056 1228 docker_manager.go:2201] Failed to setup network for pod "somepod-752955044-58g59_somenamespace(5d6c28e1-f8dd-11e6-9843-0cc47a9a5cf2)" using network plugins "cni": no IP addresses available in network: cbr0; Skipping pod
Many thanks!
Edit:
I am able to reproduce it. It seems to be an exhaustion of the IP addresses in the node's pod CIDR. My findings:
First, the podCIDR of the node, obtained through
kubectl get node -o yaml
, is: podCIDR: 10.244.0.0/24
(BTW, why is it a /24 and not the /16 I set as the cluster CIDR in the kubeadm command?)
Second:
$ sudo ls -la /var/lib/cni/networks/cbr0 | wc -l
256
(That is, 256 IPs are assigned, right?) But that happens even though I currently have far fewer than 256 running Kubernetes pods and services:
$ kubectl get all --all-namespaces | wc -l
180
(Yes, this includes not only pods and services, but also jobs, deployments and replicasets)
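A more precise number would be a pod-only count, since services get their virtual IPs from the service CIDR and don't consume addresses in cbr0:
$ kubectl get pods --all-namespaces --no-headers | wc -l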
So, how come the IP addresses are exhausted? And how do I fix that? It can't be that those workarounds are the only way...
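One thing I intend to check is whether those 256 entries include stale reservations left behind by pods whose network teardown failed. This is only a rough sketch: it assumes the host-local IPAM layout (each file under /var/lib/cni/networks/cbr0 is named after a reserved IP and contains the ID of the container that reserved it) and Docker as the runtime:
# run as root; flags reservations whose container Docker no longer knows about
cd /var/lib/cni/networks/cbr0
for f in *; do
  [ "$f" = "last_reserved_ip" ] && continue   # host-local bookkeeping file, not an IP
  id=$(cat "$f")
  docker inspect "$id" > /dev/null 2>&1 || echo "stale reservation: $f -> $id"
done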
Thanks again.
Edit (2):
Another related issue: https://github.com/containernetworking/cni/issues/306