
I am trying to deploy a sample Spring Boot microservice into my Kubernetes cluster. All of my nodes are in the Ready state, but when I deploy, the pod stays stuck in ContainerCreating.
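
For reference, this is roughly how I am checking the node and pod state (names differ per cluster):

# nodes all report Ready
kubectl get nodes
# the new pod stays in ContainerCreating
kubectl get pods -o wide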

When I describe the pod, I get a message saying that networkPlugin cni failed to set up the pod network and was unable to allocate an IP address.

The result of my pod describe command looks like the following:

Events:
  Type     Reason                  Age                    From                   Message
  ----     ------                  ----                   ----                   -------
  Normal   Scheduled               <unknown>              default-scheduler      Successfully assigned default/spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj to mildevkub040
  Warning  FailedCreatePodSandBox  53m                    kubelet, mildevkub040  Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "2499f91b4a1173fb854a47ba1910d1fc3f18cfb35bf5c38c9a3008e19d385e15" network for pod "spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj": networkPlugin cni failed to set up pod "spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj_default" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/2499f91b4a1173fb854a47ba1910d1fc3f18cfb35bf5c38c9a3008e19d385e15: dial tcp 127.0.0.1:6784: connect: connection refused, failed to clean up sandbox container "2499f91b4a1173fb854a47ba1910d1fc3f18cfb35bf5c38c9a3008e19d385e15" network for pod "spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj": networkPlugin cni failed to teardown pod "spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj_default" network: Delete http://127.0.0.1:6784/ip/2499f91b4a1173fb854a47ba1910d1fc3f18cfb35bf5c38c9a3008e19d385e15: dial tcp 127.0.0.1:6784: connect: connection refused]
  Normal   SandboxChanged          3m40s (x228 over 53m)  kubelet, mildevkub040  Pod sandbox changed, it will be killed and re-created.
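
The "dial tcp 127.0.0.1:6784: connect: connection refused" part indicates that the CNI plugin cannot reach the Weave Net API on that node. A quick check directly on mildevkub040 would be something like the following (illustrative):

# is anything listening on the Weave Net API port?
ss -lntp | grep 6784
# try the Weave Net status endpoint directly
curl -s http://127.0.0.1:6784/status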

When I check the log of the weave container, I see the following:

INFO: 2020/01/09 12:18:12.061328 ->[192.168.16.178:42838] connection shutting down due to error during handshake: write tcp 192.168.16.177:6783->192.168.16.178:42838: write: connection reset by peer
INFO: 2020/01/09 12:18:18.998360 ->[192.168.16.178:37570] connection accepted
INFO: 2020/01/09 12:18:20.653339 ->[192.168.16.178:45223] connection shutting down due to error during handshake: write tcp 192.168.16.177:6783->192.168.16.178:45223: write: connection reset by peer
INFO: 2020/01/09 12:18:21.122204 overlay_switch ->[56:60:12:a9:76:d1(mildevkub050)] using fastdp
INFO: 2020/01/09 12:18:21.742168 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection deleted
INFO: 2020/01/09 12:18:21.800670 ->[192.168.16.178:6783] attempting connection
INFO: 2020/01/09 12:18:22.470207 ->[192.168.16.175:59923] connection accepted
INFO: 2020/01/09 12:18:22.912690 ->[192.168.16.175:6783|be:b1:3f:a4:34:88(mildevkub020)]: connection deleted
INFO: 2020/01/09 12:18:22.918075 Removed unreachable peer be:b1:3f:a4:34:88(mildevkub020)
INFO: 2020/01/09 12:18:22.918144 Removed unreachable peer 56:60:12:a9:76:d1(mildevkub050)
INFO: 2020/01/09 12:18:24.602093 ->[192.168.16.175:6783] attempting connection
INFO: 2020/01/09 12:18:26.782123 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection ready; using protocol version 2
INFO: 2020/01/09 12:18:27.918518 ->[192.168.16.175:59923|be:b1:3f:a4:34:88(mildevkub020)]: connection ready; using protocol version 2
INFO: 2020/01/09 12:18:29.365629 ->[192.168.16.178:37570|56:60:12:a9:76:d1(mildevkub050)]: connection ready; using protocol version 2
INFO: 2020/01/09 12:18:29.864370 overlay_switch ->[56:60:12:a9:76:d1(mildevkub050)] using fastdp
INFO: 2020/01/09 12:18:30.086645 overlay_switch ->[56:60:12:a9:76:d1(mildevkub050)] using fastdp
INFO: 2020/01/09 12:18:30.090275 overlay_switch ->[be:b1:3f:a4:34:88(mildevkub020)] using fastdp
INFO: 2020/01/09 12:18:30.100874 ->[192.168.16.178:37570|56:60:12:a9:76:d1(mildevkub050)]: connection added (new peer)
INFO: 2020/01/09 12:18:30.104237 ->[192.168.16.178:37570|56:60:12:a9:76:d1(mildevkub050)]: connection deleted
INFO: 2020/01/09 12:18:30.104284 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection added (new peer)
INFO: 2020/01/09 12:18:30.104371 ->[192.168.16.175:59923|be:b1:3f:a4:34:88(mildevkub020)]: connection added (new peer)
INFO: 2020/01/09 12:18:30.776275 ->[192.168.16.178:37570|56:60:12:a9:76:d1(mildevkub050)]: connection shutting down due to error: Multiple connections to 56:60:12:a9:76:d1(mildevkub050) added to 5a:67:92:b3:58:ce(mildevkub040)
INFO: 2020/01/09 12:18:44.305079 ->[192.168.16.175:6783|be:b1:3f:a4:34:88(mildevkub020)]: connection ready; using protocol version 2
INFO: 2020/01/09 12:18:45.200565 overlay_switch ->[be:b1:3f:a4:34:88(mildevkub020)] using fastdp
INFO: 2020/01/09 12:18:45.458203 ->[192.168.16.175:59923|be:b1:3f:a4:34:88(mildevkub020)]: connection fully established
INFO: 2020/01/09 12:18:45.461157 ->[192.168.16.175:6783|be:b1:3f:a4:34:88(mildevkub020)]: connection shutting down due to error: Multiple connections to be:b1:3f:a4:34:88(mildevkub020) added to 5a:67:92:b3:58:ce(mildevkub040)
INFO: 2020/01/09 12:18:45.470667 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection fully established
INFO: 2020/01/09 12:18:45.688871 sleeve ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: Effective MTU verified at 1438
INFO: 2020/01/09 12:18:45.874380 sleeve ->[192.168.16.175:6783|be:b1:3f:a4:34:88(mildevkub020)]: Effective MTU verified at 1438
INFO: 2020/01/09 12:24:12.026645 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection shutting down due to error: write tcp 192.168.16.177:38313->192.168.16.178:6783: write: connection reset by peer
INFO: 2020/01/09 12:25:56.708405 ->[192.168.16.178:44120] connection accepted
INFO: 2020/01/09 12:26:31.769826 overlay_switch ->[56:60:12:a9:76:d1(mildevkub050)] sleeve timed out waiting for UDP heartbeat
INFO: 2020/01/09 12:26:41.819554 ->[192.168.16.175:59923|be:b1:3f:a4:34:88(mildevkub020)]: connection shutting down due to error: write tcp 192.168.16.177:6783->192.168.16.175:59923: write: connection reset by peer
INFO: 2020/01/09 12:28:17.563133 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection deleted
INFO: 2020/01/09 12:30:49.548347 ->[192.168.16.178:60937] connection accepted
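
For reference, I collected the log above from the weave container of the weave-net pod on mildevkub040 with commands along these lines (the pod name below is a placeholder):

# find the weave-net pod scheduled on the affected node
kubectl get pods -n kube-system -o wide | grep weave-net
# read the log of its "weave" container
kubectl logs -n kube-system weave-net-xxxxx -c weave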

When I run the command kubectl exec -n kube-system weave-net-fj9mm -c weave -- /home/weave/weave --local status ipam, I get the response "Error from server (NotFound): pods "weave-net-fj9mm" not found".
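
I assume the weave-net pod has been recreated and the name I used is stale. Listing the current weave-net pods and re-running the command against the pod on mildevkub040 should look roughly like this (the pod name is a placeholder):

kubectl get pods -n kube-system -o wide | grep weave-net
kubectl exec -n kube-system weave-net-xxxxx -c weave -- /home/weave/weave --local status ipam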

How can I resolve this issue?

  • What CNI plugin are you using in your cluster? Are other deployments able to run? – Matt Jan 18 '20 at 10:24
  • What is the network plugin you are using? How did you set up the cluster? What pod CIDR did you specify when setting up the cluster? – Arghya Sadhu Jan 18 '20 at 10:26
  • @ArghyaSadhu - I am using Weave. I set up the cluster with the kubeadm tool. This error only started happening now; previously every service deployed successfully. – Mr.DevEng Jan 18 '20 at 11:12
  • @Matt - I used Weave, and every deployment shows the same status. This only started happening now; previously I was deploying successfully. – Mr.DevEng Jan 18 '20 at 11:13
  • Post the log from the container running the weave-kube image on the node where the pod is scheduled, along with the kubelet logs of that node. – Arghya Sadhu Jan 18 '20 at 11:17
  • Add the output of curl 'http://127.0.0.1:6784/status' on that node, and the kubelet logs of that node. – Arghya Sadhu Jan 19 '20 at 04:21
  • Did you enable sysctl net.bridge.bridge-nf-call-iptables=1? Could you provide the output of: 1. `curl 'http://127.0.0.1:6784/status'` 2. `kubectl exec -n kube-system weave-net-fj9mm -c weave -- /home/weave/weave --local status ipam` 3. Are your Kubernetes pods and weave-net pods up and running? – Jakub Jan 20 '20 at 12:03
  • waaay more info needed, including, but not limited to: how did you create the cluster? with `kubeadm`? if so please provide the **exact** command line that you used. How did you install network plugin? please provide the **exact** steps (`kubectl apply` and any other steps). Please also provide the output of `kubectl get pods --all-namespaces`. Last but not least, please provide detailed info about the network topology by which (virtual) machines hosting your nodes are connected (subnet cidrs, routes, firewall rules etc). of course please provide all this info in the question, not in comments. – morgwai Feb 10 '20 at 16:44
  • It appears that pod networking is not working. Share the steps for how the cluster was set up; we will review them and should be able to help. – P Ekambaram Feb 12 '20 at 06:17

1 Answer

If you curl the URL that appears in the pod describe output (the Weave Net API on http://127.0.0.1:6784), you will get something like this:

# curl 'http://127.0.0.1:6784/status'
        Version: 1.8.2 (version 1.9.1 available - please upgrade!)

        Service: router
       Protocol: weave 1..2
           Name: 66:2b:6a:ca:34:88(ip-10-128-152-185)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 4
    Connections: 4 (3 established, 1 failed)
          Peers: 4 (with 12 established connections)
 TrustedSubnets: none

        Service: ipam
         Status: waiting for IP range grant from peers
          Range: 10.32.0.0/12
  DefaultSubnet: 10.32.0.0/12

"waiting for IP range grant from peers" status indicates that Weave Net's IPAM believes that all the IP address space is owned by other nodes in the cluster, but actually none of those nodes are able to be contacted at the moment.

Here's the workaround. Big red warnings:

  • Make sure all the unreachable hosts really are gone forever before running this.
  • Do not run this on more than one node.
  • This may break your Kubernetes cluster if something goes wrong.
  • A failsafe 'echo' is included in the command in case you did not read the warnings above; it only prints the curl commands instead of executing them.

% for i in $(curl -s 'http://127.0.0.1:6784/status/ipam' | grep 'unreachable\!$' | sort -k2 -n -r | awk -F'(' '{print $2}' | sed 's/).*//'); do echo curl -X DELETE 127.0.0.1:6784/peer/$i; done

Once you remove the echo and actually execute the printed DELETE calls, each one reports the address space it takes over from the dead peer:

65536 IPs taken over from ip-10-128-184-15
32768 IPs taken over from ip-10-128-159-154
32768 IPs taken over from ip-10-128-170-84
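
After the DELETE calls have been executed (echo removed) on that one node, it is worth re-checking that IPAM is no longer waiting for a range grant, for example:

# the ipam section should no longer say "waiting for IP range grant from peers"
curl -s http://127.0.0.1:6784/status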

Reference - https://github.com/weaveworks/weave/issues/2822
