Vagrant, VM OS: ubuntu/bionic64, swap disabled
Kubernetes version: 1.18.0
Infrastructure: 1 HAProxy node, 3 external etcd nodes, 3 Kubernetes master nodes
Attempt: I am trying to set up HA Rancher, so I am first setting up an HA Kubernetes cluster with kubeadm, following the official documentation.
Expected behavior: all Kubernetes components come up and I can open Weave Scope and see all the nodes.
Actual behavior: CoreDNS is still not ready even after installing the CNI (Weave Net), so Weave Scope (the visualization UI) does not work either, since it depends on networking (Weave Net and CoreDNS) being healthy.
# kubeadm config
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "172.16.0.30:6443"
etcd:
  external:
    caFile: /etc/rancher-certs/ca-chain.cert.pem
    keyFile: /etc/rancher-certs/etcd.key.pem
    certFile: /etc/rancher-certs/etcd.cert.pem
    endpoints:
    - https://172.16.0.20:2379
    - https://172.16.0.21:2379
    - https://172.16.0.22:2379
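For completeness, the control plane was brought up following the official kubeadm HA guide, roughly like this (a sketch only; the real token, CA cert hash and certificate key come from the kubeadm init output):

# on the first master (rancher-0), with the config above saved as kubeadm-config.yaml
sudo kubeadm init --config kubeadm-config.yaml --upload-certs

# on the other masters, run the control-plane join command printed by kubeadm init
# (the token, hash and certificate key below are placeholders)
sudo kubeadm join 172.16.0.30:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>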
-------------------------------------------------------------------------------
# firewall
vagrant@rancher-0:~$ sudo ufw status
Status: active
To Action From
-- ------ ----
OpenSSH ALLOW Anywhere
Anywhere ALLOW 172.16.0.0/26
OpenSSH (v6) ALLOW Anywhere (v6)
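(For context, a rule set like the above corresponds roughly to these commands; 172.16.0.0/26 is the private network shared by the VMs:)

sudo ufw allow OpenSSH
sudo ufw allow from 172.16.0.0/26
sudo ufw enable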
-------------------------------------------------------------------------------
# no swap
vagrant@rancher-0:~$ free -h
total used free shared buff/cache available
Mem: 1.9G 928M 97M 1.4M 966M 1.1G
Swap: 0B 0B 0B
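(Swap is turned off during provisioning; the usual way to do this for kubeadm is something like:)

sudo swapoff -a
# keep it off across reboots by commenting out the swap entry in /etc/fstab
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab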
k8s diagnostic output:
vagrant@rancher-0:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
rancher-0 Ready master 14m v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
rancher-1 Ready master 9m23s v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
rancher-2 Ready master 4m26s v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
vagrant@rancher-0:~$ kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cert-manager cert-manager ClusterIP 10.106.146.236 <none> 9402/TCP 17m
cert-manager cert-manager-webhook ClusterIP 10.102.162.87 <none> 443/TCP 17m
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 18m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 18m
weave weave-scope-app NodePort 10.96.110.153 <none> 80:30276/TCP 17m
vagrant@rancher-0:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager cert-manager-bd9d585bd-x8qpb 0/1 Pending 0 16m <none> <none> <none> <none>
cert-manager cert-manager-cainjector-76c6657c55-d8fpj 0/1 Pending 0 16m <none> <none> <none> <none>
cert-manager cert-manager-webhook-64b9b4fdfd-sspjx 0/1 Pending 0 16m <none> <none> <none> <none>
kube-system coredns-66bff467f8-9z4f8 0/1 Running 0 10m 10.32.0.2 rancher-1 <none> <none>
kube-system coredns-66bff467f8-zkk99 0/1 Running 0 16m 10.32.0.2 rancher-0 <none> <none>
kube-system kube-apiserver-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-apiserver-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-apiserver-rancher-2 1/1 Running 0 7m23s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-controller-manager-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-controller-manager-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-controller-manager-rancher-2 1/1 Running 0 7m24s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-proxy-grts7 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-proxy-jv9lm 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-proxy-z2lrc 1/1 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-scheduler-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-scheduler-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-scheduler-rancher-2 1/1 Running 0 7m23s 10.0.2.15 rancher-2 <none> <none>
kube-system weave-net-nnvkd 2/2 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
kube-system weave-net-pgxnq 2/2 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system weave-net-q22bh 2/2 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
weave weave-scope-agent-9gwj2 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
weave weave-scope-agent-mznp7 1/1 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
weave weave-scope-agent-v7jql 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
weave weave-scope-app-bc7444d59-cjpd8 0/1 Pending 0 16m <none> <none> <none> <none>
weave weave-scope-cluster-agent-5c5dcc8cb-ln4hg 0/1 Pending 0 16m <none> <none> <none> <none>
vagrant@rancher-0:~$ kubectl describe node rancher-0
Name: rancher-0
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=rancher-0
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 28 Jul 2020 09:24:17 +0000
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: rancher-0
AcquireTime: <unset>
RenewTime: Tue, 28 Jul 2020 09:35:33 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 28 Jul 2020 09:24:47 +0000 Tue, 28 Jul 2020 09:24:47 +0000 WeaveIsUp Weave pod has set this
MemoryPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:52 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.2.15
Hostname: rancher-0
Capacity:
cpu: 2
ephemeral-storage: 10098432Ki
hugepages-2Mi: 0
memory: 2040812Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 9306714916
hugepages-2Mi: 0
memory: 1938412Ki
pods: 110
System Info:
Machine ID: 9b1bc8a8ef2c4e5b844624a36302d877
System UUID: A282600C-28F8-4D49-A9D3-6F05CA16865E
Boot ID: 77746bf5-7941-4e72-817e-24f149172158
Kernel Version: 4.15.0-99-generic
OS Image: Ubuntu 18.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.12
Kubelet Version: v1.18.0
Kube-Proxy Version: v1.18.0
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-66bff467f8-zkk99 100m (5%) 0 (0%) 70Mi (3%) 170Mi (8%) 11m
kube-system kube-apiserver-rancher-0 250m (12%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-controller-manager-rancher-0 200m (10%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-proxy-jv9lm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-scheduler-rancher-0 100m (5%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system weave-net-q22bh 20m (1%) 0 (0%) 0 (0%) 0 (0%) 11m
weave weave-scope-agent-9gwj2 100m (5%) 0 (0%) 100Mi (5%) 2000Mi (105%) 11m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 770m (38%) 0 (0%)
memory 170Mi (8%) 2170Mi (114%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 11m kubelet, rancher-0 Starting kubelet.
Warning ImageGCFailed 11m kubelet, rancher-0 failed to get imageFs info: unable to find data in memory cache
Normal NodeHasSufficientMemory 11m (x3 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m (x3 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m (x2 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet, rancher-0 Updated Node Allocatable limit across pods
Normal Starting 11m kubelet, rancher-0 Starting kubelet.
Normal NodeHasSufficientMemory 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet, rancher-0 Updated Node Allocatable limit across pods
Normal Starting 11m kube-proxy, rancher-0 Starting kube-proxy.
Normal NodeReady 10m kubelet, rancher-0 Node rancher-0 status is now: NodeReady
vagrant@rancher-0:~$ kubectl exec -n kube-system weave-net-nnvkd -c weave -- /home/weave/weave --local status
Version: 2.6.5 (failed to check latest version - see logs; next check at 2020/07/28 15:27:34)
Service: router
Protocol: weave 1..2
Name: 5a:40:7b:be:35:1d(rancher-2)
Encryption: disabled
PeerDiscovery: enabled
Targets: 0
Connections: 0
Peers: 1
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
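"Connections: 0" and "Peers: 1" show that this node's Weave router does not see the other two peers at all. The per-peer detail can be pulled the same way, for example:

kubectl exec -n kube-system weave-net-nnvkd -c weave -- /home/weave/weave --local status peers
kubectl exec -n kube-system weave-net-nnvkd -c weave -- /home/weave/weave --local status connections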
vagrant@rancher-0:~$ kubectl logs weave-net-nnvkd -c weave -n kube-system
INFO: 2020/07/28 09:34:15.989759 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 ipalloc-init:consensus=0 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:5a:40:7b:be:35:1d nickname:rancher-2 no-dns:true port:6783]
INFO: 2020/07/28 09:34:15.989792 weave 2.6.5
INFO: 2020/07/28 09:34:16.178429 Bridge type is bridged_fastdp
INFO: 2020/07/28 09:34:16.178451 Communication between peers is unencrypted.
INFO: 2020/07/28 09:34:16.182442 Our name is 5a:40:7b:be:35:1d(rancher-2)
INFO: 2020/07/28 09:34:16.182499 Launch detected - using supplied peer list: []
INFO: 2020/07/28 09:34:16.196598 Checking for pre-existing addresses on weave bridge
INFO: 2020/07/28 09:34:16.204735 [allocator 5a:40:7b:be:35:1d] No valid persisted data
INFO: 2020/07/28 09:34:16.206236 [allocator 5a:40:7b:be:35:1d] Initialising via deferred consensus
INFO: 2020/07/28 09:34:16.206291 Sniffing traffic on datapath (via ODP)
INFO: 2020/07/28 09:34:16.210065 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2020/07/28 09:34:16.210471 Listening for metrics requests on 0.0.0.0:6782
INFO: 2020/07/28 09:34:16.275523 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.15.0-99-generic&flag_kubernetes-cluster-size=0&flag_kubernetes-cluster-uid=aca5a8cc-27ca-4e8f-9964-4cf3971497c6&flag_kubernetes-version=v1.18.6&os=linux&signature=7uMaGpuc3%2F8ZtHqGoHyCnJ5VfOJUmnL%2FD6UZSqWYxKA%3D&version=2.6.5: dial tcp: lookup checkpoint-api.weave.works on 10.96.0.10:53: write udp 10.0.2.15:43742->10.96.0.10:53: write: operation not permitted
INFO: 2020/07/28 09:34:17.052454 [kube-peers] Added myself to peer list &{[{96:cd:5b:7f:65:73 rancher-1} {5a:40:7b:be:35:1d rancher-2}]}
DEBU: 2020/07/28 09:34:17.065599 [kube-peers] Nodes that have disappeared: map[96:cd:5b:7f:65:73:{96:cd:5b:7f:65:73 rancher-1}]
DEBU: 2020/07/28 09:34:17.065836 [kube-peers] Preparing to remove disappeared peer 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.079511 [kube-peers] Noting I plan to remove 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.095598 weave DELETE to http://127.0.0.1:6784/peer/96:cd:5b:7f:65:73 with map[]
INFO: 2020/07/28 09:34:17.097095 [kube-peers] rmpeer of 96:cd:5b:7f:65:73: 0 IPs taken over from 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.644909 [kube-peers] Nodes that have disappeared: map[]
INFO: 2020/07/28 09:34:17.658557 Assuming quorum size of 1
10.32.0.1
DEBU: 2020/07/28 09:34:17.761697 registering for updates for node delete events
vagrant@rancher-0:~$ kubectl logs coredns-66bff467f8-9z4f8 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0728 09:31:10.764496 1 trace.go:116] Trace[2019727887]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.763691008 +0000 UTC m=+0.308910646) (total time: 30.000692218s):
Trace[2019727887]: [30.000692218s] [30.000692218s] END
E0728 09:31:10.764526 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0728 09:31:10.764666 1 trace.go:116] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.761333538 +0000 UTC m=+0.306553222) (total time: 30.00331917s):
Trace[1427131847]: [30.00331917s] [30.00331917s] END
E0728 09:31:10.764673 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0728 09:31:10.767435 1 trace.go:116] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.762085835 +0000 UTC m=+0.307305485) (total time: 30.005326233s):
Trace[939984059]: [30.005326233s] [30.005326233s] END
E0728 09:31:10.767569 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
...
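In hindsight, the i/o timeouts above (together with the "write: operation not permitted" error in the Weave log) point at the host firewall rather than at CoreDNS itself. One quick way to confirm that, assuming ufw logging is on (the default), is to look for dropped packets involving the service/pod networks in the kernel log:

sudo journalctl -k --since "15 minutes ago" | grep "UFW BLOCK" | grep -E "10\.96\.|10\.32\." | tail -n 20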
vagrant@rancher-0:~$ kubectl describe pod coredns-66bff467f8-9z4f8 -n kube-system
Name: coredns-66bff467f8-9z4f8
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: rancher-1/10.0.2.15
Start Time: Tue, 28 Jul 2020 09:30:38 +0000
Labels: k8s-app=kube-dns
pod-template-hash=66bff467f8
Annotations: <none>
Status: Running
IP: 10.32.0.2
IPs:
IP: 10.32.0.2
Controlled By: ReplicaSet/coredns-66bff467f8
Containers:
coredns:
Container ID: docker://899cfd54a5281939dcb09eece96ff3024a3b4c444e982bda74b8334504a6a369
Image: k8s.gcr.io/coredns:1.6.7
Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:2c8d61c46f484d881db43b34d13ca47a269336e576c81cf007ca740fa9ec0800
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Tue, 28 Jul 2020 09:30:40 +0000
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-znl2p (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-znl2p:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-znl2p
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 28m default-scheduler Successfully assigned kube-system/coredns-66bff467f8-9z4f8 to rancher-1
Normal Pulled 28m kubelet, rancher-1 Container image "k8s.gcr.io/coredns:1.6.7" already present on machine
Normal Created 28m kubelet, rancher-1 Created container coredns
Normal Started 28m kubelet, rancher-1 Started container coredns
Warning Unhealthy 3m35s (x151 over 28m) kubelet, rancher-1 Readiness probe failed: HTTP probe failed with statuscode: 503
Edit 0:
The issue is solved. The problem was that my ufw rules allowed traffic from the CIDR of the VM network but not from Kubernetes itself (i.e. from the Docker containers / pod network). After configuring ufw to also allow the ports documented on the Kubernetes website and the ports documented on the Weave Net website, the cluster works as expected.
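For anyone running into the same thing, the kind of ufw rules this amounts to on each node looks roughly like this (a sketch based on the required-ports lists in the kubeadm and Weave Net documentation, not necessarily the exact rule set; adjust to your topology):

# Kubernetes control-plane ports (kubeadm "check required ports" list, v1.18)
sudo ufw allow 6443/tcp         # Kubernetes API server
sudo ufw allow 2379:2380/tcp    # etcd client/peer API (on the etcd nodes)
sudo ufw allow 10250/tcp        # kubelet API
sudo ufw allow 10251/tcp        # kube-scheduler
sudo ufw allow 10252/tcp        # kube-controller-manager
sudo ufw allow 30000:32767/tcp  # NodePort services (e.g. the Weave Scope UI)

# Weave Net ports (Weave Net docs)
sudo ufw allow 6783/tcp         # Weave Net control
sudo ufw allow 6783:6784/udp    # Weave Net data (fastdp/sleeve)

sudo ufw reload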