
I added the repo with this command:

helm repo add rook-stable https://charts.rook.io/stable

Then I ran the command:

helm install --namespace rook-ceph-system <NAME> <CHART VERSION>
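
(For context, the chart published in that repo is rook-ceph, so a filled-in command would look something like the following, assuming Helm 3 syntax and rook-ceph as an example release name:)

helm install --namespace rook-ceph-system rook-ceph rook-stable/rook-ceph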

The operator pod is created at first, but then goes into CrashLoopBackOff.

Below is the log:

kubectl logs rook-ceph-operator-5bdc9cfcb9-qml5n
2020-02-26 17:42:38.863455 I | rookcmd: starting Rook v0.9.3 with arguments '/usr/local/bin/rook ceph operator'
2020-02-26 17:42:38.863570 I | rookcmd: flag values: --alsologtostderr=false, --help=false, --log-level=INFO, --log_backtrace_at=:0, --log_dir=, --logtostderr=true, --mon-healthcheck-interval=45s, --mon-out-timeout=5m0s, --stderrthreshold=2, --v=0, --vmodule=
2020-02-26 17:42:39.056154 I | cephcmd: starting operator
failed to get pod. Get https://10.96.0.1:443/api/v1/namespaces/default/pods/rook-ceph-operator-5bdc9cfcb9-qml5n: dial tcp 10.96.0.1:443: i/o timeout

Any idea how to fix this?

paul-shuvo
CoderDude74
  • `10.96.0.1:443` is the default address for the API server, meaning your installation isn't able to talk to the control plane. There are a few things that could cause this — could you share more about your setup? For example: have you verified your CNI provider (e.g., flannel, weave, calico, etc.) is installed and working? Are your `kube-proxy` pods healthy? What happens if you `kubectl exec` into another pod and try to access the API server (see the probe sketch after these comments)? – Jesse Stuart Feb 27 '20 at 09:54
  • @JesseStuart Thanks for replying, I really appreciate it. I'm using Calico, and when I run kubectl get pods --all-namespaces, all the kube-system pods are running. In my .kube config the API server address is 192.168.50.10:6443. Can I change the API server from 10.96.0.1:443 to 192.168.50.10:6443? Also, it's worth mentioning that I'm not using a cloud provider; I'm running k8s on bare metal, so a cluster IP will not work. – CoderDude74 Feb 28 '20 at 16:21
  • Any update regarding your problem? I'm in exactly the same situation, and I don't know why it tries to call the API server on 10.96.0.1:443. – Peter Dev Mar 13 '20 at 11:09
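
A minimal sketch of such a probe, assuming a kubeadm-style cluster where kube-proxy pods carry the k8s-app=kube-proxy label; curlimages/curl is just an arbitrary throwaway image:

# check that kube-proxy is healthy on every node
kubectl -n kube-system get pods -l k8s-app=kube-proxy

# probe the API server's cluster IP from inside the pod network
kubectl run api-check --rm -it --restart=Never --image=curlimages/curl -- curl -k -m 5 https://10.96.0.1:443/version

Any HTTP response (even 401/403) means the pod network can reach the API server; a timeout like the one in the operator log means it cannot, and the problem is the pod network or kube-proxy rather than Rook itself.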

2 Answers


I had the same problem with almost the same setup: a Kubernetes cluster deployed on 3 VMs (via Vagrant), with Calico as the pod network.

Things I corrected: declare the 3 VM hostnames in each node's /etc/hosts:

192.168.100.51  kube1   kube1
192.168.100.52  kube2   kube2
192.168.100.53  kube3   kube3

Change the pod-network-cidr:

kubeadm init --apiserver-advertise-address=192.168.100.51 --apiserver-cert-extra-sans=192.168.100.51 --node-name kube1 --pod-network-cidr=10.10.0.0/16

Use the same pod CIDR in Calico:

- name: CALICO_IPV4POOL_CIDR
  value: "10.10.0.0/16"

Rook deployment:

git clone --single-branch --branch release-1.2 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster-test.yaml

Now the Ceph cluster is up and running.
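
To verify, list the pods in the rook-ceph namespace; the operator should be Running instead of CrashLoopBackOff, followed by the mon/mgr/osd pods:

kubectl -n rook-ceph get pods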

Peter Dev
  • I had the same issue. Following this, I fixed my Kubernetes cluster configuration and then it worked fine. I guess the pod-network-cidr is the root cause of this issue, but I don't know exactly why. The Calico CNI guide uses 192.168.0.0/16 as the pod-network-cidr, which looks like a collision with the node IP range, but I can't be sure; I hit a similar issue with the flannel CNI (10.240.0.0/16). – user3373742 Mar 22 '21 at 07:06

After hours of googling, this is how I resolved it. It's an issue with the default CIDR, 10.244.0.0/16, used during flannel initialization. I'm using Canal for CNI networking. I solved it by editing the canal-config ConfigMap from the dashboard; you can also use kubectl edit cm -n kube-system kube-flannel-cfg:

net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }

Use kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' to get the pod CIDR, change the Network value from 10.244.0.0/16 to your own pod network, and then delete the canal pods so the new config is picked up (see the sketch below). Credits: Jun Chen.
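
A minimal sketch of that last step, assuming the canal pods carry the usual k8s-app=canal label:

# restart the CNI pods so they re-read the edited network config
kubectl -n kube-system delete pod -l k8s-app=canal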

Mithlaj