helm cert-manager failed post-install: timed out waiting for the condition

Question

I am still learning Kubernetes clustering and have run into this problem:

I have a Kubernetes cluster with 3 nodes on openSUSE 15.4 and am attempting to install Rancher, but helm is failing to install cert-manager:

helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.7.1 --set startupapicheck.timeout=10m --set webhook.timeoutSeconds=30 --debug

install.go:194: [debug] Original chart version: "v1.7.1"
install.go:211: [debug] CHART PATH: /home/<cut>/.cache/helm/repository/cert-manager-v1.7.1.tgz

client.go:133: [debug] creating 1 resource(s)
client.go:133: [debug] creating 39 resource(s)
client.go:477: [debug] Starting delete for "cert-manager-startupapicheck" ServiceAccount
client.go:481: [debug] Ignoring delete failure for "cert-manager-startupapicheck" /v1, Kind=ServiceAccount: serviceaccounts "cert-manager-startupapicheck" not found
client.go:133: [debug] creating 1 resource(s)
client.go:477: [debug] Starting delete for "cert-manager-startupapicheck:create-cert" Role
client.go:481: [debug] Ignoring delete failure for "cert-manager-startupapicheck:create-cert" rbac.authorization.k8s.io/v1, Kind=Role: roles.rbac.authorization.k8s.io "cert-manager-startupapicheck:create-cert" not found
client.go:133: [debug] creating 1 resource(s)
client.go:477: [debug] Starting delete for "cert-manager-startupapicheck:create-cert" RoleBinding
client.go:481: [debug] Ignoring delete failure for "cert-manager-startupapicheck:create-cert" rbac.authorization.k8s.io/v1, Kind=RoleBinding: rolebindings.rbac.authorization.k8s.io "cert-manager-startupapicheck:create-cert" not found
client.go:133: [debug] creating 1 resource(s)
client.go:477: [debug] Starting delete for "cert-manager-startupapicheck" Job
client.go:481: [debug] Ignoring delete failure for "cert-manager-startupapicheck" batch/v1, Kind=Job: jobs.batch "cert-manager-startupapicheck" not found
client.go:133: [debug] creating 1 resource(s)
client.go:703: [debug] Watching for changes to Job cert-manager-startupapicheck with timeout of 5m0s
client.go:731: [debug] Add/Modify event for cert-manager-startupapicheck: ADDED
client.go:770: [debug] cert-manager-startupapicheck: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:731: [debug] Add/Modify event for cert-manager-startupapicheck: MODIFIED
client.go:770: [debug] cert-manager-startupapicheck: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:84: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
        /home/abuild/rpmbuild/BUILD/helm-3.11.1/cmd/helm/install.go:141
github.com/spf13/cobra.(*Command).execute
        /home/abuild/rpmbuild/BUILD/helm-3.11.1/vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        /home/abuild/rpmbuild/BUILD/helm-3.11.1/vendor/github.com/spf13/cobra/command.go:1044
github.com/spf13/cobra.(*Command).Execute
        /home/abuild/rpmbuild/BUILD/helm-3.11.1/vendor/github.com/spf13/cobra/command.go:968
main.main
        /home/abuild/rpmbuild/BUILD/helm-3.11.1/cmd/helm/helm.go:83
runtime.main
        /usr/lib64/go/1.18/src/runtime/proc.go:250
runtime.goexit
        /usr/lib64/go/1.18/src/runtime/asm_amd64.s:1571

Any help would be greatly appreciated.

I tried reinstalling the whole cluster, but no joy.

Please add the output of the following to the question: 1. ```kubectl -n cert-manager get pods```; 2. ```kubectl -n cert-manager get events```. It would also help to find the Pod created by the Job *cert-manager-startupapicheck* and inspect it with a describe and a look at its logs. – glv, Mar 29 '23 at 12:16
Thank you @glv , here you have it: `kubectl -n cert-manager get pods NAME READY STATUS RESTARTS AGE cert-manager-6d6bb4f487-46tcj 0/1 ContainerCreating 0 153m cert-manager-cainjector-7d55bf8f78-smkd8 0/1 ContainerCreating 0 153m cert-manager-startupapicheck-wbz7b 0/1 ContainerCreating 0 153m cert-manager-webhook-5c888754d5-l65bj 0/1 ContainerCreating 0 153m` — MB001, Mar 29 '23 at 14:00
`kubectl -n cert-manager get events Warning FailedCreatePodSandBox pod/cert-manager-6d6bb4f487-46tcj (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cert-manager-6d6bb4f487-46tcj_cert-manager_7ead309e-99f8-43e8-be20-b9baa4c8337e_0(0b6e71726ca63781d3b04b452fc0a2fef8dde9060f313916da40c35b41d5e581): error adding pod cert-manager_cert-manager-6d6bb4f487-46tcj to CNI network "cbr0": loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory` — MB001, Mar 29 '23 at 14:03
`kubectl logs cert-manager-startupapicheck-wbz7b -n cert-manager Error from server (BadRequest): container "cert-manager" in pod "cert-manager-startupapicheck-wbz7b" is waiting to start: ContainerCreating` — MB001, Mar 29 '23 at 14:27

score 0 · Accepted Answer · answered Mar 29 '23 at 14:35

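Based on the events posted in the comments, it appears that something went wrong while configuring the network plugin: the Flannel CNI cannot open /run/flannel/subnet.env, so no pod sandbox can be created and every pod in the cert-manager namespace stays in ContainerCreating. The cert-manager-startupapicheck Job therefore never completes, and the helm post-install hook times out.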
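To confirm that diagnosis on another cluster, the events output can be checked for the same Flannel message. This is only a minimal sketch: `has_flannel_env_error` is a hypothetical helper name, and the string it looks for is the one quoted in the events above.

```shell
#!/bin/sh
# has_flannel_env_error (hypothetical helper): reads `kubectl get events`
# output on stdin and succeeds if it contains the Flannel CNI failure
# quoted in the comments of this question.
has_flannel_env_error() {
  grep -q 'loadFlannelSubnetEnv failed'
}

# Usage against a live cluster (not run here):
#   kubectl -n cert-manager get events | has_flannel_env_error \
#     && echo "Flannel never wrote /run/flannel/subnet.env on that node"
#
# The Job controller labels its pods with job-name=<job>, so the pod behind
# the helm hook can be inspected with:
#   kubectl -n cert-manager get pods -l job-name=cert-manager-startupapicheck
#   kubectl -n cert-manager describe pod -l job-name=cert-manager-startupapicheck
```

If the helper succeeds, the timeout is a networking problem on the nodes rather than a cert-manager or helm problem.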

Is it possible that you installed Flannel before finishing the cluster setup, perhaps before running kubeadm init?

I suggest doing a clean installation of the CNI and your applications.
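Before reinstalling, it may be worth checking whether every node was assigned a pod CIDR, since Flannel usually cannot write /run/flannel/subnet.env on a node that has none. The sketch below assumes the node list is fed in as "name<TAB>podCIDR" lines; `nodes_missing_pod_cidr` is a hypothetical helper name.

```shell
#!/bin/sh
# nodes_missing_pod_cidr (hypothetical helper): reads "name<TAB>podCIDR"
# lines on stdin, prints the names of nodes whose podCIDR is empty, and
# returns non-zero if at least one such node was found.
nodes_missing_pod_cidr() {
  awk -F'\t' '
    $1 != "" && $2 == "" { print $1; missing = 1 }
    END { exit missing }
  '
}

# Usage against a live cluster (not run here):
#   kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}' \
#     | nodes_missing_pod_cidr \
#     || echo "some nodes have no pod CIDR; check the kubeadm init flags"
```

Any node name printed here points at a cluster that was initialised without a pod network range, which a clean reinstall would need to correct.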

Here is a problem very similar to yours: kubernetes installation and kube-dns: open /run/flannel/subnet.env: no such file or directory

answered Mar 29 '23 at 14:35

glv


Hey, can I use Calico or Cilium instead, or does it have to be Flannel with cert-manager? Which would be best for Rancher? – MB001 Mar 29 '23 at 16:30
There are a lot of docs on this topic. For example: https://docs.tigera.io/calico/latest/getting-started/kubernetes/rancher – glv Mar 29 '23 at 16:32
Hmm, same effect with Calico: helm hangs for 5 minutes and then exits. – MB001 Mar 29 '23 at 17:32
But have you tried re-installing the cluster from scratch? Maybe you missed some steps. – glv Mar 29 '23 at 20:14

score 0 · Answer 2 · answered Mar 30 '23 at 09:41

Thanks again! You were right: I had to reset the whole cluster and the CNI, re-create it with the proper --pod-network-cidr=10.244.0.0/16, and then deploy Flannel and cert-manager. Everything installed without errors!
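For anyone repeating this, the order of operations can be sketched roughly as follows. The script only prints the commands for review, because `kubeadm reset` is destructive. The Flannel manifest URL is an assumption taken from the Flannel project's release page, `print_rebuild_steps` is a hypothetical name, and the helm flags are copied from the question.

```shell
#!/bin/sh
# Pod network range from this answer; it matches Flannel's default network.
POD_CIDR="10.244.0.0/16"
# Assumption: manifest location published by the flannel-io project.
FLANNEL_MANIFEST="https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml"

# print_rebuild_steps (hypothetical helper): prints the rebuild sequence
# instead of executing it, so it can be reviewed and run step by step.
print_rebuild_steps() {
  cat <<EOF
# 1. On every node: wipe the previous kubeadm state (destructive).
sudo kubeadm reset
# kubeadm reset does not clean up CNI configuration; remove leftovers
# from an earlier Flannel or Calico attempt by hand.
sudo rm -rf /etc/cni/net.d
# 2. On the control-plane node: initialise with the pod CIDR Flannel expects.
sudo kubeadm init --pod-network-cidr=${POD_CIDR}
# 3. Re-join the workers using the join command printed by kubeadm init.
# 4. Deploy Flannel and wait until all nodes report Ready.
kubectl apply -f ${FLANNEL_MANIFEST}
kubectl get nodes
# 5. Install cert-manager as in the question.
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.7.1 --set startupapicheck.timeout=10m --set webhook.timeoutSeconds=30
EOF
}

print_rebuild_steps
```

Printing rather than running keeps the sketch safe to try on any machine; the commands themselves still need to be run on the right nodes in the order shown.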

answered Mar 30 '23 at 09:41

MB001

