I have multiple pods on Kubernetes (v1.23.5) that are not being evicted and rescheduled in case of node failure.
According to Kubernetes documentation, this process must begin after 300s:
"Kubernetes automatically adds a Toleration for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds=300 unless you, or a controller, set those tolerations explicitly. These automatically-added tolerations mean that Pods remain bound to Nodes for 5 minutes after detecting one of these problems."
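To make the quoted behaviour concrete, here is a minimal sketch (Python, standard library only) of what I understand the automatically added tolerations to look like in a pod spec. The helper name `default_tolerations` is hypothetical and only for illustration; the field names (`key`, `operator`, `effect`, `tolerationSeconds`) are the standard toleration fields, and the `Exists` / `NoExecute` values are my understanding of the defaults.

```python
import json


def default_tolerations(seconds=300):
    """Build the two NoExecute tolerations described in the quoted docs.

    `seconds` is the tolerationSeconds value; 300 is the documented default.
    This helper is hypothetical and only mirrors the pod-spec structure.
    """
    keys = [
        "node.kubernetes.io/not-ready",
        "node.kubernetes.io/unreachable",
    ]
    return [
        {
            "key": key,
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": seconds,
        }
        for key in keys
    ]


if __name__ == "__main__":
    # Print the fragment as it would appear under a pod's `spec`.
    print(json.dumps({"tolerations": default_tolerations()}, indent=2))
```

Passing a smaller value, for example `default_tolerations(60)`, produces the fragment one would set explicitly in the pod spec to shorten the wait; it only changes how long the pod tolerates the taint, not what happens to pods that are already stuck in terminating status.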
Unfortunately, the pods get stuck in terminating status and are never evicted. However, in one test, a pod without any PVC attached was evicted and started running on another node.
- I'm trying to understand how I can make other pods evict after the default 300s time.
- I don't know why this does not happen automatically, and why I must manually drain the pods stuck in a terminating state to make it work properly.
Update
I have seen the kvaps/kube-fencing project. It appears to provide a fencing procedure that runs in case of a node failure, but I couldn't get it to solve my problem. I don't know whether that is because I don't fully understand the project, or because it is meant only to handle the failed node itself and not to evict the pods stuck in a terminating state.