49

When a Kubernetes pod goes into CrashLoopBackOff state, you will fix the underlying issue. How do you force it to be rescheduled?

user2732949

5 Answers

29

To apply a new configuration, a new pod has to be created (the old one will be removed).

  • If your pod was created automatically by a Deployment or DaemonSet resource, this will happen automatically each time you update the resource's YAML. It is not going to happen if your resource has spec.updateStrategy.type=OnDelete.

  • If the problem was caused by an error inside the Docker image that you have since fixed, you should update the pods manually; you can use the rolling-update feature for this purpose (see the sketch after this list). If the new image has the same tag, you can just remove the broken pod. (see below)

  • In case of node failure, the pod will be recreated on a new node after some time; the old pod will be removed after full recovery of the broken node. Worth noting: this is not going to happen if your pod was created by a DaemonSet or StatefulSet.
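
For illustration, a minimal sketch of the rolling-update path with current kubectl, assuming the pods are managed by a Deployment named my-app (the Deployment name, container name and image are placeholders, not taken from the question):

# Point the Deployment at the fixed image; this triggers a rolling replacement of the pods
kubectl set image deployment/my-app my-app=registry.example.com/my-app:v2

# Watch the crashing pods being replaced
kubectl rollout status deployment/my-app

# If the fixed image reuses the same tag, force fresh pods without editing the spec (kubectl 1.15+)
kubectl rollout restart deployment/my-app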

In any case, you can manually remove the crashed pod:

kubectl delete pod <pod_name>

Or all pods with CrashLoopBackOff state:

kubectl delete pod `kubectl get pods | awk '$3 == "CrashLoopBackOff" {print $1}'`

If you have a completely dead node, you can add the --grace-period=0 --force options to remove just the information about this pod from Kubernetes.
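
Spelled out (with the same <pod_name> placeholder as above), the force-delete form is:

kubectl delete pod <pod_name> --grace-period=0 --force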

P.M
kvaps
  • delete pod would indeed delete the current pod, but it will bring the system back to the desired state, meaning it will create another pod, and if the service in it is broken it will again show CrashLoopBackOff. Any tip on how to "undeploy" a failing pod completely? – Ewoks May 10 '19 at 13:09
  • I had to use: ```kubectl delete pod `kubectl get pods --all-namespaces | awk '$4 == "CrashLoopBackOff" {print $2}'` -n ``` – kipusoep Dec 13 '19 at 13:40
  • The link on rolling updates was changed a couple of years ago. The right link is currently this: https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/ – P.M Sep 28 '21 at 02:53
  • deleting pods is not an option for static pods created from manifests with kubelet startup (e.g. kube-controller-manager / kube-scheduler) or other pods that are not created by a controller.... – Gert van den Berg Mar 25 '22 at 10:17
10

Generally a fix requires you to change something about the configuration of the pod (the Docker image, an environment variable, a command line flag, etc.), in which case you should remove the old pod and start a new pod. If your pod is running under a replication controller (which it should be), then you can do a rolling update to the new version.
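
(For context, the legacy replication-controller command this answer refers to looked roughly like the sketch below; it has since been removed from kubectl, and on current clusters the same idea is expressed with a Deployment and kubectl rollout. The resource names and image tag are placeholders.)

# Legacy replication-controller style rolling update (removed from modern kubectl)
kubectl rolling-update my-app-rc --image=registry.example.com/my-app:v2

# Deployment-era equivalent
kubectl set image deployment/my-app my-app=registry.example.com/my-app:v2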

Robert Bailey
  • Interesting, we deploy "snapshots" where the version does not change. While the RC gets updated, the status is not cleared, but I'll try your idea. – user2732949 Feb 17 '16 at 16:58
  • Updating the RC is not enough, you also have to replace the existing pods, either by killing them or performing a rolling-update as suggested. – Antoine Cotten Feb 18 '16 at 21:02
  • how to find what's exactly failing? – holms Oct 02 '16 at 20:06
  • @holms - have you tried running `kubectl logs -f `? That will show you the standard output from the most recently exited run of your container. – Robert Bailey Oct 20 '16 at 06:36
  • The link above regarding rolling update is currently broken (404) so, here is the [new one](https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/). However, in my case, when I ran `kubectl logs -f ` as @Robert Bailey mentioned above, I got an error message as it couldn't load the application due to the expected file not being present to start the application. I updated this configuration to refer to the correct file and it works as I expect. – Sylvester Loreto Jan 04 '19 at 12:23
  • Thanks for the updated link, I've updated the answer with the current link. – Robert Bailey Jan 05 '19 at 19:29
6

Five years later, unfortunately, this scenario still seems to be the case.

@kvaps' answer above suggested an alternative (rolling updates) that essentially updates (overwrites) the pod instead of deleting it -- the current working link on rolling updates is https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/. The alternative to deleting the pod directly was NOT to create a bare pod, but to create a Deployment instead, and then delete the Deployment that contains the pod subject to deletion.

$ kubectl get deployments -A 
$ kubectl delete -n <NAMESPACE> deployment <DEPLOYMENT>

# When on minikube or using docker for development + testing
$ docker system prune -a

The first command lists all deployments, alongside their respective namespaces. This helped me avoid deleting the wrong deployment when two deployments share the same name (name collision) but live in different namespaces.

The second command deletes the deployment located in exactly that namespace.

The last command helps when working in development mode. It removes all unused images, which is not required but helps clean up and save some disk space.

Another great tip is to try to understand why a Pod is failing. The problem may lie somewhere else entirely, and Kubernetes records a good deal of diagnostic information. For that, one of the following may help:

$ kubectl logs -f <POD NAME>
$ kubectl get events 
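
In the same spirit (an addition, not part of the original answer), kubectl describe surfaces the container's last exit code and restart events, and --previous fetches the logs of the crashed run, which is often what actually explains the CrashLoopBackOff:

$ kubectl describe pod <POD NAME>
$ kubectl logs <POD NAME> --previous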

Another reference here on Stack Overflow: https://stackoverflow.com/a/55647634/132610

P.M
1

For anyone interested, I wrote a simple Helm chart and Python script which watches the current namespace and deletes any pod that enters CrashLoopBackOff.

The chart is at https://github.com/timothyclarke/helm-charts/tree/master/charts/dr-abc.

This is a sticking plaster; fixing the problem is always the best option. In my specific case, getting the historic apps into K8s so the development teams have a common place to work, and strangling the old applications with new ones, is preferable to fixing all the bugs in the old apps. Having this running in the namespace to keep up the illusion that everything works buys that time.
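
(Not the chart itself, just a rough shell sketch of the idea it implements; NAMESPACE and the 60-second interval are placeholder choices.)

# Periodically delete any pod in the namespace that is stuck in CrashLoopBackOff
while true; do
  for pod in $(kubectl get pods -n "$NAMESPACE" --no-headers | awk '$3 == "CrashLoopBackOff" {print $1}'); do
    kubectl delete pod -n "$NAMESPACE" "$pod"
  done
  sleep 60
done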

Timothy c
0

This command will delete all pods that are in any of the crash-related states (CrashLoopBackOff, Init:CrashLoopBackOff, etc.). You can use grep -i <keyword> to match different states and then delete the matching pods. In your case it should be:

kubectl get pod -n <namespace> --no-headers | grep -i crash | awk '{print $1}' | while read line; do kubectl delete pod -n <namespace> "$line"; done
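
(Equivalently, assuming at least one pod matches, the loop can be collapsed into a single delete call:)

kubectl delete pod -n <namespace> $(kubectl get pod -n <namespace> --no-headers | grep -i crash | awk '{print $1}')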