My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log

Question

This is what I keep getting:

[root@centos-master ~]# kubectl get pods
NAME               READY     STATUS             RESTARTS   AGE
nfs-server-h6nw8   1/1       Running            0          1h
nfs-web-07rxz      0/1       CrashLoopBackOff   8          16m
nfs-web-fdr9h      0/1       CrashLoopBackOff   8          16m

Below is the output of kubectl describe pods:

Events:
  FirstSeen LastSeen    Count   From                SubobjectPath       Type        Reason      Message
  --------- --------    -----   ----                -------------       --------    ------      -------
  16m       16m     1   {default-scheduler }                    Normal      Scheduled   Successfully assigned nfs-web-fdr9h to centos-minion-2
  16m       16m     1   {kubelet centos-minion-2}   spec.containers{web}    Normal      Created     Created container with docker id 495fcbb06836
  16m       16m     1   {kubelet centos-minion-2}   spec.containers{web}    Normal      Started     Started container with docker id 495fcbb06836
  16m       16m     1   {kubelet centos-minion-2}   spec.containers{web}    Normal      Started     Started container with docker id d56f34ae4e8f
  16m       16m     1   {kubelet centos-minion-2}   spec.containers{web}    Normal      Created     Created container with docker id d56f34ae4e8f
  16m       16m     2   {kubelet centos-minion-2}               Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "web" with CrashLoopBackOff: "Back-off 10s restarting failed container=web pod=nfs-web-fdr9h_default(461c937d-d870-11e6-98de-005056040cc2)"

I have two pods, nfs-web-07rxz and nfs-web-fdr9h, but if I run kubectl logs nfs-web-07rxz, with or without the -p option, I don't see any logs for either pod.

[root@centos-master ~]# kubectl logs nfs-web-07rxz -p
[root@centos-master ~]# kubectl logs nfs-web-07rxz

This is my ReplicationController YAML file:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-web
spec:
  replicas: 2
  selector:
    role: web-frontend
  template:
    metadata:
      labels:
        role: web-frontend
    spec:
      containers:
      - name: web
        image: eso-cmbu-docker.artifactory.eng.vmware.com/demo-container:demo-version3.0
        ports:
          - name: web
            containerPort: 80
        securityContext:
          privileged: true

My Docker image was built from this simple Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get install -y nfs-common

I am running my kubernetes cluster on CentOs-1611, kube version:

[root@centos-master ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

If I run the image with docker run, it runs without any issue; it only crashes under Kubernetes.

Can someone help me out? How can I debug this without seeing any logs?

Can you try adding a [command](https://kubernetes.io/docs/tasks/configure-pod-container/define-command-argument-container/#defining-a-command-and-arguments-when-you-create-a-pod) to the pod yaml? – Sukumar, Jan 12 '17 at 19:29
Check the logs with `kubectl logs -f `; it could be a (server/container) startup issue. – Vishrant, May 03 '19 at 18:42
You could also run `kubectl get events` to see what is causing the crash loop. – Margach Chris, Oct 28 '19 at 13:26

score 156 · Accepted Answer · answered Jan 18 '17 at 02:50

As @Sukumar commented, your Dockerfile needs to define a command to run, or your ReplicationController needs to specify a command.

The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
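
One way to check whether this is what is happening is to look at how the last container instance terminated. The sketch below assumes a single-container pod and access to the cluster; `last_exit_code` and `explain_exit_code` are hypothetical helper names, and the hints are rules of thumb, not something kubectl prints.

```shell
#!/bin/sh
# Hypothetical helper: print the exit code of the previous (crashed) container.
# Assumes the pod has a single container, hence the [0] index.
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Hypothetical helper: map an exit code to a rough hint (rules of thumb only).
explain_exit_code() {
  case "$1" in
    "")  echo "no previous termination recorded yet" ;;
    0)   echo "exited cleanly: the main process finished, so the image needs a long-running command" ;;
    137) echo "killed with SIGKILL: check kubectl describe pod for OOMKilled" ;;
    *)   echo "exited with an error: check the application and kubectl logs --previous" ;;
  esac
}

# Usage (requires access to a cluster):
#   explain_exit_code "$(last_exit_code nfs-web-07rxz)"
```

An exit code of 0 together with empty logs would be consistent with a container whose command simply finishes, as described above.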

answered Jan 18 '17 at 02:50

Steve Sloka

If we have a proper Dockerfile and still get the error, what may be the reason? I am getting the same error even though I properly added the command. When I test the Docker image on its own, without a Kubernetes deployment, I get the expected output, so it is not a problem with the Dockerfile. Is it something related to the deployment? I described the whole issue I am facing here: https://stackoverflow.com/questions/56001352/accessing-the-deployed-service-using-helm-chart-in-kubernetes-cluster?noredirect=1#comment98718941_56001352 . Can you please look into that? – Mr.DevEng May 08 '19 at 13:14

There is a really good blog that goes in depth on what a CrashLoopBackoff means and the various cases where this can happen: https://managedkube.com/kubernetes/pod/failure/crashloopbackoff/k8sbot/troubleshooting/2019/02/12/pod-failure-crashloopbackoff.html – gar Feb 07 '20 at 16:12

score 119 · Answer 2 · edited Mar 01 '22 at 14:47

# Show details of a specific pod
kubectl describe pod <pod name> -n <namespace-name>

# View logs for a specific pod
kubectl logs <pod name> -n <namespace-name>

edited Mar 01 '22 at 14:47

dd619

answered Jun 04 '18 at 08:54

user128364

Although these commands might (or might not) solve the problem, a good answer should always contain an explanation of how the problem is solved. – BDL Jun 04 '18 at 13:57

The first command `kubectl -n describe pod ` is to describe your pod, which can be used to see any error in pod creation and running the pod like lack of resource, etc. And the second command `kubectl -n logs -p ` to see the logs of the application running in the pod. – iamabhishek May 20 '20 at 05:54

score 27 · Answer 3 · answered Nov 17 '18 at 23:47

If you have an application that is slow to bootstrap, the crash could be related to the initial values of the readiness/liveness probes. I solved my problem by increasing initialDelaySeconds to 120s, as my Spring Boot application does a lot of initialization. The documentation does not mention the default of 0 (https://kubernetes.io/docs/api-reference/v1.9/#probe-v1-core)

service:
  livenessProbe:
    httpGet:
      path: /health/local
      scheme: HTTP
      port: 8888
    initialDelaySeconds: 120
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 10
  readinessProbe:
    httpGet:
      path: /admin/health
      scheme: HTTP
      port: 8642
    initialDelaySeconds: 150
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 10

A very good explanation of those values is given in "What is the default value of initialDelaySeconds".

The health or readiness check algorithm works like this:

1. Wait for initialDelaySeconds.
2. Perform the check, waiting up to timeoutSeconds for a response. If the number of consecutive successes reaches successThreshold, return success.
3. If the number of consecutive failures reaches failureThreshold, return failure; otherwise wait periodSeconds and start a new check.

In my case, the application now clearly has enough time to bootstrap, so I no longer get the periodic CrashLoopBackOff that occurred when startup time sometimes sat right at the limit of those values.
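
To get a feel for how much time these settings actually give a container, here is a rough back-of-the-envelope sketch. The function name `seconds_until_restart` is made up for illustration, and the formula is an approximation: it ignores timeoutSeconds and any scheduling jitter in the kubelet, so treat the result as a lower bound for a liveness probe that never succeeds.

```shell
#!/bin/sh
# Rough lower bound, in seconds, on how long a container gets before a
# liveness probe that never succeeds triggers a restart.
# Arguments: initialDelaySeconds periodSeconds failureThreshold
# The first check runs after the initial delay; each further failure needed
# to reach the threshold costs one more period.
seconds_until_restart() {
  echo $(( $1 + ($3 - 1) * $2 ))
}

seconds_until_restart 120 5 10   # livenessProbe values from the YAML above: prints 165
seconds_until_restart 0 10 3     # no initial delay, 10s period, threshold 3: prints 20
```

If the application needs longer than this to answer its health endpoint, it gets restarted before it ever becomes healthy, which shows up as CrashLoopBackOff.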

you saved me a lot of hours! Thank you. My probe time was 90s and it wouldn't even let the pod start. — Abhinav Pandey, Aug 12 '20 at 14:47
Lol mine's was 1s so it crashed immediately. Switched to 300 and it's running fine now! — takanuva15, Dec 29 '20 at 22:18

score 22 · Answer 4 · answered Nov 11 '19 at 15:43

My pod kept crashing and I was unable to find the cause. Luckily, Kubernetes keeps a record of all the events that occurred before my pod crashed (see "List Events sorted by timestamp").

To see these events run the command:

kubectl get events --sort-by=.metadata.creationTimestamp

Make sure to add a --namespace mynamespace argument to the command if needed.

The events shown in the output of the command showed me why my pod kept crashing.
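
On a busy namespace the full event list can be noisy. As a sketch, assuming a reasonably recent kubectl that supports `--field-selector` on events, the output can be narrowed to a single pod; `pod_events` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical helper: list the events for one pod, oldest first.
# $1 = pod name, $2 = namespace (falls back to "default" if omitted).
pod_events() {
  kubectl get events \
    --namespace "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}

# Usage (requires access to a cluster):
#   pod_events nfs-web-fdr9h
#   pod_events my-pod mynamespace
```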

answered Nov 11 '19 at 15:43

matyas

Thanks! This tip helped me detect there was a problem mounting the volume with secret. – Leif John Jan 15 '20 at 14:23
Also helped me to discover assigned managed identity on the pod was incorrect. – Jorn.Beyers Jun 15 '20 at 18:27

score 20 · Answer 5 · answered Jun 13 '17 at 17:23

I needed to keep a pod running for subsequent kubectl exec calls and, as the comments above pointed out, my pod was being killed by my k8s cluster because it had completed all of its tasks. I managed to keep the pod running by simply starting it with a command that does not stop on its own, as in:

kubectl run YOUR_POD_NAME -n YOUR_NAMESPACE --image SOME_PUBLIC_IMAGE:latest --command tailf /dev/null

answered Jun 13 '17 at 17:23

hmacias

`tailf` did not work for me but this did (on Alpine linux): `--command /usr/bin/tail -- -f /dev/null` – Jakub Holý Jan 15 '19 at 13:35

it's not pod name. it's deployment name.```kubectl run -n --image --command tailf /dev/null``` – Gabriel Wu Mar 22 '19 at 00:53

score 13 · Answer 6 · edited Mar 17 '20 at 15:36

According to this page, the container runs everything correctly and then dies, and it is reported as crashing because all of its commands have ended. Either run your services in the foreground or create a keep-alive script. By doing so, Kubernetes will show that your application is running. Note that this problem is not encountered in a plain Docker environment; it is only Kubernetes that wants a running app.

Update (an example):

Here's how to avoid CrashLoopBackOff, when launching a Netshoot container:

kubectl run netshoot --image nicolaka/netshoot -- sleep infinity

score 7 · Answer 7 · answered May 30 '20 at 07:31

In your yaml file, add command and args lines:

...
containers:
      - name: api
        image: localhost:5000/image-name 
        command: [ "sleep" ]
        args: [ "infinity" ]
...

Works for me.

answered May 30 '20 at 07:31

Marcela Romero

what happens to command specified in the Dockerfile, does it still execute it? is this reliable or a hack? – muon Jul 16 '21 at 17:54

score 5 · Answer 8 · answered Aug 03 '20 at 07:34

I observed the same issue and fixed it by adding the command and args block to the YAML file. Here is a sample of my YAML file for reference:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: ubuntu
  name: ubuntu
  namespace: default
spec:
  containers:
  - image: gcr.io/ow/hellokubernetes/ubuntu
    imagePullPolicy: Never
    name: ubuntu
    resources:
      requests:
        cpu: 100m
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
  dnsPolicy: ClusterFirst
  enableServiceLinks: true

arguably this should be done in a script run inside the container, not on the master — mirekphd, Nov 24 '21 at 07:19

Coffee and Code · Answer 9 · 2022-07-31T13:07:23.027

As mentioned in the posts above, the container exits upon creation.

If you want to test this without using a YAML file, you can pass the sleep command to the kubectl create deployment statement. The double hyphen -- marks the start of the command, which is the equivalent of command: in a Pod or Deployment YAML file.

The below command creates a deployment for debian with sleep 1234, so it doesn't exit immediately.

kubectl create deployment deb --image=debian:buster-slim -- "sh" "-c" "while true; do sleep 1234; done"

You then can create a service etc, or, to test the container, you can kubectl exec -it <pod-name> -- sh (or -- bash) into the container you just created to test it.

score 3 · Answer 10 · answered Apr 30 '20 at 15:44

I solved this problem by increasing the memory resources:

resources:
  limits:
    cpu: 1
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 250Mi

answered Apr 30 '20 at 15:44

Yousra ADDALI

score 2 · Answer 11 · answered Jan 15 '19 at 12:40

In my case the problem was what Steve S. mentioned:

The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.

Namely I had a Java application whose main threw an exception (and something overrode the default uncaught exception handler so that nothing was logged). The solution was to put the body of main into try { ... } catch and print out the exception. Thus I could find out what was wrong and fix it.

(Another cause could be something in the app calling System.exit; you could use a custom SecurityManager with an overridden checkExit to prevent (or log the caller of) exit; see https://stackoverflow.com/a/5401319/204205.)

score 2 · Answer 12 · answered Jan 25 '23 at 20:18

kubectl logs -f POD only produces logs from a running container. Append --previous to the command to get logs from the previous container instance. It is used mainly for debugging. Hope this helps.
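
As a small sketch of that idea, the helper below tries the current container first and falls back to the previous instance when nothing comes back. `crash_logs` is a hypothetical name, and the fallback logic is just one possible convenience, not built-in kubectl behavior.

```shell
#!/bin/sh
# Hypothetical helper: print logs of the current container instance and, if
# that yields nothing, fall back to the previous (crashed) instance.
crash_logs() {
  # Errors from the first attempt are hidden because a container that is
  # waiting in back-off may have no current instance to read from.
  out=$(kubectl logs "$1" 2>/dev/null)
  if [ -z "$out" ]; then
    out=$(kubectl logs "$1" --previous)
  fi
  printf '%s\n' "$out"
}

# Usage (requires access to a cluster):
#   crash_logs nfs-web-07rxz
```

If both calls return nothing, as in the question, the process most likely exited without writing to stdout or stderr, so the events and the container's exit code are the next places to look.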

answered Jan 25 '23 at 20:18

Ajit Surendran

score 1 · Answer 13 · answered Jan 17 '19 at 15:29

Whilst troubleshooting the same issue I found no logs when using kubectl logs <pod_id>. Therefore I SSHed into the node instance to try to run the container using plain docker. To my surprise this also failed.

When entering the container with:

docker run -it faulty:latest /bin/sh

and poking around I found that it wasn't the latest version.

A faulty version of the docker image was already available on the instance.

When I removed the faulty:latest image with:

docker rmi faulty:latest

everything started to work.

score 1 · Answer 14 · edited Jun 10 '20 at 03:21

I had the same issue and have finally resolved it. I am not using a docker-compose file. I just added this line to my Dockerfile and it worked.

ENV CI=true

Reference: https://github.com/GoogleContainerTools/skaffold/issues/3882

edited Jun 10 '20 at 03:21

Ardent Coder

answered Jun 09 '20 at 22:30

Shailesh Baneshi

score 1 · Answer 15 · answered Jan 21 '21 at 17:08

In my case this error was specific to the hello-world docker image. I used the nginx image instead of the hello-world image and the error was resolved.

answered Jan 21 '21 at 17:08

Yash Verma

score 0 · Answer 16 · answered Jul 21 '20 at 06:23

Try rerunning the pod and running

 kubectl get pods --watch

to watch the status of the pod as it progresses.

In my case, I would only see the end result, 'CrashLoopBackOff,' but the docker container ran fine locally. So I watched the pods using the above command, and I saw the container briefly progress into an OOMKilled state, which meant to me that it required more memory.

score 0 · Answer 17 · answered Sep 08 '20 at 12:22

I solved this problem by removing the spaces between the quotes and the command values inside the array. It happened because the container exited right after starting, as there was no valid executable command to run inside it.

['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']

answered Sep 08 '20 at 12:22

arjun a

this is a time bomb – mirekphd Nov 24 '21 at 07:13

score 0 · Answer 18 · edited Oct 19 '20 at 09:01

I had a similar issue, which was solved when I corrected my zookeeper.yaml file: the Service name did not match the Deployment's container name. Making them the same resolved it.

apiVersion: v1
kind: Service
metadata:
  name: zk1
  namespace: nbd-mlbpoc-lab
  labels:
    app: zk-1
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zk-1
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: zk-deployment
  namespace: nbd-mlbpoc-lab
spec:
  template:
    metadata:
      labels:
        app: zk-1
    spec:
      containers:
      - name: zk1
        image: digitalwonderland/zookeeper
        ports:
        - containerPort: 2181
        env:
        - name: ZOOKEEPER_ID
          value: "1"
        - name: ZOOKEEPER_SERVER_1
          value: zk1

score 0 · Answer 19 · answered Feb 01 '21 at 15:32

In my case, the issue was an incorrectly constructed list of command-line arguments. I was doing this in my deployment file:

...
args:
  - "--foo 10"
  - "--bar 100"

Instead of the correct approach:

...
args:
  - "--foo"
  - "10"
  - "--bar"
  - "100"
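
The difference is easy to reproduce in a plain shell, since each item in the args list is handed to the program as exactly one argument and nothing splits it on spaces. A minimal sketch, with `count_args` as a made-up stand-in for the real program:

```shell
#!/bin/sh
# Stand-in for the containerized program: prints how many arguments it got.
count_args() {
  echo "$#"
}

count_args "--foo 10" "--bar 100"       # prints 2: "--foo 10" arrives as a single argument
count_args "--foo" "10" "--bar" "100"   # prints 4: flags and values arrive separately
```

Most argument parsers will reject or ignore a single argument like "--foo 10", so the process exits right away and the pod ends up in CrashLoopBackOff.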

answered Feb 01 '21 at 15:32

Paul Razvan Berg

score 0 · Answer 20 · edited May 03 '21 at 12:40

I finally found the cause when I executed the 'docker run xxx' command and got the error there. It was caused by incomplete-platform.

edited May 03 '21 at 12:40

sanjeevRm

answered May 01 '21 at 16:57

xiaomingZhang

score 0 · Answer 21 · answered Nov 16 '21 at 10:06

It seems there can be many reasons why a Pod ends up in the CrashLoopBackOff state.

In my case, one of the containers was terminating continuously due to a missing environment value.

So, the best way to debug is to:

1. Check the Pod description output, i.e. kubectl describe pod abcxxx
2. Check the events generated for the Pod, i.e. kubectl get events | grep abcxxx
3. Check whether endpoints have been created for the Pod, i.e. kubectl get ep
4. Check whether dependent resources are in place, e.g. CRDs, ConfigMaps, or any other resource that may be required.
