
I have a CronJob in Kubernetes that runs every 3 minutes. It seems to create the jobs fine, as shown below; however, the generated pod deletes itself immediately and I cannot look at any details as to why it was deleted.

The CronJob skeleton is below:

apiVersion: batch/v1beta1
kind: CronJob
...
spec:
  schedule: "*/3 * * * *"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 120
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
        ...

This generates the CronJob as below:

NAME   SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test   */3 * * * *   False     0        1m              51m

The jobs generated by this:

NAME              DESIRED   SUCCESSFUL   AGE
test-1552177080   1         0            8m
test-1552177260   1         0            5m
test-1552177440   1         0            2m

Looking at the details of one of these jobs, I can see:

Name:           test-1552177440
Namespace:      storage
...
Events:
  Type     Reason                Age                    From            Message
  ----     ------                ----                   ----            -------
  Normal   SuccessfulCreate      2m57s                  job-controller  Created pod: test-1552177440-b5d6g
  Normal   SuccessfulDelete      2m40s                  job-controller  Deleted pod: test-1552177440-b5d6g
  Warning  BackoffLimitExceeded  2m40s (x2 over 2m40s)  job-controller  Job has reached the specified backoff limit

As you can see, the pod is deleted immediately with SuccessfulDelete. Is there any way to stop this from happening? Ultimately, I'd like to look at the logs or any details as to why the pod doesn't start.

nixgadget
  • There will be pods created for your job and it will be in `completed` state. You can see the logs of the completed pod. Is this what you are looking for? – Dinesh Balasubramanian Mar 10 '19 at 06:09
  • Possible duplicate of [How to list Kubernetes recently deleted pods?](https://stackoverflow.com/questions/40636021/how-to-list-kubernetes-recently-deleted-pods) – metaphori Mar 10 '19 at 13:44
  • @metaphori Not sure how that's related to this, as this has nothing to do with a deployment workload. It's a pod created by a job. – nixgadget Mar 10 '19 at 14:38
  • Thanks @dinesh, the job has failed as you can see. The actual problem is that there is no way to find the details of the failing pod in this case, as the pod is deleted automatically. – nixgadget Mar 10 '19 at 14:40
  • This is strange indeed. What version of Kubernetes are you running? Any details on your cluster configuration? Notice that pods seem to be cancelled when they reach the `backoffLimit`. Anyway, I would expect the *job* to be cancelled, if any (and pods in turn). – metaphori Mar 10 '19 at 17:28
  • It's `1.11.6`. The cluster is running on GKE. Nothing special about it. On a side note though, I did discover the container in the pod had a few problems. I discovered this by running it as a standalone deployment workload, since I could never figure out why the pod in the jobs was dying. – nixgadget Mar 10 '19 at 23:09
  • Have you tried to exclude `backoffLimit` from job template configuration and check the further behavior? – Nick_Kh Mar 12 '19 at 10:36
  • @mk_sta Have tried that too, same behaviour. – nixgadget Mar 12 '19 at 21:02

1 Answer


I have had the same problem.

ref: https://github.com/kubernetes/kubernetes/issues/78644#issuecomment-498165434

Once a job has failed (this occurs when it has exceeded its active deadline seconds or backoff limit), any active pods are deleted to prevent them from running/crashlooping forever. Any pods that aren't active, e.g. they are in a Pod phase of Failed or Succeeded, should be left around.
If you want your pods to be left around after failure, changing the restart policy of your pods to Never should prevent them from being immediately cleaned up; however, this does mean that a new pod will be created each time your pods fail, until the backoff limit is reached.

Can you try setting restartPolicy to Never?

apiVersion: batch/v1beta1
kind: CronJob
...
spec:
  schedule: "*/3 * * * *"
  ...
  jobTemplate:
    spec:
      ...
      template:
        spec:
          ...
          restartPolicy: Never # Point!
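
With restartPolicy: Never, the failed pods should be left around (one per retry, up to the backoff limit), so you can read their events and logs. As a minimal sketch, assuming the storage namespace and the job/pod names from your output, you could then inspect them like this (the job controller labels its pods with job-name):

# list the pods created by one of the failed jobs
kubectl get pods -n storage --selector=job-name=test-1552177440

# see why a retained pod didn't start, and read its logs
kubectl describe pod -n storage test-1552177440-b5d6g
kubectl logs -n storage test-1552177440-b5d6g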
sangjun