
How can I delete failed jobs in a Kubernetes cluster (GKE) using a CronJob? When I tried to delete the failed jobs using the following YAML, it deleted all the jobs (including running ones):


apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: XXX
  namespace: XXX
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: XXX
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: ["sh", "-c", "kubectl delete jobs $(kubectl get jobs | awk '$2 ~ 1/1' | awk '{print $1}')"]
          restartPolicy: OnFailure
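For reference, the deletion of running jobs follows from the awk pattern itself: `$2 ~ 1/1` evaluates `1/1` arithmetically to `1`, so it behaves like `$2 ~ /1/` and matches any COMPLETIONS value containing a 1, including `0/1`. A quick reproduction with simulated `kubectl get jobs` output (job names are made up):

```shell
# Simulated `kubectl get jobs` output; job names are hypothetical.
# The header is skipped only by luck: "COMPLETIONS" contains no "1".
printf 'NAME          COMPLETIONS   DURATION   AGE\njob-done      1/1           10s        5m\njob-running   0/1           10s        5m\n' |
  awk '$2 ~ 1/1' | awk '{print $1}'
# Prints both job-done AND job-running: the still-running job is selected too.
```

A safer filter would be an exact string comparison such as `awk '$2 == "1/1"'`.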
Kunfu Panda
  • How is this job getting triggered? If by cron, then you can set .spec.failedJobsHistoryLimit. And if these are normal Jobs, can't you just check the COMPLETIONS field? – yogesh kunjir Nov 06 '20 at 06:30
  • I agree with what was said by user yogesh kunjir. You should be able to set a limit for your failed jobs in the CronJob. You can also look at this Stack Overflow answer: https://stackoverflow.com/questions/53539576/kubectl-list-delete-all-completed-jobs . You will need to modify it to support "Failed" Jobs. – Dawid Kruk Nov 06 '20 at 10:39
  • @yogeshkunjir Actually, the above YAML is a CronJob which is trying to delete failed normal Jobs. And I have already tried the following, but it's not deleting the jobs: kubectl delete job $(kubectl get job -o=jsonpath='{.items[?(@.status.Failed==1)].metadata.name}') – Kunfu Panda Nov 07 '20 at 22:05

3 Answers


This one visually looks better for me:

kubectl delete job --field-selector=status.phase==Failed
Vasili Angapov

To delete failed Jobs in GKE you can use the following command:

  • $ kubectl delete job $(kubectl get job -o=jsonpath='{.items[?(@.status.failed==1)].metadata.name}')

This command outputs the JSON for all Jobs and searches for Jobs that have the status.failed field set to 1. It then passes the names of the failed Jobs to $ kubectl delete jobs.


This command, when run in a CronJob, will fail when there are no jobs with status: failed.

As a workaround you can use:

command: ["sh", "-c", "kubectl delete job --ignore-not-found=true $(kubectl get job -o=jsonpath='{.items[?(@.status.failed==1)].metadata.name}'); exit 0"]

exit 0 was added to make sure that the Pod exits with status code 0.
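Wiring that workaround back into a full CronJob might look like the sketch below. The name and serviceAccountName are placeholders, the service account needs RBAC permission to list and delete Jobs, and note that on current Kubernetes versions CronJob is served from batch/v1 (the question's batch/v1beta1 was removed in 1.25):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-failed-jobs            # placeholder name
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: job-cleaner   # placeholder; needs RBAC to get/list/delete Jobs
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: ["sh", "-c", "kubectl delete job --ignore-not-found=true $(kubectl get job -o=jsonpath='{.items[?(@.status.failed==1)].metadata.name}'); exit 0"]
          restartPolicy: OnFailure
```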


As for part of the comments made under the question:

You will need to modify it to support "Failed" Jobs

I have already tried the following, but it's not deleting the jobs: kubectl delete job $(kubectl get job -o=jsonpath='{.items[?(@.status.Failed==1)].metadata.name}')

  • @.status.Failed==1 <-- incorrect as JSON is case sensitive
  • @.status.failed==1 <-- correct
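The difference can be checked without a cluster, since it comes down to JSON key case sensitivity; grepping a sample status fragment (made up here, but shaped like a failed Job's status) shows that only the lowercase key exists:

```shell
# A made-up fragment of a failed Job's status, as kubectl would return it.
status='{"status":{"failed":1}}'
echo "$status" | grep -o '"failed"'                      # matches the real key
echo "$status" | grep -o '"Failed"' || echo 'no match'   # wrong case finds nothing
```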

If you were to run the incorrect version of this command against the following Pods (shown to confirm that they failed and aren't still running to completion):

NAME              READY   STATUS      RESTARTS   AGE
job-four-9w5h9    0/1     Error       0          5s
job-one-n9trm     0/1     Completed   0          6s
job-three-nhqb6   0/1     Error       0          5s
job-two-hkz8r     0/1     Error       0          6s

You should get the following error:

error: resource(s) were provided, but no name, label selector, or --all flag specified

The above error will also appear when no job names were passed to $ kubectl delete job.

Running the correct version of this command should delete all Jobs that failed:

job.batch "job-four" deleted
job.batch "job-three" deleted
job.batch "job-two" deleted


Dawid Kruk
  • If a job has failed twice and succeeded the third time, and you have set `failedJobsHistoryLimit` to 2, then in the command you will need to set `status.failed==2` instead of `status.failed==1` – Baskar Lingam Ramachandran Jun 13 '22 at 12:40

@Dawid Kruk's answer is excellent, but it works on a specific namespace only, not across all namespaces as I needed. To solve that, I've created a simple bash script that gets all failed jobs and deletes them:

# Delete failed jobs across all namespaces
failedJobs=$(kubectl get job -A -o=jsonpath='{range .items[?(@.status.failed>=1)]}{.metadata.name}{"\t"}{.metadata.namespace}{"\n"}{end}')
echo "$failedJobs" | while read -r jobName namespace
do
  [ -z "$jobName" ] && continue   # skip empty lines when nothing has failed
  echo "Debug: job $jobName is deleted in namespace $namespace"
  kubectl delete job "$jobName" -n "$namespace"
done
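The per-line splitting into job name and namespace can be checked without a cluster by feeding the loop simulated jsonpath output (job and namespace names below are made up):

```shell
# Simulated `kubectl get job -A -o=jsonpath=...` output: name<TAB>namespace per line.
printf 'job-a\tteam-1\njob-b\tteam-2\n' |
  while read -r jobName namespace
  do
    echo "would delete job $jobName in namespace $namespace"
  done
```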
Amit Baranes