3

When, for whatever reason, I delete the pod running the Job that was started by a CronJob, I immediately see a new pod being created. Only after I have deleted something like six times the backoffLimit number of pods do new ones stop being created.

Of course, if I'm actively monitoring the process, I can delete the CronJob, but what if the Pod inside the Job fails when I'm not looking? I would like it not to be recreated.

How can I stop the CronJob from persistently creating new Jobs (or Pods?) and have it wait until the next scheduled time if the current Job/Pod failed? Is there something similar to a Job's backoffLimit, but for CronJobs?
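To illustrate, here is a minimal CronJob of the kind I mean (the name, image, and schedule are placeholders; the command just simulates a failing pod):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: example-cronjob          # placeholder name
spec:
  schedule: "*/5 * * * *"        # placeholder: every five minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: task
            image: busybox                   # placeholder image
            command: ["sh", "-c", "exit 1"]  # simulates a failing pod
          restartPolicy: OnFailure
```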

LLlAMnYP
  • I don't believe you can control the CronJob in that way, but you can just delete the Job that spawns from the CronJob, and it would stop spawning the pods (for that run) – Ho Man Jun 05 '19 at 17:52
  • @HoMan sounds disappointing. I've clarified my intentions in an edit. – LLlAMnYP Jun 05 '19 at 21:22
  • Hmmm, if you're only concerned about the job failing and not a deleted pod, backoffLimit should already achieve what you need? – Ho Man Jun 05 '19 at 21:29
  • @HoMan The `backoffLimit` seems to control the number of times a pod will be retried, but not a Job (I'm not exactly sure myself)? The experiment I describe in the first paragraph suggests that the `backoffLimit` isn't being respected. – LLlAMnYP Jun 05 '19 at 21:34

2 Answers

3

Set startingDeadlineSeconds to a large value or leave it unset (the default).

At the same time, set .spec.concurrencyPolicy to Forbid, so the CronJob skips the new job run while the previously created job is still running.

If startingDeadlineSeconds is set to a large value or left unset (the default) and concurrencyPolicy is set to Forbid, the CronJob will not start a new job while the previous (failed) run still counts as running.

You can add the concurrency policy field to the definition of your CronJob (.spec.concurrencyPolicy), but it is optional; see the sketch after the list below.

It specifies how to treat concurrent executions of a job that is created by this CronJob. The spec may specify only one of these three concurrency policies:

  • Allow (default) - The cron job allows concurrently running jobs
  • Forbid - The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
  • Replace - If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
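Applied to a CronJob like the one in the question, the relevant fields could look like this (a sketch only; the name, image, schedule, and deadline are illustrative values, not prescriptive ones):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: forbid-example          # placeholder name
spec:
  schedule: "*/5 * * * *"       # placeholder: every five minutes
  concurrencyPolicy: Forbid     # skip a run while the previous job still exists
  startingDeadlineSeconds: 300  # illustrative; may also be left unset (the default)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: task
            image: busybox      # placeholder image
            command: ["sh", "-c", "echo hello"]
          restartPolicy: OnFailure
```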

It is good to know that the concurrency policy applies only to the jobs created by the same CronJob. If there are multiple CronJobs, their respective jobs are always allowed to run concurrently.

A job run is counted as missed if it failed to be created at its scheduled time. For example, if concurrencyPolicy is set to Forbid and it was time for a new run while the previous run was still in progress, that run would count as missed.

For every CronJob, the CronJob controller checks how many schedules it missed in the duration from its last scheduled time until now. If there are more than 100 missed schedules, it does not start the job and logs the error.
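As a rough illustration (assuming concurrencyPolicy: Forbid and no startingDeadlineSeconds): with a schedule of `* * * * *` (every minute), a job that keeps running or retrying for more than 100 minutes would accumulate over 100 missed schedules, and the controller would stop starting new jobs and log the error.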

You can find more information here: CronJobs and AutomatedTask.

I hope it helps.

Malgorzata
  • It really seems unintuitive how `startingDeadlineSeconds` is meant to be used. Could you suggest a human-readable interpretation? – LLlAMnYP Jun 06 '19 at 14:07
  • You can read about the startingDeadlineSeconds field here: https://stackoverflow.com/questions/51065538/what-does-kubernetes-cronjobs-startingdeadlineseconds-exactly-mean – Malgorzata Jun 07 '19 at 06:13
  • I did upvote and it is helpful info, but I'm still facing the problematic behavior in our cluster, that I describe in the OP (admittedly, it could be a problem specifically on my side, not sure here). Generally, in such situations I'm reluctant to tick the checkmark and prefer to leave the question in an unanswered state in hope that someone else might offer a more complete solution. – LLlAMnYP Jun 18 '19 at 15:27
1

A CronJob creates a Job whose pods are retried according to backoffLimit, which defaults to 6 (your case), and a pod's restart policy defaults to Always (although a Job's pod template must set it to Never or OnFailure).

It is better to set backoffLimit to 0 and restartPolicy to Never, and to set startingDeadlineSeconds lower than or equal to your schedule interval (you can tune it to control the window in which each CronJob run may start). Additionally, you may set concurrencyPolicy to Forbid.
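A sketch of those settings combined (assuming a five-minute schedule; the name, image, and command are placeholders):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: no-retry-example        # placeholder name
spec:
  schedule: "*/5 * * * *"       # placeholder: every five minutes
  concurrencyPolicy: Forbid     # skip a run while the previous job still exists
  startingDeadlineSeconds: 300  # <= the schedule interval in this sketch
  jobTemplate:
    spec:
      backoffLimit: 0           # do not recreate the pod after a failure
      template:
        spec:
          containers:
          - name: task
            image: busybox      # placeholder image
            command: ["sh", "-c", "echo run once"]
          restartPolicy: Never  # do not restart containers in place
```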

Mahmoud