17

I'd like to launch a Kubernetes job and give it a fixed deadline to finish. If the pod is still running when the deadline comes, I'd like the job to automatically be killed.

Does something like this exist? (At first I thought that the Job spec's activeDeadlineSeconds covered this use case, but now I see that activeDeadlineSeconds only places a limit on when a job is re-tried; it doesn't actively kill a slow/runaway job.)

Andrew
Bosh
  • How about leveraging a liveness probe? You could create a probe that returns success for the time you need and, once the deadline is reached, returns failure (exit code 1) and kills the container (see the sketch after these comments). More info about liveness probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/ – Ottovsky Jul 06 '17 at 20:15
  • I think this is actually a very good feature request. Is it tracked somewhere in the Kubernetes GitHub? – Alex Jul 21 '20 at 09:52
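
A minimal sketch of the liveness-probe idea from the first comment, assuming the entrypoint records its own start time in a file; the path /tmp/started, the 600-second deadline, and backoffLimit: 0 are arbitrary choices for illustration, not anything the comment prescribes:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-probe-deadline
spec:
  backoffLimit: 0                # assumption: don't retry once the probe kills the pod
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl
        # Record the start time, then run the real workload.
        command: ["/bin/sh", "-c", "date +%s > /tmp/started; exec perl -Mbignum=bpi -wle 'print bpi(4000)'"]
        livenessProbe:
          exec:
            # Exits non-zero once 600 seconds have elapsed since the recorded
            # start time, which makes the kubelet kill the container.
            command: ["/bin/sh", "-c", "test $(( $(date +%s) - $(cat /tmp/started) )) -lt 600"]
          initialDelaySeconds: 10
          periodSeconds: 10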

3 Answers

15

You can self-impose a timeout on the container's entrypoint command by using the GNU timeout utility.

For example, the following Job, which computes the first 4000 digits of pi, will time out after 10 seconds:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["/usr/bin/timeout", "10", "perl", "-Mbignum=bpi", "-wle", "print bpi(4000)"]
      restartPolicy: Never

(Manifest adapted from https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#running-an-example-job)

You can play with the numbers and see whether it times out. Computing 4000 digits of pi takes ~23 seconds on my workstation, so if you set the timeout to 5 seconds it will probably always fail, and if you set it to 120 seconds it should always succeed.

ahmet alp balkan
  • Thanks! I like this a lot, though it forces the pod template to know about the default command of the image, rather than just *running* the image. That's a bit unfortunate, but it's definitely a workable solution. – Bosh Jul 12 '17 at 23:35
  • 1
    Just FYI, you can always create variables for the arguments, e.g. `$TIMEOUT`, and have their values come from a ConfigMap mount so you don't have to hardcode them. This way you can modify the value in the ConfigMap and new jobs will use it. – ahmet alp balkan Jul 13 '17 at 05:36
  • That's a good point -- though the spec still needs to know the default command for the image. – Bosh Jul 16 '17 at 15:11
  • Using the `timeout` CLI is a pretty good way to handle it; I totally over-engineered it (https://blog.random.io/k8s-cronjob-with-execution-timeout/) – anapsix Apr 22 '21 at 18:18
  • 1
    I usually end the `command:` portion of the yaml with `bash -c` and then put the command you care about in the `args:` section. That way the `command` never changes, and it's easier to write a natural command line in `args` because it doesn't require any awkward quoting (see the sketch after these comments). – David Parks May 08 '21 at 04:31
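
Putting the two comment suggestions together, here is a hedged sketch of the same Job with a fixed command: and the timeout fed in through an environment variable sourced from a ConfigMap rather than a mounted file; the ConfigMap name job-config and key timeout are hypothetical:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        # The command never changes; the actual work lives in args, so no
        # awkward quoting is needed.
        command: ["/bin/bash", "-c"]
        args: ["timeout $TIMEOUT perl -Mbignum=bpi -wle 'print bpi(4000)'"]
        env:
        - name: TIMEOUT
          valueFrom:
            configMapKeyRef:
              name: job-config   # hypothetical ConfigMap
              key: timeout       # e.g. "10"
      restartPolicy: Never
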
13

The way I understand the documentation, activeDeadlineSeconds refers to the active time of the Job as a whole; once that time is reached, the Job is considered Failed.

Official doc statement:

The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded

https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
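
So if a whole-Job deadline is what you want, a minimal sketch along the lines of the official example (the 100-second deadline and backoffLimit are arbitrary values for illustration) would be:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  activeDeadlineSeconds: 100   # the Job, including any retries, is terminated after 100s
  backoffLimit: 5
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(4000)"]
      restartPolicy: Never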

tmetodie
2

You could instead add activeDeadlineSeconds to the pod spec in the pod template defined as part of the Job. That way the pods spawned by the Job are limited by the timeout.
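
A minimal sketch of that approach, with activeDeadlineSeconds set in the pod template's spec rather than on the Job itself (the 60-second value is arbitrary):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      activeDeadlineSeconds: 60   # applies to each pod the Job spawns
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(4000)"]
      restartPolicy: Never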

th0masb
  • timeout is a Linux utility, whereas activeDeadlineSeconds is the correct practice in this case. – Arvin May 04 '22 at 06:56
  • in my case adding `activeDeadlineSeconds` caused the pods to be removed immediately, so there was no way to inspect the logs of the failed container :( – SleepWalker Jun 28 '22 at 14:08
  • OK, my previous comment was wrong: I had added `activeDeadlineSeconds` to the Job definition instead of the Pod. To keep the Pod around, you need to set `activeDeadlineSeconds` on the Pod, but then you are only restricting the time of a single Pod run. By default a Job has `backoffLimit: 6`, which means the Job as a whole can take up to `7 * activeDeadlineSeconds`. – SleepWalker Jun 29 '22 at 05:46