
We are doing a case study with Control-M to monitor a Kubernetes Job. On successful completion of the job, Control-M is able to recognize that the job completed. However, when the job fails, Control-M never recognizes the failure and keeps showing the job as still running. I suspect the Job never gets marked as completed in Kubernetes.

Here are the job status, pod status, and Kubernetes YAML file.

My question: is there a way for a Kubernetes Job to complete with a failure, or is this the default behavior of Kubernetes?

#  kubectl -n ns-dev get job
NAME                             COMPLETIONS   DURATION   AGE
job-pod-failure-policy-example   0/1           3m39s      3m39s
# kubectl -n ns-dev get pods
NAME                                   READY   STATUS      RESTARTS   AGE
job-pod-failure-policy-example-h86bp  0/1     Error       0          82s
YAML file:
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-example
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash"]        # example command simulating a bug which triggers the FailJob action
        args:
        - -c
        - echo "Hello world!" && sleep 5 && exit 1
  backoffLimit: 0
  podFailurePolicy:
    rules:
    - action: FailJob
      onExitCodes:
        containerName: main
        operator: In
        values: [1]
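
For reference, the Job's recorded conditions and events can be inspected with describe (using the job name and namespace from the output above):

# kubectl -n ns-dev describe job job-pod-failure-policy-example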

I have gone through the link below, which helped me set the backoff limit to zero and stopped the job from re-triggering multiple times.

Kubernetes job keeps spinning up pods which end up with the 'Error' status

1 Answer

My question: is there a way for a Kubernetes Job to complete with a failure, or is this the default behavior of Kubernetes?

You can mostly manage this from your code: if there is an error, shut down gracefully and pass the proper exit code.
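
For example, a minimal entrypoint sketch; my-task here is a hypothetical placeholder for the actual workload:

#!/usr/bin/env bash
set -euo pipefail   # stop on unexpected errors
if my-task; then    # my-task is a hypothetical command, replace with the real workload
  exit 0            # clean exit -> pod Completed, Job Complete
else
  exit 1            # non-zero exit -> pod Error, Job can be marked Failed
fi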

A Kubernetes Job has only two terminal statuses: Failed or Complete.
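
Which of the two conditions was recorded shows up in the Job's status, for example (using the job from the question):

kubectl -n ns-dev get job job-pod-failure-policy-example -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'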

But you can also update and mark the Job as complete by hitting the API server with a PATCH request:

curl <Api server>/apis/batch/v1/namespaces/<namespacename>/jobs/<job name>/status -XPATCH  -H "Accept: application/json" -H "Content-Type: application/strategic-merge-patch+json" -d '{"status": {"succeeded": 1}}'
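
The same status patch can be applied with kubectl as well, assuming a kubectl version (v1.24 or newer) that supports the --subresource flag:

kubectl -n <namespacename> patch job <job name> --subresource=status --type=merge -p '{"status": {"succeeded": 1}}'
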
  • It's a graceful exit with exit code 8 (which means some kind of application error). The pod is marked with Error status, which is correct. Ideally, if the pod exits with an error code, the Job should also be marked as Failed. Am I missing something in my understanding? Is there any reason why the Job is not marked as Failed? – Programmer007 Dec 07 '22 at 12:22
  • In the example YAML the exit code is 1, to simulate the application throwing exit 8. However, the pod ends up with Error, which means the pod is able to recognize the application throwing the error. But the Job is not updated, so am I missing something in the YAML, or does the Job not get completed until the pod is successful? – Programmer007 Dec 09 '22 at 05:05
  • You are right, the Job retries until the mentioned threshold of maximum tries is reached. The Job only becomes successful when the pod completes the task without any error code. If the application throws an error, the pod will be marked as Error and the status won't change to Complete or succeeded. – Harsh Manvar Dec 09 '22 at 05:21
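
Once the Job does carry a terminal condition, an external scheduler such as Control-M can watch for either outcome. A minimal polling sketch (assuming bash 4.3+ for wait -n, using the job name from the question):

kubectl -n ns-dev wait --for=condition=Complete job/job-pod-failure-policy-example --timeout=600s && echo "Job succeeded" &
kubectl -n ns-dev wait --for=condition=Failed job/job-pod-failure-policy-example --timeout=600s && echo "Job failed" &
wait -n                      # returns as soon as the first watcher finishes
kill $(jobs -p) 2>/dev/null  # stop the remaining watcher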