16

I have Kubernetes jobs that take a variable amount of time to complete, between 4 and 8 minutes. Is there any way I can know when a job has completed, rather than always waiting 8 minutes to cover the worst case? I have a test case that does the following:

1) Submits the kubernetes job.
2) Waits for its completion.
3) Checks whether the job has had the expected effect.

The problem is that my Java test, which submits the deployment job to Kubernetes, waits for 8 minutes even if the job takes less than that to complete, because I don't have a way to monitor the status of the job from the Java test.

trial999

7 Answers

14
$ kubectl wait --for=condition=complete --timeout=600s job/myjob
Vojtech Vitek - golang.cz
  • 3
    A job may fail and never complete... In that case your command will get stuck for several minutes (timeout=600s) instead of returning. – collimarco Nov 10 '21 at 16:26
7
<kube master>/apis/batch/v1/namespaces/default/jobs 

This endpoint lists the status of the jobs. I parse this JSON and retrieve the name of the latest running job whose name starts with "deploy...".

Then we can hit

<kube master>/apis/batch/v1/namespaces/default/jobs/<job name retrieved above>

and monitor the status field, which looks like this when the job succeeds:

"status": {
    "conditions": [
      {
        "type": "Complete",
        "status": "True",
        "lastProbeTime": "2016-09-22T13:59:03Z",
        "lastTransitionTime": "2016-09-22T13:59:03Z"
      }
    ],
    "startTime": "2016-09-22T13:56:42Z",
    "completionTime": "2016-09-22T13:59:03Z",
    "succeeded": 1
  }

So we keep polling this endpoint until the job completes. Hope this helps someone.
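The polling loop described above could be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the code from the actual test: `JobPoller`, `Outcome` and the `fetchConditions` supplier are hypothetical names, and the supplier stands in for whatever code GETs the job endpoint and extracts `status.conditions` as a list of string maps. The sketch also stops on a `Failed` condition and on a deadline, so a failing job does not keep the test waiting.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class JobPoller {
    /** What the poller concluded about the job. */
    public enum Outcome { SUCCEEDED, FAILED, RUNNING, TIMED_OUT }

    /** Classifies one snapshot of status.conditions (null or empty while the job is still running). */
    public static Outcome classify(List<Map<String, String>> conditions) {
        if (conditions == null) {
            return Outcome.RUNNING;
        }
        for (Map<String, String> condition : conditions) {
            boolean isTrue = "True".equals(condition.get("status"));
            if (isTrue && "Complete".equals(condition.get("type"))) {
                return Outcome.SUCCEEDED;
            }
            if (isTrue && "Failed".equals(condition.get("type"))) {
                return Outcome.FAILED;
            }
        }
        return Outcome.RUNNING;
    }

    /**
     * Calls fetchConditions (hypothetical: it wraps the HTTP GET and JSON parsing)
     * every interval until the job is terminal or the timeout has elapsed.
     */
    public static Outcome waitForJob(Supplier<List<Map<String, String>>> fetchConditions,
                                     Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Outcome outcome = classify(fetchConditions.get());
            if (outcome != Outcome.RUNNING) {
                return outcome;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for job", e);
            }
        }
    }
}
```

In a test this might be called as `JobPoller.waitForJob(fetcher, Duration.ofMinutes(8), Duration.ofSeconds(5))`, where `fetcher` is your own code, followed by an assertion that the result is `SUCCEEDED`.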

trial999
3

You can use the NewSharedInformer method to watch the jobs' statuses. I'm not sure how to write it in Java, but here is a Go example that keeps a local, periodically resynced list of your jobs:

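package jobwatch

import (
    "context"
    "sync"
    "time"

    batchv1 "k8s.io/api/batch/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

type ClientImpl struct {
    clients *kubernetes.Clientset

    // once guards informer start-up. It lives on the struct because a
    // sync.Once declared inside the method would be new on every call.
    once        sync.Once
    jobListFunc JobListFunc
}

type JobListFunc func() ([]batchv1.Job, error)

var (
    jobsSelector = labels.SelectorFromSet(labels.Set(map[string]string{"job_label": "my_label"})).String()
)

// NewJobSharedInformer starts a shared informer for the labelled jobs on the
// first call and returns a function that lists the jobs in its local cache.
func (c *ClientImpl) NewJobSharedInformer(resyncPeriod time.Duration) JobListFunc {
    c.once.Do(
        func() {
            restClient := c.clients.BatchV1().RESTClient()
            optionsModifier := func(options *metav1.ListOptions) {
                options.LabelSelector = jobsSelector
            }
            watchList := cache.NewFilteredListWatchFromClient(restClient, "jobs", metav1.NamespaceAll, optionsModifier)
            informer := cache.NewSharedInformer(watchList, &batchv1.Job{}, resyncPeriod)

            // Background's Done channel never fires, so the informer runs for the life of the process.
            go informer.Run(context.Background().Done())

            c.jobListFunc = JobListFunc(func() (jobs []batchv1.Job, err error) {
                for _, obj := range informer.GetStore().List() {
                    jobs = append(jobs, *(obj.(*batchv1.Job)))
                }
                return jobs, nil
            })
        })

    return c.jobListFunc
}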

Then in your monitor you can check the status by ranging over the job list returned by that function:

func syncJobStatus() {
    jobs, err := jobListFunc()
    if err != nil {
        log.Errorf("Failed to list jobs: %v", err)
        return
    }

    // TODO: other code

    for _, job := range jobs {
        name := job.Name
        // check status...
    }
}
akavel
Feiyu Zhou
2

I found that the JobStatus does not get updated when polling with job.getStatus(), even when the status visibly changes from the command prompt with kubectl.

To get around this, I re-fetch the job from the API server:

client.extensions().jobs()
                   .inNamespace(myJob.getMetadata().getNamespace())
                   .withName(myJob.getMetadata().getName())
                   .get();

My loop to check the job status looks like this:

KubernetesClient client = new DefaultKubernetesClient(config);
Job myJob = client.extensions().jobs()
                  .load(new FileInputStream("/path/x.yaml"))
                  .create();
boolean jobActive = true;
while (jobActive) {
    // Re-fetch the job on each pass; the local Job object is a snapshot and is not updated in place.
    myJob = client.extensions().jobs()
            .inNamespace(myJob.getMetadata().getNamespace())
            .withName(myJob.getMetadata().getName())
            .get();
    JobStatus myJobStatus = myJob.getStatus();
    System.out.println("==================");
    System.out.println(myJobStatus.toString());

    // "active" is null once no pods of the job are running any more.
    if (myJob.getStatus().getActive() == null) {
        jobActive = false;
    } else {
        System.out.println(myJob.getStatus().getActive());
        System.out.println("Sleeping for a minute before polling again!!");
        Thread.sleep(60000);
    }
}

System.out.println(myJob.getStatus().toString());

Hope this helps

metasim
rockeArm
1

You did not mention what is actually checking for job completion, but instead of waiting blindly and hoping for the best, you should keep polling the job status inside a loop until it becomes "Completed".

Antoine Cotten
  • Apologies, I should have mentioned in the question that I wanted to monitor this from a Java test. Will edit the question. – trial999 Sep 16 '16 at 09:27
  • 1
    I don't know what client library you're using but the logic for the test should be the same as what I explained: poll the job status, check the job status in the Json response, retry until this status equals "Completed". – Antoine Cotten Sep 17 '16 at 09:48
  • You are right. I have taken your suggestion and detailed my solution below. – trial999 Sep 23 '16 at 16:31
1

Since you said Java, you can use the Kubernetes Java bindings from fabric8 to start the job and add a watcher:

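KubernetesClient k = ...
k.extensions().jobs().load(yaml).watch(new Watcher<Job>() {

  @Override
  public void onClose(KubernetesClientException e) {}

  @Override
  public void eventReceived(Action a, Job j) {
    // The counters are boxed Integers and stay null until a pod has succeeded or failed,
    // so check for null before comparing instead of unboxing directly.
    JobStatus status = j.getStatus();
    if (status == null) return;
    if (status.getSucceeded() != null && status.getSucceeded() > 0)
      System.out.println("At least one job attempt succeeded");
    if (status.getFailed() != null && status.getFailed() > 0)
      System.out.println("At least one job attempt failed");
  }
});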
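The watcher above reports events asynchronously, so a test still needs some way to block until one arrives. One possible way to bridge the two, sketched with only the JDK: `JobResultLatch` and its methods are hypothetical names, not part of fabric8, and the sketch assumes a job without retries, where the first failed attempt means the job has failed.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class JobResultLatch {
    private final CompletableFuture<Boolean> result = new CompletableFuture<>();

    /** Call from eventReceived with the job's succeeded/failed counters (either may be null). */
    public void onStatus(Integer succeeded, Integer failed) {
        if (succeeded != null && succeeded > 0) {
            result.complete(true);
        } else if (failed != null && failed > 0) {
            // Assumption: no retries are configured, so one failed attempt is final.
            result.complete(false);
        }
    }

    /** Call from onClose so a dropped watch does not leave the test hanging. */
    public void onWatchClosed(Exception cause) {
        result.completeExceptionally(
                cause != null ? cause : new IllegalStateException("watch closed"));
    }

    /** Returns true/false once the job succeeded/failed, or empty if the timeout elapsed first. */
    public Optional<Boolean> await(long timeout, TimeUnit unit) {
        try {
            return Optional.of(result.get(timeout, unit));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for job", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("watch closed before the job finished", e.getCause());
        }
    }
}
```

Inside `eventReceived` you would call something like `latch.onStatus(status.getSucceeded(), status.getFailed())`, and the test thread would then call `latch.await(8, TimeUnit.MINUTES)` and assert on the result.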
metasim
Lev Kuznetsov
0

I don't know what kind of tasks you are talking about, but let's assume you are running some pods.

You can do

watch 'kubectl get pods | grep <name of the pod>'

or

kubectl get pods -w

It will not be the full name, of course, since most of the time pods get random names. If you are running an nginx replica or deployment, your pods will end up with names like nginx-1696122428-ftjvy, so you will want to do

watch 'kubectl get pods | grep nginx'

You can replace pods with whatever resource you are working with (rc, svc, deployments, ...).

Ahmad Aabed
  • 1
    even better you can use labels: `watch 'kubectl get pods -l job=foobar'` or `kubectl get -w pods -l job=foobar` – Tim Hockin Aug 31 '16 at 21:56
  • This is great if I am connected to the box; however, I am trying to get the status of the job from a Java test. Apologies, I should have mentioned that in the question. Have edited it now. – trial999 Sep 16 '16 at 09:27
  • I am thinking about somehow SSHing into the kube box and then watching the status of the job. But if anyone is aware of an endpoint that could be monitored to see the status of the job while it's not yet complete, that would be great to know. – trial999 Sep 16 '16 at 13:03
  • Sorry for coming late. I was going to tell you about the API just as @trial999 did. Also, you don't need to SSH to the kube box; just configure kubectl to connect to your master. – Ahmad Aabed Sep 27 '16 at 13:25
  • 1
    Kubernetes jobs are a special class of work, like a pod but with a different scope. I believe the author was asking about jobs (https://kubernetes.io/docs/tasks/job/) – cgseller Aug 14 '18 at 14:04