I want to wait for multiple jobs, each of which can fail or succeed. I wrote a simple script based on an answer from Sebastian N. Its purpose is to wait for either the success or the failure of a job. The script works fine for one job (which, obviously, can only fail or succeed).

Now for the problem... I need to wait for multiple jobs identified by the same label. The script works fine when all jobs fail or all jobs succeed. But when some jobs fail and some succeed, kubectl wait times out: with a label selector, kubectl wait only returns once every matched resource reaches the given condition, so with mixed results neither wait ever finishes.

For what I intend to do next, it's not necessary to know which jobs failed or succeeded; I just want to know when they all end. Here is the "wait part" of the script I wrote (LABEL is the label selector that identifies the jobs I want to wait for):

# kubectl wait needs a resource type (here: job) in addition to the label selector
kubectl wait job --for=condition=complete -l LABEL --timeout=14400s && exit 0 &
completion_pid=$!

kubectl wait job --for=condition=failed -l LABEL --timeout=14400s && exit 1 &
failure_pid=$!

# wait -n returns when the first of the two background jobs finishes
# (passing explicit PIDs to wait -n requires bash 5.1+)
wait -n "$completion_pid" "$failure_pid"
exit_code=$?

if (( exit_code == 0 )); then
  echo "Job succeeded"
  pkill -P "$failure_pid"
else
  echo "Job failed"
  pkill -P "$completion_pid"
fi

If someone is curious why I kill the other kubectl wait command: it's because of the timeout I set. When the job succeeds, that wait process ends, but the other one keeps running until the timeout is reached. To stop it from running in the background, I simply kill it.
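The wait-for-whichever-finishes-first pattern above can be exercised without a cluster. Here is a minimal sketch in which two sleeping subshells stand in for the two kubectl wait watchers; the durations and the result variable are made up for illustration. Note that `wait -n` without arguments (wait for any background job) works on bash 4.3+, while passing explicit PIDs requires bash 5.1.

```shell
#!/usr/bin/env bash
# Stand-ins for the two kubectl wait watchers:
( sleep 0.2; exit 0 ) &   # emulates --for=condition=complete succeeding
complete_pid=$!
( sleep 5; exit 1 ) &     # emulates --for=condition=failed, still waiting
fail_pid=$!

# Block until the FIRST background job finishes; $? is that job's status.
wait -n
code=$?

if (( code == 0 )); then
  result="succeeded"
  kill "$fail_pid" 2>/dev/null      # stop the still-running watcher
else
  result="failed"
  kill "$complete_pid" 2>/dev/null
fi
echo "$result"
```

The real script kills the leftover watcher with `pkill -P`, because `$!` is the PID of the backgrounded subshell and the kubectl process is its child; in this sleep-based sketch, killing the subshell itself is enough.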

1 Answer

I found a workaround that fits my purpose. It turns out that kubectl logs with the --follow (-f) flag, redirected to /dev/null, effectively waits until all jobs are done.

Further explanation:

The --follow flag means the logs are streamed continuously, regardless of the jobs' completion state, so the command only returns once every matched pod has finished. Redirecting the output to /dev/null discards it, so no unwanted text is printed. I needed to print the logs via Python, so I added another kubectl logs call at the end (which I think is not ideal, but it serves the purpose). I use sleep because I assume there is some procedure after all jobs are completed; without it, the logs are not printed. Finally, I use the --tail=-1 flag because my logs are expected to have large output.

Here is my updated script (this part replaces everything from the script specified in question):

# wait until all jobs are completed, no matter whether they failed or succeeded
kubectl logs -f -l LABEL > /dev/null

# sleep for a moment, then print the final logs
sleep 2 && kubectl logs -l LABEL --tail=-1