
How do I wait in a bash script for several subprocesses spawned from that script to finish, and then return an exit code !=0 when any of the subprocesses ends with code !=0?

Simple script:

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see help wait). How can I modify this script so it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any of them ends with code !=0?

Is there any better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing their exit statuses?

asked by tkokoszka; edited by Gabriel Staples
  • This could be significantly improved to touch on `wait -n`, available in modern bash to return only when the first/next command completes. – Charles Duffy Dec 15 '17 at 00:29
  • if you are looking to test using Bash, try this: https://github.com/sstephenson/bats – Alexander Mills Dec 15 '17 at 00:56
  • Active development of BATS has moved to https://github.com/bats-core/bats-core – Potherca Jan 20 '18 at 19:22
  • @CharlesDuffy `wait -n` has one small problem: if there are no child jobs remaining (aka race condition), it returns a non-zero exit status (fail) which can be indistinguishable from a failed child process. – drevicko Jun 27 '18 at 09:23
  • @drevicko: wait -n solution here: https://stackoverflow.com/a/59723887/627042 – Erik Aronesty Jan 13 '20 at 21:21
  • I saw this in a script, maybe it's the right thing. Very concise. `wait < <(jobs -p)` – fbas Apr 02 '21 at 19:59
  • See also: [Unix & Linux: Launch a background process and check when it ends](https://unix.stackexchange.com/q/76717/114401) – Gabriel Staples Jan 11 '22 at 16:20
  • See also: [Get exit code of a background process](https://stackoverflow.com/q/1570262/4561887) – Gabriel Staples Feb 16 '22 at 05:49
  • For anyone thinking `wait -n` is a good idea: besides @drevicko's comment, `wait -n` DOES NOT return the return status of the process it has waited for. Bash's `wait` has a `-p` option for that, but using `wait $PID` is the most portable solution! – webmaster777 Jul 13 '23 at 11:15
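The `wait -n` approach discussed in the comments above can be sketched as follows. This is a hedged sketch, not from the question: `run_all_fail_fast` is a hypothetical helper name, `wait -n` needs bash >= 4.3, and, per drevicko's caveat, `wait -n` returns 127 when no children remain, so the sketch counts pending jobs instead of looping until `wait` fails.

```shell
#!/usr/bin/env bash
# Launch all commands, then reap them in completion order with `wait -n`,
# returning the first non-zero status as soon as it happens (fail fast).
run_all_fail_fast() {
    local rc pending=0
    for cmd in "$@"; do
        $cmd &                      # launch each command in the background
        pending=$((pending + 1))
    done
    while [ "$pending" -gt 0 ]; do
        wait -n                     # blocks until the *next* job exits
        rc=$?
        pending=$((pending - 1))
        if [ "$rc" -ne 0 ]; then
            return "$rc"            # report the first failure immediately
        fi
    done
    return 0
}

run_all_fail_fast "sleep 0.1" "sleep 0.1" && echo "all jobs succeeded"
```

Note that plain `wait -n` cannot tell *which* job failed; bash >= 5.1 adds `wait -n -p varname` to report the PID, as the last comment above mentions.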

35 Answers


wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background. Modify the loop to store the PID of each spawned sub-process into an array, and then loop again waiting on each PID.

# run processes and store pids in an array
for i in $n_procs; do        # n_procs: a list of indices, e.g. "0 1 2"
    "${procs[$i]}" &         # procs: an array holding the command for each index
    pids[$i]=$!              # $! is the PID of the last background command
done

# wait for each pid; wait returns that child's exit status
for pid in "${pids[@]}"; do
    wait "$pid"
done
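A hedged addendum, not part of the original answer: to propagate a failure as the question asks, the second wait loop can record any non-zero status. The subshell workers below are stand-ins for doCalculations.

```shell
# launch three stand-in workers; worker i exits with status i
pids=()
for i in 0 1 2; do
    ( sleep 0.1; exit "$i" ) &
    pids+=("$!")
done

# wait on each PID; remember whether any child failed
result=0
for pid in "${pids[@]}"; do
    wait "$pid" || result=1
done
echo "result=$result"   # prints: result=1 (workers 1 and 2 failed)
```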
Luca Tettamanti
  • how can you loop on 'wait', when that makes the script block until that specific process has died? – Alnitak Dec 10 '08 at 14:17
  • Well, since you are going to wait for all the processes it doesn't matter if e.g. you are waiting on the first one while the second has already finished (the 2nd will be picked at the next iteration anyway). It's the same approach that you'd use in C with wait(2). – Luca Tettamanti Dec 10 '08 at 14:41
  • Ah, I see - different interpretation :) I read the question as meaning "return exit code 1 _immediately_ when any of subprocesses exit". – Alnitak Dec 10 '08 at 14:51
  • one thing, though - doesn't this risk a race condition if you're specifying PIDs, that PID dies, and then another process is spawned with the same PID? – Alnitak Dec 10 '08 at 14:52
  • Hum, I interpreted the code in the question as a barrier. As you said, apparently there's no way to wait for "any" child... – Luca Tettamanti Dec 10 '08 at 15:02
  • About the race: with wait(2) the PID won't be reused until it has been waited upon (it's a zombie); with bash scripts the doc is not very clear, but it seems (I tried...) that the shell waits for the PID and stores the return value for later use - so the PID may be reused :| – Luca Tettamanti Dec 10 '08 at 15:13
  • PID may be reused indeed, but you cannot wait for a process that is not a child of the current process (wait fails in that case). – tkokoszka Dec 10 '08 at 15:27
  • You can also use %n to refer to the n:th backgrounded job, and %% to refer to the most recent one. – conny Aug 12 '10 at 11:13
  • FYI, I found out an elegant way to do what the answer says: `for i in $n_procs; do ./procs[${i}] & ; pids[${i}]=$!; done; wait ${pids[*]};` – synack Jan 19 '14 at 23:53
  • @Kits89 This does not work for me. According to `wait` man pages, wait with multiple PID's only returns the return value of the last process waited for. So you do need an extra loop and wait for each PID separately, as suggested in the answer. – Nils_M May 27 '14 at 14:38
  • @Nils_M: You're right, I'm sorry. So it would be something like: `for i in $n_procs; do ./procs[${i}] & ; pids[${i}]=$!; done; for pid in ${pids[*]}; do wait $pid; done;`, right? – synack May 27 '14 at 15:15
  • moving @synack comment to the answer – knarf Apr 23 '18 at 09:50
  • That is still buggy. If the PID is reused, the exit code of wait will be non zero because you can only wait on children of the current process, which will make it look like one of the commands failed even if they didn't. – nhooyr May 17 '18 at 16:29
  • Better to use `for pid in "${pids[@]}"` rather than `for pid in ${pids[*]}`. Sure, someone who sets `IFS` to a value that contains numerics is asking for trouble, but still, better to write code that works right even when people *are* asking for trouble. :) – Charles Duffy Sep 05 '19 at 22:54
  • As @Alnitak said, what about the case where I want to report a failure *immediately* when any one of the processes dies with a non-zero exit code? If I have 3 processes that are all long running, and the 3rd process fails immediately, with the method proposed in the answer I would not find out about the failure of the 3rd process until I have waited for the other 2 processes to finish, which could be hours later. – Ian Tait Dec 12 '19 at 17:53
  • Important to set `set -e` in case you want to fail your script if a process fails and return the exit code from the failing process. – Sergio Santiago Apr 07 '20 at 11:06
  • What type of variable is `n_procs`, and what does it contain? What type of variable is `procs`, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how. – Gabriel Staples Jan 11 '22 at 16:44
  • I figured it out. For a full, runnable example based on this answer, including to see how `n_procs` and `procs` might actually be implemented and iterated over, [see my new answer here](https://stackoverflow.com/a/70670852/4561887). – Gabriel Staples Jan 11 '22 at 17:24
  • This answer also doesn't show how to read back the error codes from the subprocesses being waited on. I added that feature [to my answer as well](https://stackoverflow.com/a/70670852/4561887). That's a key part of the question. – Gabriel Staples Feb 17 '22 at 08:44
  • Note: Using the command `set -eEBm`, or simply just `set -e` ([man](https://linuxcommand.org/lc3_man_pages/seth.html)), will cause your script to fail with debugging info if wait returns a non-zero exit. It's perfect for debugging, but read the manual carefully for how you'd prefer to have exits behave. – Richard Tyler Miles May 15 '22 at 16:44
  • Why is an associative array used, why not just pids+=( $! ) – djsmiley2kStaysInside Jan 17 '23 at 09:02
  • The [man page](https://linuxcommand.org/lc3_man_pages/waith.html) for wait says that it waits for all currently active child processes, so it seems like just `wait` by itself would do the same thing? (If a process finishes before you get to wait, then waiting for that process's PID would be a no-op anyway, right?) – Charles Wood Aug 31 '23 at 20:02

http://jeremy.zawodny.com/blog/archives/010717.html :

#!/bin/bash

FAIL=0

echo "starting"

./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &

for job in `jobs -p`
do
    echo $job
    wait $job || let "FAIL+=1"
done

echo $FAIL

if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi
HoverHell
  • `jobs -p` is giving PIDs of subprocesses that are in execution state. It will skip a process if the process finishes before `jobs -p` is called. So if any subprocess ends before `jobs -p`, that process's exit status will be lost. – tkokoszka Feb 08 '09 at 15:06
  • Wow, this answer is way better than the top rated one. :/ – e40 Mar 29 '12 at 00:03
  • @e40 and the answer below is probably even better. And even better would probably be to run each command with '(cmd; echo "$?" >> "$tmpfile")', use this wait, and then read the file for the fails. Also annotate-output. … or just use this script when you don't care that much. – HoverHell Mar 29 '12 at 10:18
  • I'd like to add that this answer is better than the accepted one – shurikk Dec 16 '16 at 22:07
  • @tkokoszka I am not sure if you're right about that - or maybe it changed with bash versions, see my answer – Alexander Mills Feb 13 '17 at 10:40
  • http://stackoverflow.com/questions/356100/how-to-wait-in-bash-for-several-subprocesses-to-finish-and-return-exit-code-0/42202064#42202064 – Alexander Mills Feb 13 '17 at 10:42
  • @tkokoszka to be accurate `jobs -p` is not giving *PIDs* of subprocesses, but instead *GPIDs*. The waiting logic seems to work anyway, it always waits on the group if such group exists and pid if not, but it's good to be aware.. especially if one were to build upon this and incorporate something like sending messages to the subprocess in which case the syntax is different depending on whether you have PIDs or GPIDs.. i.e. `kill -- -$GPID` vs `kill $PID` – Timo Mar 01 '18 at 13:40
  • sounds so simple as in this answer, right? Wrong! If you put those `sleeper` things in a `for` or `while` loop, it becomes a child shell. and the `jobs` or `wait` doesn't consider the child shell's background jobs. so, that's why we should use the accepted answer, even though it looks complex. – Thamme Gowda Aug 04 '20 at 05:59
  • @ThammeGowda A `for` or `while` loop doesn't create a subshell. Anyway, the accepted answer doesn't work, when the background processes are started from a subshell, since the PIDs wouldn't be added to the parent shell's `pids` array. – user686249 Nov 20 '20 at 09:48
  • @ThammeGowda however is right about this solution not working with a `for` or `while` loop – dan Dec 18 '20 at 01:31
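HoverHell's tmpfile idea from the comments above can be sketched like this. It is a hypothetical variant, not from the answer: each job appends its own exit status to a temp file, and short appends to an `O_APPEND` file keep the concurrent writes intact.

```shell
# collect per-job exit statuses in a temp file; afterwards, any line
# other than "0" marks a failed job
tmpfile=$(mktemp)
for rc in 0 1 0; do
    ( sh -c "exit $rc"; echo "$?" >> "$tmpfile" ) &   # stand-in jobs
done
wait
fails=$(grep -cv '^0$' "$tmpfile")   # count non-zero status lines
echo "failed jobs: $fails"           # prints: failed jobs: 1
rm -f "$tmpfile"
```

Unlike `jobs -p`, this does not lose the status of a job that finishes early, since each job records its own result.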

Here is a simple example using wait.

Run some processes:

$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &

Then wait for them with the wait command:

$ wait < <(jobs -p)

Or just wait (without arguments) for all.

This will wait until all background jobs have completed.

If the -n option is supplied, waits for the next job to terminate and returns its exit status.

See: help wait and help jobs for syntax.

However, the downside is that this will return only the status of the last ID, so you need to check the status of each subprocess and store it in a variable.

Or make your calculation function create a file on failure (empty or with a failure log), then check whether that file exists, e.g.:

$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait < <(jobs -p)
$ test -f fail && echo Calculation failed.
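A cleaned-up sketch of this fail-file pattern (hedged, not the answer's exact commands): `rm -f` runs first so a stale marker can't cause a false positive, and `touch` replaces `tee` so that nothing sits reading stdin.

```shell
# marker-file pattern: any failing job creates ./fail
rm -f fail                     # clear a stale marker from a previous run
( true  || touch fail ) &      # succeeding job: no marker
( false || touch fail ) &      # failing job: creates the marker
wait
if [ -e fail ]; then
    echo "Calculation failed."
fi
```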
kenorb
  • For those new to bash, the two calculations in the example here are `sleep 20 && true` and `sleep 20 && false` -- ie: replace those with your function(s). To understand `&&` and `||`, run `man bash` and type '/' (search) then '^ *Lists' (a regex) then enter: man will scroll down to the description of `&&` and `||` – drevicko Jun 27 '18 at 09:07
  • you should probably check that the file 'fail' doesn't exist at the start (or delete it). Depending on the application, it might also be a good idea to add '2>&1' before the `||` to catch STDERR in fail as well. – drevicko Jun 27 '18 at 09:10
  • i like this one, any drawbacks? Only when I want to list all subprocesses and take some actions, e.g. send a signal, will I try to bookkeep pids or iterate jobs. To wait for them to finish, just `wait` – xgwang May 21 '19 at 03:04
  • This will miss the exit status of a job that failed before jobs -p is called – Erik Aronesty Jan 13 '20 at 21:18
  • not sure why but the `wait < <(jobs -p)` line is giving me a syntax error – tnrich Mar 29 '21 at 16:44
  • The wait method is really suited for super-quick parallelization of a for loop that has fewer iterations than there are available CPUs. It worked perfectly well for me, thanks for the tip – Delevoye Guillaume Sep 03 '21 at 10:38

How about simply:

#!/bin/bash

pids=""

for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

wait $pids

...code continued here ...

Update:

As pointed out by multiple commenters, the above waits for all processes to be completed before continuing, but it does not exit and fail if one of them fails. It can be made to do so with the following modification suggested by @Bryan, @SamBrightman, and others:

#!/bin/bash

pids=""
RESULT=0


for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

for pid in $pids; do
    wait $pid || let "RESULT=1"
done

if [ "$RESULT" == "1" ]; then
    exit 1
fi

...code continued here ...
patapouf_ai
  • According to wait man pages, wait with multiple PID's only returns the return value of the last process waited for. So you do need an extra loop and wait for each PID separately, as suggested in the accepted answer (in comments). – Vlad Frolov Jul 06 '15 at 19:17
  • Because it doesn't seem to be stated anywhere else on this page, I'll add that the loop would be `for pid in $pids; do wait $pid; done` – Bryan Jun 07 '16 at 13:36
  • @Bryan, you don't need that loop. wait $pids works just as well ;) – patapouf_ai Jun 08 '16 at 13:59
  • @bisounours_tronconneuse yes, you do. See `help wait` - with multiple IDs `wait` returns the exit code of the last one only, as @vlad-frolov said above. – Sam Brightman Sep 28 '16 at 09:28
  • @bisounours_tronconneuse if you call `wait` once per PID, there is only one PID – the last – so any individual failure will be propagated correctly. – Sam Brightman Sep 29 '16 at 08:59
  • Bryan, @SamBrightman Ok. I modified it with your recommendations. – patapouf_ai Sep 30 '16 at 14:40
  • I had an obvious concern with this solution: what if a given process exits before the corresponding `wait` is called? It turns out that this isn't a problem: if you `wait` on a process that's already exited, `wait` will immediately exit with the status of the already-exited process. (Thank you, `bash` authors!) – Daniel Griscom Mar 23 '18 at 15:32
  • This was exactly what I needed, handles failures in either sub-process perfectly and ensures that the main process finishes (either early if either sub-process failed, or going on to the `...code continued here...` if all sub-processes succeed) only once all sub-processes are completed. – zachelrath Dec 04 '19 at 16:35

If you have GNU Parallel installed you can do:

# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}

GNU Parallel will give you exit code:

  • 0 - All jobs ran without error.

  • 1-253 - Some of the jobs failed. The exit status gives the number of failed jobs.

  • 254 - More than 253 jobs failed.

  • 255 - Other error.

Watch the intro videos to learn more: http://pi.dk/1

Ole Tange
  • Thanks! But you forgot to mention the "confusion" issue which I subsequently fell into: http://unix.stackexchange.com/a/35953 – Brent Bradburn May 28 '13 at 21:24
  • This looks like a great tool, but I don't think the above works as-is in a Bash script where `doCalculations` is a function defined in that same script (although the OP wasn't clear about this requirement). When I try, `parallel` says `/bin/bash: doCalculations: command not found` (it says this 10 times for the `seq 0 9` example above). See [here](http://stackoverflow.com/questions/11003418/calling-functions-with-xargs-within-a-bash-script) for a workaround. – Brent Bradburn May 28 '13 at 22:26
  • Also of interest: `xargs` has some capability to launch jobs in parallel via the `-P` option. From [here](http://stackoverflow.com/questions/3321738/shell-scripting-using-xargs-to-execute-parallel-instances-of-a-shell-function): `export -f doCalculations ; seq 0 9 |xargs -P 0 -n 1 -I{} bash -c "doCalculations {}"`. Limitations of `xargs` are enumerated in the man page for `parallel`. – Brent Bradburn May 28 '13 at 22:45
  • And if `doCalculations` relies on any other script-internal environment variables (custom `PATH`, etc.), they probably need to be explicitly `export`ed before launching `parallel`. – Brent Bradburn Jun 04 '13 at 01:35
  • @nobar The confusion is due to some packagers messing things up for their users. If you install using `wget -O - pi.dk/3 | sh` you will get no confusions. If your packager has messed things up for you I encourage you to raise the issue with your packager. Variables and functions should be exported (export -f) for GNU Parallel to see them (see `man parallel`: http://www.gnu.org/software/parallel/man.html#aliases_and_functions_do_not_work) – Ole Tange Jul 07 '13 at 14:21
  • You can also use the `-j N` argument to allow running up to N jobs in parallel. Otherwise it will be limited to one job per CPU core. This is good if the jobs use some sleep functions etc. – Martin Flaska Feb 21 '22 at 08:57

Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.

waitall() { # PID...
  ## Wait for children to exit and indicate whether all exited with 0 status.
  local errors=0
  while :; do
    debug "Processes remaining: $*"
    for pid in "$@"; do
      shift
      if kill -0 "$pid" 2>/dev/null; then
        debug "$pid is still alive."
        set -- "$@" "$pid"
      elif wait "$pid"; then
        debug "$pid exited with zero exit status."
      else
        debug "$pid exited with non-zero exit status."
        ((++errors))
      fi
    done
    (("$#" > 0)) || break
    # TODO: how to interrupt this sleep when a child terminates?
    sleep ${WAITALL_DELAY:-1}
  done
  ((errors == 0))
}

debug() { echo "DEBUG: $*" >&2; }

pids=""
for t in 3 5 4; do 
  sleep "$t" &
  pids="$pids $!"
done
waitall $pids
Mark Edgar
  • One could possibly skip that WAITALL_DELAY or set it very low, as no processes are started inside the loop I don't think it is too expensive. – Marian Jun 17 '10 at 17:13

To parallelize this...

for i in $(whatever_list) ; do
   do_something $i
done

Translate it to this...

for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
   (
   export -f do_something ## export functions (if needed)
   export PATH ## export any variables that are required
   xargs -I{} --max-procs 0 bash -c ' ## process in batches...
      {
      echo "processing {}" ## optional
      do_something {}
      }' 
   )
  • If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.
  • Exporting functions and variables may or may not be necessary, in any particular case.
  • You can set --max-procs based on how much parallelism you want (0 means "all at once").
  • GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default.
  • The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.
  • Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts.
  • You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.

Here's a simplified working example...

for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
   {
   echo sleep {}
   sleep 2s
   }'
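One detail worth noting about the non-zero exit code mentioned in the bullets above: `xargs` exits with status 123 when any invocation exits with a status of 1-125 (this is the POSIX-specified behavior). A minimal check, with three stand-in invocations:

```shell
# three invocations exit with 0, 1, and 0; the single failure makes
# xargs itself exit with 123
printf '%s\n' 0 1 0 | xargs -I{} --max-procs 2 sh -c 'exit {}'
status=$?
echo "xargs exit status: $status"   # prints: xargs exit status: 123
```

This is what lets the pipeline as a whole report failure without interrupting the other parallel invocations.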
Brent Bradburn
  • For `--max-procs`: [How to obtain the number of CPUs/cores in Linux from the command line?](https://stackoverflow.com/a/17089001/86967) – Brent Bradburn Feb 04 '18 at 00:24

This is something that I use:

#wait for jobs
for job in `jobs -p`; do wait ${job}; done
jplozier

This is an expansion on the most-upvoted answer, by @Luca Tettamanti, to make a fully-runnable example.

That answer left me wondering:

What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how.

...and also:

  • How do you get the return code from the subprocess when it has completed (which is the whole crux of this question)?

Anyway, I figured it out, so here is a fully-runnable example.

Notes:

  1. $! is how to obtain the PID (Process ID) of the last-executed sub-process.
  2. Running any command with & after it, like cmd &, causes it to run in the background as a parallel subprocess alongside the main process.
  3. myarray=() is how to create an array in bash.
  4. To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.
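The notes above can be seen together in a few lines (a throwaway sketch, not part of the repo file):

```shell
sleep 0.2 &            # note 2: `&` runs the command in the background
bg_pid=$!              # note 1: $! is the PID of that background process
arr=()                 # note 3: create an empty bash array
arr+=("one" "two")     # ...and append elements to it
wait "$bg_pid"         # note 4: block until that PID exits
status=$?              # the background job's exit status (0 here)
echo "status=$status array_len=${#arr[@]}"   # prints: status=0 array_len=2
```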

Full, runnable program: wait for all processes to end

multi_process_program.sh (from my eRCaGuy_hello_world repo):

#!/usr/bin/env bash


# This is a special sleep function which returns the number of seconds slept as
# the "error code" or return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
    seconds_to_sleep="$1"
    sleep "$seconds_to_sleep"
    return "$seconds_to_sleep"
}

# Create an array of whatever commands you want to run as subprocesses
procs=()  # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")

num_procs=${#procs[@]}  # number of processes
echo "num_procs = $num_procs"

# run commands as subprocesses and store pids in an array
pids=()  # bash array
for (( i=0; i<"$num_procs"; i++ )); do
    echo "cmd = ${procs[$i]}"
    ${procs[$i]} &  # run the cmd as a subprocess
    # store pid of last subprocess started; see:
    # https://unix.stackexchange.com/a/30371/114401
    pids+=("$!")
    echo "    pid = ${pids[$i]}"
done

# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
    wait "$pid"
    return_code="$?"
    echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."

Change the file above to be executable by running chmod +x multi_process_program.sh, then run it like this:

time ./multi_process_program.sh 

Sample output. Note how the output of the time command shows the run took 5.084 sec. We were also able to successfully retrieve the return code from each subprocess.

eRCaGuy_hello_world/bash$ time ./multi_process_program.sh 
num_procs = 4
cmd = my_sleep 5
    pid = 21694
cmd = my_sleep 2
    pid = 21695
cmd = my_sleep 3
    pid = 21697
cmd = my_sleep 4
    pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.

real    0m5.084s
user    0m0.025s
sys 0m0.061s

Going further: determine live when each individual process ends

If you'd like to do some action as each process finishes, and you don't know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever action you want.

Simply comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block instead:

# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
    for i in "${!pids[@]}"; do
        pid="${pids[$i]}"
        # echo "pid = $pid"  # debugging

        # See if PID is still running; see my answer here:
        # https://stackoverflow.com/a/71134379/4561887
        ps --pid "$pid" > /dev/null
        if [ "$?" -ne 0 ]; then
            # PID doesn't exist anymore, meaning it terminated

            # 1st, read its return code
            wait "$pid"
            return_code="$?"

            # 2nd, remove this PID from the `pids` array by `unset`ting the
            # element at this index; NB: due to how bash arrays work, this does
            # NOT actually remove this element from the array. Rather, it
            # removes its index from the `"${!pids[@]}"` list of indices,
            # adjusts the array count(`"${#pids[@]}"`) accordingly, and it sets
            # the value at this index to either a null value of some sort, or
            # an empty string (I'm not exactly sure).
            unset "pids[$i]"

            num_pids="${#pids[@]}"
            echo "PID $pid is done; return_code = $return_code;" \
                 "$num_pids PIDs remaining."
        fi
    done

    # exit the while loop if the `pids` array is empty
    if [ "${#pids[@]}" -eq 0 ]; then
        break
    fi

    # Do some small sleep here to keep your polling loop from sucking up
    # 100% of one of your CPUs unnecessarily. Sleeping allows other processes
    # to run during this time.
    sleep 0.1
done

Sample run and output of the full program with Option 1 commented out and Option 2 in-use:

eRCaGuy_hello_world/bash$ ./multi_process_program.sh 
num_procs = 4
cmd = my_sleep 5
    pid = 22275
cmd = my_sleep 2
    pid = 22276
cmd = my_sleep 3
    pid = 22277
cmd = my_sleep 4
    pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.

Each of those PID XXXXX is done lines prints out live right after that process has terminated! Notice that even though the process for sleep 5 (PID 22275 in this case) was run first, it finished last, and we successfully detected each process right after it terminated. We also successfully detected each return code, just like in Option 1.

Other References:

  1. *****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added):

    wait <n> waits until the process with PID is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.

    In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!

  2. How to check if a process id (PID) exists

    1. my answer
  3. Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means.

  4. How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/

Gabriel Staples
  • @GabrielStaples your example was fantastic thank you. I only have 1 remaining issue. My script uses `set -e` which kills the entire script once the first (non zero) `my_sleep` function returns. Usually this isn't a problem if the subprocess is part of an `if` statement (`set -e` ignores failures in ifs and a couple other situations) but I am having trouble figuring out how to work something like that into your example. Somewhere around `${procs[$i]} & pids+=("$!")` I need something that `set -e` ignores when `${procs[$i]}` fails (returns non-zero) – Rosey Feb 22 '22 at 03:08
  • @Rosey, can you turn off `set -e` for the script? Does it have to be on? Also, you can run `set +e` anywhere in the script to turn it off, and `set -e` again to turn it back on. Try wrapping the subprocess call cmd with those. – Gabriel Staples Feb 22 '22 at 04:03
  • @GabrielStaples Yeah I can _sort_ of do that. You can't just sandwich the command like this though: `set +e ${procs[$i]} & pids+=("$!") set -e` because the subprocesses are async. By the time one completes you have turned `set -e` back on. Right now I have the `set +e` above the "run commands as subprocesses" for loop and `set -e` is in the if that breaks the while loop. It works but it's over-scoped. Simple syntax errors outside the my_sleep function will be ignored + displayed in console. – Rosey Feb 22 '22 at 04:12
  • @Rosey, try asking a new question and posting a comment here with a link to it. If you do, I'll take a look and put some more effort into it. – Gabriel Staples Feb 23 '22 at 00:30
  • The command `ps --pid "$pid"` is not 100% safe because the process may have already terminated with its pid reused by other newly created processes on the system. Given the short polling duration, it is really hard to happen, and I don't know if we can do anything to improve it because the shell environment is limited. By the way, my shell BusyBox `sleep` command must take a minimum of 1 second as argument (floating point is not supported). I hope nothing goes wrong during that 1 second. :( – Livy Sep 11 '22 at 12:23
  • @Livy, what shell are you in? In Bash you can sleep floating pont fractional seconds. – Gabriel Staples Sep 11 '22 at 17:00
  • @GabrielStaples I am on embedded devices and only BusyBox `ash` is available. It takes me nearly an hour to port your Bash sample code back to `ash` and test, but it runs perfectly now. Of course `ash` code can run on `Bash` just fine. BusyBox `sleep` only supports integers, however. – Livy Sep 11 '22 at 17:09
  • @Livy, I just tested `time busybox sleep 0.3` on 2 separate boards, one running busybox v1.31.1, and one running v1.27.1 (see `busybox --help` for version info), and in both cases it slept a fractional 0.3 seconds. Cmd: `time busybox sleep 0.3`. Sample output: `real 0m 0.30s`. So, what do you mean it doesn't support floating point sleep values? What version do you have? – Gabriel Staples Sep 20 '22 at 20:55
  • @GabrielStaples I am using OpenWrt 22.03. The BusyBox version is almost always latest (1.35.0) but I don't know how they compiled their customized version of BusyBox. Anyway, the fractional number problem is minor, if it doesn't like fractional, I'll use whole number. – Livy Sep 22 '22 at 09:08

I see lots of good examples listed on here, wanted to throw mine in as well.

#! /bin/bash

items="1 2 3 4 5 6"
pids=""

for item in $items; do
    sleep $item &
    pids+="$! "
done

for pid in $pids; do
    wait $pid
    status=$?   # capture the job's status before `[` overwrites $?
    if [ $status -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of $status"
    else
        echo "FAILED - Job $pid exited with a status of $status"
    fi
done

I use something very similar to start/stop servers/services in parallel and check each exit status. Works great for me. Hope this helps someone out!

Jason Slobotski
  • 1,386
  • 14
  • 18
  • When I stop it with Ctrl+C I still see processes running in background. – karsten Jul 30 '18 at 07:17
  • 2
    @karsten - this is a different problem. Assuming you are using bash you can trap an exit condition (including Ctrl+C) and have the current and all child processes killed using `trap "kill 0" EXIT` – Phil Jul 10 '19 at 23:12
  • @Phil is correct. Since these are background processes, killing the parent process just leaves any child processes running. My example does not trap any signals, which can be added if necessary as Phil has stated. – Jason Slobotski Jul 11 '19 at 15:40
9

Here's my version that works for multiple pids, logs warnings if execution takes too long, and stops the subprocesses if execution takes longer than a given value.

[EDIT] I have uploaded my newer implementation of WaitForTaskCompletion, called ExecTasks at https://github.com/deajan/ofunctions. There's also a compat layer for WaitForTaskCompletion [/EDIT]

function WaitForTaskCompletion {
    local pids="${1}" # pids to wait for, separated by semi-colon
    local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
    local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
    local caller_name="${4}" # Who called this function
    local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors       

    Logger "${FUNCNAME[0]} called by [$caller_name]."

    local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once 
    local log_ttime=0 # local time instance for comparison

    local seconds_begin=$SECONDS # Seconds since the beginning of the script
    local exec_time=0 # Seconds since the beginning of this function

    local retval=0 # return value of monitored pid process
    local errorcount=0 # Number of pids that finished with errors

    local pidCount # number of given pids

    IFS=';' read -a pidsArray <<< "$pids"
    pidCount=${#pidsArray[@]}

    while [ ${#pidsArray[@]} -gt 0 ]; do
        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            if kill -0 $pid > /dev/null 2>&1; then
                newPidsArray+=($pid)
            else
                wait $pid
                result=$?
                if [ $result -ne 0 ]; then
                    errorcount=$((errorcount+1))
                    Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
                fi
            fi
        done

        ## Log a standby message every hour
        exec_time=$(($SECONDS - $seconds_begin))
        if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
            if [ $log_ttime -ne $exec_time ]; then
                log_ttime=$exec_time
                Logger "Current tasks still running with pids [${pidsArray[@]}]."
            fi
        fi

        if [ $exec_time -gt $soft_max_time ]; then
            if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
                Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
                soft_alert=1
                SendAlert

            fi
            if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
                Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
                kill -SIGTERM "${pidsArray[@]}"
                if [ $? == 0 ]; then
                    Logger "Task stopped successfully"
                else
                    errorcount=$((errorcount+1))
                fi
            fi
        fi

        pidsArray=("${newPidsArray[@]}")
        sleep 1
    done

    Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
    if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
        Logger "Stopping execution."
        exit 1337
    else
        return $errorcount
    fi
}

# Just a plain stupid logging function to be replaced by yours
function Logger {
    local value="${1}"

    echo $value
}

Example: wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, stop all processes if execution takes longer than 120 seconds. Don't exit the program on failures.

function something {

    sleep 10 &
    pids="$!"
    sleep 12 &
    pids="$pids;$!"
    sleep 9 &
    pids="$pids;$!"

    WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something
    
Orsiris de Jong
  • 2,819
  • 1
  • 26
  • 48
8

I don't believe it's possible with Bash's builtin functionality.

You can get notification when a child exits:

#!/bin/sh
set -o monitor        # enable script job control
trap 'echo "child died"' CHLD

However there's no apparent way to get the child's exit status in the signal handler.

Getting that child status is usually the job of the wait family of functions in the lower level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.

What appears to be impossible is the equivalent of waitpid(-1), which blocks until any child process returns. (Newer Bash versions add `wait -n` for exactly this.)
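For reference, a minimal sketch of the per-PID approach Bash does support (the stand-in workers and their exit codes are mine, purely for illustration):

```shell
#!/bin/bash
# collect the PIDs of the background jobs, then wait on each
# one individually to recover its exit status
run_workers() {
    local pids=() pid rc=0
    for code in 0 3 0; do
        ( exit "$code" ) &   # stand-in for a real worker
        pids+=($!)
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=1
    done
    return $rc
}

run_workers
echo "overall status: $?"   # → overall status: 1
```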

Alnitak
  • 334,560
  • 70
  • 407
  • 495
8

The following code will wait for completion of all calculations and return exit status 1 if any of doCalculations fails.

#!/bin/bash
for i in $(seq 0 9); do
   (doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qv 0 && exit 1
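To see the pipeline trick in action without doCalculations, here is a self-contained variant (the stand-in function and its exit code 7 are my own, purely for illustration):

```shell
#!/bin/bash
# stand-in for doCalculations: job 3 fails with code 7
doCalculations() { [ "$1" -eq 3 ] && return 7; return 0; }

check() {
    # each subshell prints its child's exit status on stdout;
    # grep -qv 0 succeeds if any printed line is not "0"
    for i in $(seq 0 9); do
        (doCalculations $i >&2 & wait %1; echo $?) &
    done | grep -qv 0 && return 1
    return 0
}

check
echo "result: $?"   # → result: 1
```

Note that a status like 10 would slip through the `grep -v 0` filter, since the string "10" contains a "0"; matching the whole line with `grep -qvx 0` is more robust.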
errr
  • 81
  • 1
  • 2
6

If you have bash 4.2 or later available, the following might be useful to you. It uses associative arrays to store task names and their "code" as well as task names and their pids. I have also built in a simple rate-limiting method which might come in handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.

The script launches all tasks in the first loop and consumes the results in the second one.

This is a bit overkill for simple cases but it allows for pretty neat stuff. For example one can store error messages for each task in another associative array and print them after everything has settled down.

#! /bin/bash

main () {
    local -A pids=()
    local -A tasks=([task1]="echo 1"
                    [task2]="echo 2"
                    [task3]="echo 3"
                    [task4]="false"
                    [task5]="echo 5"
                    [task6]="false")
    local max_concurrent_tasks=2

    for key in "${!tasks[@]}"; do
        while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
            sleep 1 # gnu sleep allows floating point here...
        done
        ${tasks[$key]} &
        pids+=(["$key"]="$!")
    done

    errors=0
    for key in "${!tasks[@]}"; do
        pid=${pids[$key]}
        local cur_ret=0
        if [ -z "$pid" ]; then
            echo "No Job ID known for the $key process" # should never happen
            cur_ret=1
        else
            wait $pid
            cur_ret=$?
        fi
        if [ "$cur_ret" -ne 0 ]; then
            errors=$(($errors + 1))
            echo "$key (${tasks[$key]}) failed."
        fi
    done

    return $errors
}

main
stefanct
  • 2,503
  • 1
  • 28
  • 32
6
#!/bin/bash
set -m
for i in `seq 0 9`; do
  doCalculations $i &
done
while fg; do true; done
  • set -m allows you to use fg & bg in a script
  • fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds
  • while fg will stop looping when any fg exits with a non-zero exit status

Unfortunately, this won't handle the case where a process in the background exits with a non-zero exit status. (The loop won't terminate immediately; it will wait for the previous processes to complete.)

Jayen
  • 5,653
  • 2
  • 44
  • 65
6

I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.

#!/bin/bash

set -o monitor

sleep 2 &
sleep 4 && exit 1 &
sleep 6 &

pids=`jobs -p`

checkpids() {
    for pid in $pids; do
        if kill -0 $pid 2>/dev/null; then
            echo $pid is still alive.
        elif wait $pid; then
            echo $pid exited with zero exit status.
        else
            echo $pid exited with non-zero exit status.
        fi
    done
    echo
}

trap checkpids CHLD

wait
michaelt
  • 81
  • 1
  • 2
6

Wait for all jobs and return the exit code of the last failing job. Unlike solutions above, this does not require pid saving, or modifying inner loops of scripts. Just bg away, and wait.

function wait_ex {
    # this waits for all jobs and returns the exit code of the last failing job
    ecode=0
    while true; do
        [ -z "$(jobs)" ] && break
        wait -n
        err="$?"
        [ "$err" != "0" ] && ecode="$err"
    done
    return $ecode
}
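A hedged usage sketch (the stand-in jobs and their exit code 5 are mine, not from the answer):

```shell
#!/usr/bin/env bash
wait_ex() {
    # wait for all jobs; return the exit code of the last failing job
    ecode=0
    while true; do
        [ -z "$(jobs)" ] && break
        wait -n
        err="$?"
        [ "$err" != "0" ] && ecode="$err"
    done
    return $ecode
}

sleep 0.1 &
( sleep 0.1; exit 5 ) &   # this job's failure should be reported
sleep 0.1 &

wait_ex
echo "wait_ex returned $?"   # → wait_ex returned 5
```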

EDIT: Fixed the bug where this could be fooled by a script that ran a command that didn't exist.

Erik Aronesty
  • 11,620
  • 5
  • 64
  • 44
  • 2
    This will work and reliably give the first error code from your executed commands unless it happens to be "command not found" (code 127). – drevicko Jan 14 '20 at 04:37
  • The -n flag will wait for the next child to change status and return the code. I'm not sure what happens if two completes at almost exactly the same time? In any case, this should be sufficient for my use case, thanks! – Andreas Løve Selvik Mar 16 '21 at 02:49
5

Just store the results out of the shell, e.g. in a file.

#!/bin/bash
tmp=/tmp/results

: > $tmp  #clean the file

for i in `seq 0 9`; do
  (doCalculations $i; echo $i:$?>>$tmp)&
done      #iterate

wait      #wait until all ready

sort $tmp | grep -v ':0'  #... handle as required
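To also return a non-zero exit code as the question asks, the same idea extends naturally. A sketch (the stand-in doCalculations, where odd inputs fail, is mine):

```shell
#!/bin/bash
tmp=$(mktemp)

doCalculations() { return $(( $1 % 2 )); }   # stand-in: odd i fails

for i in `seq 0 9`; do
  (doCalculations $i; echo "$i:$?" >> "$tmp") &
done

wait   # wait until all are ready

# any line not ending in ":0" is a failure
if grep -qv ':0$' "$tmp"; then
    echo "at least one job failed"
fi
rm -f "$tmp"
```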
estani
  • 24,254
  • 2
  • 93
  • 76
4

I've just been modifying a script to background and parallelise a process.

I did some experimenting (on Solaris with both bash and ksh) and discovered that `wait` outputs the exit status if it's not zero, or a list of jobs that returned non-zero exit statuses when no PID argument is provided. E.g.:

Bash:

$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]-  Exit 1                  sleep 20 && exit 1
[2]+  Exit 2                  sleep 10 && exit 2

Ksh:

$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+  Done(1)                  sleep 20 && exit 1
[2]+  Done(2)                  sleep 10 && exit 2

This output is written to stderr, so a simple solution to the OPs example could be:

#!/bin/bash

trap "rm -f /tmp/x.$$" EXIT

for i in `seq 0 9`; do
  doCalculations $i &
done

wait 2> /tmp/x.$$
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then
  exit 1
fi

While this:

wait 2> >(wc -l)

will also return a count, but without the tmp file. It might also be used this way, for example:

wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)

But this isn't very much more useful than the tmp file IMO. I couldn't find a useful way to avoid the tmp file whilst also avoiding running the `wait` in a subshell, which won't work at all.

Tosh
  • 41
  • 1
  • 1
3

I needed this, but the target process wasn't a child of current shell, in which case wait $PID doesn't work. I did find the following alternative instead:

while [ -e /proc/$PID ]; do sleep 0.1 ; done

That relies on the presence of procfs, which may not be available (Mac doesn't provide it for example). So for portability, you could use this instead:

while ps -p $PID >/dev/null ; do sleep 0.1 ; done
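If GNU coreutils is available (an assumption; macOS's BSD tail lacks this flag), `tail --pid` can replace the polling loop:

```shell
#!/bin/bash
sleep 0.3 &
PID=$!

# blocks until the process with $PID exits; works even when
# $PID is not a child of this shell (tail polls internally)
tail --pid="$PID" -f /dev/null
echo "process $PID has exited"
```

Like the `ps` loop, this only tells you the process ended; for a non-child, its exit status is still unavailable to the shell.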
troelskn
  • 115,121
  • 27
  • 131
  • 155
3

There are already a lot of answers here, but I am surprised no one seems to have suggested using arrays... So here's what I did - this might be useful to some in the future.

n=10 # run 10 jobs
c=0
PIDS=()

while (( c < n ))
do
    my_function_or_command &
    PID=$!
    echo "Launched job as PID=$PID"
    PIDS+=($PID)
    (( c+=1 ))
done


# collect launched jobs

for pid in "${PIDS[@]}"
do
    wait $pid || echo "failed job PID=$pid"
done
FreelanceConsultant
  • 13,167
  • 27
  • 115
  • 225
3

This works, and should be just as good, if not better, than @HoverHell's answer!

#!/usr/bin/env bash

set -m # allow for job control
EXIT_CODE=0;  # exit code of overall script

function foo() {
     echo "CHLD exit code is $1"
     echo "CHLD pid is $2"
     echo $(jobs -l)

     for job in `jobs -p`; do
         echo "PID => ${job}"
         wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
     done
}

trap 'foo $? $$' CHLD

DIRN=$(dirname "$0");

commands=(
    '{ echo "foo" && exit 4; }'
    '{ echo "bar" && exit 3; }'
    '{ echo "baz" && exit 5; }'
)

clen=`expr "${#commands[@]}" - 1` # get length of commands - 1

for i in `seq 0 "$clen"`; do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in subshell
    echo "$i ith command has been issued as a background job"
done

# wait for all to finish
wait;

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"

# end

and of course, I have immortalized this script, in an NPM project which allows you to run bash commands in parallel, useful for testing:

https://github.com/ORESoftware/generic-subshell

Alexander Mills
  • 90,741
  • 139
  • 482
  • 817
3

Exactly for this purpose I wrote a bash function called :for.

Note: `:for` not only preserves and returns the exit code of the failing function, but also terminates all parallel running instances. This might not be needed in this case.

#!/usr/bin/env bash

# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
    local pids=("$@")
    [ ${#pids[@]} -eq 0 ] && return 0

    trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
    trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM

    for pid in "${pids[@]}"; do
        wait "${pid}" || return $?
    done

    trap - INT RETURN TERM
}

# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
    local f="${1}" && shift

    local i=0
    local pids=()
    for arg in "$@"; do
        ( ${f} "${arg}" ) &
        pids+=("$!")
        if [ ! -z ${FOR_PARALLEL+x} ]; then
            (( i=(i+1)%${FOR_PARALLEL} ))
            if (( i==0 )) ;then
                :wait "${pids[@]}" || return $?
                pids=()
            fi
        fi
    done && [ ${#pids[@]} -eq 0 ] || :wait "${pids[@]}" || return $?
}

usage

for.sh:

#!/usr/bin/env bash
set -e

# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)

msg="You should see this three times"

:(){
  i="${1}" && shift

  echo "${msg}"

  sleep 1
  if   [ "$i" == "1" ]; then sleep 1
  elif [ "$i" == "2" ]; then false
  elif [ "$i" == "3" ]; then
    sleep 3
    echo "You should never see this"
  fi
} && :for : 1 2 3 || exit $?

echo "You should never see this"
$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1

References

Ente
  • 2,301
  • 1
  • 16
  • 34
2
set -e
fail () {
    touch .failure
}
expect () {
    wait
    if [ -f .failure ]; then
        rm -f .failure
        exit 1
    fi
}

sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect

The set -e at top makes your script stop on failure.

expect will return 1 if any subjob failed.

Yajo
  • 5,808
  • 2
  • 30
  • 34
2

There can be a case where the process completes before we wait for it. If we trigger `wait` for a process that has already finished, it can fail with an error like `pid is not a child of this shell`. To avoid such cases, the following function can be used to find out whether the process has finished:

isProcessComplete(){
    PID=$1
    while [ -e /proc/$PID ]
    do
        echo "Process: $PID is still running"
        sleep 5
    done
    echo "Process $PID has finished"
}
2

I almost fell into the trap of using jobs -p to collect PIDs, which does not work if the child has already exited, as shown in the script below. The solution I picked was simply calling wait -n N times, where N is the number of children I have, which I happen to know deterministically.

#!/usr/bin/env bash

sleeper() {
    echo "Sleeper $1"
    sleep $2
    echo "Exiting $1"
    return $3
}

start_sleepers() {
    sleeper 1 1 0 &
    sleeper 2 2 $1 &
    sleeper 3 5 0 &
    sleeper 4 6 0 &
    sleep 4
}

echo "Using jobs"
start_sleepers 1

pids=( $(jobs -p) )

echo "PIDS: ${pids[*]}"

for pid in "${pids[@]}"; do
    wait "$pid"
    echo "Exit code $?"
done

echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"

echo "Waiting for N processes"
start_sleepers 2

for ignored in $(seq 1 4); do
    wait -n
    echo "Exit code $?"
done

Output:

Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Daniel C. Sobral
  • 295,120
  • 86
  • 501
  • 681
2

Starting with Bash 5.1, there is a nice new way of waiting for and handling the results of multiple background jobs thanks to the introduction of wait -p:

#!/usr/bin/env bash

# Spawn background jobs
for ((i=0; i < 10; i++)); do
    secs=$((RANDOM % 10)); code=$((RANDOM % 256))
    (sleep ${secs}; exit ${code}) &
    echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done

# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
    wait -n -p pid; code=$?
    [[ -z "${pid}" ]] && break
    echo "Background job ${pid} finished with code ${code}"
    (( ${code} != 0 )) && result=1
done

# Return overall result
exit ${result}
Fonic
  • 2,625
  • 23
  • 20
  • Link to the wait documentation: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait – lnksz Dec 07 '21 at 20:20
  • Alternatively run `help wait` within an interactive Bash session to display information on `wait -p`. – Fonic Dec 08 '21 at 10:15
1

I used this recently (thanks to Alnitak):

#!/bin/bash
# activate child monitoring
set -o monitor

# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!

# count, and kill when all done
c=0
function kill_on_count() {
    # you could kill on whatever criterion you wish for
    # I just counted to simulate bash's wait with no args
    [ $c -eq 9 ] && kill $pid
    c=$((c+1))
    echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD

function save_status() {
    local i=$1;
    local rc=$2;
    # do whatever, and here you know which one stopped
    # but remember, you're called from a subshell
    # so vars have their values at fork time
}

# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
    (doCalculations $i; save_status $i $?) &
done

# wait for locking subprocess to be killed
wait $pid
echo

From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want 'any' non zero rc, just kill the lock from save_status.

Lloeki
  • 6,573
  • 2
  • 33
  • 32
1

Trapping the CHLD signal may not work because you can lose some signals if they arrive simultaneously.

#!/bin/bash

trap 'rm -f $tmpfile' EXIT

tmpfile=$(mktemp)

doCalculations() {
    echo start job $i...
    sleep $((RANDOM % 5)) 
    echo ...end job $i
    exit $((RANDOM % 10))
}

number_of_jobs=10

for i in $( seq 1 $number_of_jobs )
do
    ( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done

wait 

i=0
while read res; do
    echo "$res"
    let i++
done < "$tmpfile"

echo $i jobs done !!!
mug896
  • 1,777
  • 1
  • 19
  • 17
1

trap is your friend. You can trap ERR in a lot of shells. You can trap EXIT, or trap DEBUG to run a piece of code after every command.

This in addition to all the standard signals.

edit

This was an accidental login on the wrong account, so I hadn't seen the request for examples.

Try here, on my regular account.

Handle exceptions in bash scripts
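For completeness, a minimal sketch of the `trap ... ERR` idea (my own example, not taken from the linked answer):

```shell
#!/bin/bash
# run a handler whenever a command exits with a non-zero status
trap 'echo "command failed with status $?" >&2' ERR

false                    # triggers the ERR trap
echo "script continues"  # without set -e, execution goes on
```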

1

A solution to wait for several subprocesses and exit when any one of them exits with a non-zero status code is to use `wait -n`:

#!/bin/bash
wait_for_pids()
{
    for (( i = 1; i <= $#; i++ )) do
        wait -n $@
        status=$?
        echo "received status: "$status
        if [ $status -ne 0 ] && [ $status -ne 127 ]; then
            exit 1
        fi
    done
}

sleep_for_10()
{
    sleep 10
    exit 10
}

sleep_for_20()
{
    sleep 20
}

sleep_for_10 &
pid1=$!

sleep_for_20 &
pid2=$!

wait_for_pids $pid2 $pid1

status code '127' is for non-existing process which means the child might have exited.
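The 127 case is easy to demonstrate by waiting on a PID that is not a child of the shell (the PID below is arbitrary, chosen to exceed the usual pid range):

```shell
#!/bin/bash
wait 123456789 2>/dev/null
echo "status: $?"   # → status: 127
```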

vishnuitta
  • 11
  • 1
1

I really liked Luca's answer but needed it for zsh, so here it is for reference:

pids=()

# run processes and store pids in array
for i in $n_procs; do
    ./procs[${i}] &
    pids+=($!)
done

# wait for all pids
for pid in ${pids[*]}; do
    wait $pid
done
pfrank
  • 2,090
  • 1
  • 19
  • 26
0

I think that the most straightforward way to run jobs in parallel and check for status is using temporary files. There are already a couple of similar answers (e.g. Nietzche-jou and mug896).

#!/bin/bash
rm -f fail
for i in `seq 0 9`; do
  doCalculations $i || touch fail &
done
wait 
! [ -f fail ]

The above code is not thread safe. If you are concerned that the code above will be running at the same time as itself, it's better to use a more unique file name, like fail.$$. The last line is to fulfill the requirement: "return exit code 1 when any of subprocesses ends with code !=0?" I threw an extra requirement in there to clean up. It may have been clearer to write it like this:

#!/bin/bash
trap 'rm -f fail.$$' EXIT
for i in `seq 0 9`; do
  doCalculations $i || touch fail.$$ &
done
wait 
! [ -f fail.$$ ] 

Here is a similar snippet for gathering results from multiple jobs: I create a temporary directory, store the outputs of all the subtasks in separate files, and then dump them for review. This doesn't really match the question - I'm throwing it in as a bonus:

#!/bin/bash
trap 'rm -fr $WORK' EXIT

WORK=/tmp/$$.work
mkdir -p $WORK
cd $WORK

for i in `seq 0 9`; do
  doCalculations $i >$i.result &
done
wait 
grep $ *  # display the results with filenames and contents
Mark
  • 4,249
  • 1
  • 18
  • 27
0

I had a similar situation, but had all kinds of problems with loop subshells that made sure the other solutions here didn't work, so I had my loop write the script I would run, with wait on the end. Effectively:

#!/bin/bash
echo > tmpscript.sh
for i in `seq 0 9`; do
    echo "doCalculations $i &" >> tmpscript.sh
done
echo "wait" >> tmpscript.sh
chmod u+x tmpscript.sh
./tmpscript.sh

dumb, but simple and helped debug some things afterwards.

If I had time I would have had a deeper look at GNU parallel but it was difficult with my own "doCalculations" process.

Mark
  • 86
  • 3
-1

I'm thinking maybe run doCalculations; echo "$?" >>/tmp/acc in a subshell that is sent to the background, then `wait`, after which /tmp/acc would contain the exit statuses, one per line. I don't know about any consequences of multiple processes appending to the accumulator file, though.

Here's a trial of this suggestion:

File: doCalculations

#!/bin/sh

random -e 20
sleep $?
random -e 10

File: try

#!/bin/sh

rm /tmp/acc

for i in $( seq 0 20 ) 
do
        ( ./doCalculations "$i"; echo "$?" >>/tmp/acc ) &
done

wait

cat /tmp/acc | fmt
rm /tmp/acc

Output of running ./try

5 1 9 6 8 1 2 0 9 6 5 9 6 0 0 4 9 5 5 9 8
S.S. Anne
  • 15,171
  • 8
  • 38
  • 76
Nietzche-jou
  • 14,415
  • 4
  • 34
  • 45
  • 1
    There shouldn't be any issues with multiple appenders, though return values may be written out of order so you don't know which process returned what... – Luca Tettamanti Dec 10 '08 at 15:22
  • 1
    You could just send identification info with the statuses. At any rate, OP only wanted to know if *any* of the subprocesses returned with status ≠ 0 without regard to which ones specifically. – Nietzche-jou Dec 10 '08 at 15:29
  • 1
    to get 20 results instead of 21 do `for i in $( seq 1 20 )` – qkrijger Feb 27 '15 at 10:50