
I have a bash script that parallelizes some time-consuming commands, and so far it runs perfectly. I am using the wait command as follows:

docker pull */* &
docker pull */* &
docker pull */* &
docker pull */* &
docker pull */* &
docker pull */* &
docker pull */* &
composer install -n &
wait

Now I want this script to abort all commands and exit with a non-zero code if one of the commands fails. How can I achieve this?

Note: */* stands for Docker image names; the actual names are not important for the context.

Cagatay Gurturk
  • Just a side note: did you try measuring the time of your script against the time of these tasks run without parallelization? Parallel is not always faster, especially when it comes to HDD access... – Nidhoegger Sep 16 '15 at 08:44
  • These docker pulls fetch large files from Docker Hub, and yes, parallelizing them reduced the execution time dramatically. – Cagatay Gurturk Sep 16 '15 at 08:51
  • See: http://stackoverflow.com/questions/356100/how-to-wait-in-bash-for-several-subprocesses-to-finish-and-return-exit-code-0 – Daniel Jan 19 '16 at 12:46

2 Answers


This requires bash 4.3 or later for wait -n.

The trick here is to keep a list of the subprocesses you spawned, then wait for them individually (in the order they finish). You can then check the return code of the process that finished and kill the lot of them if it failed. For example:

#!/bin/bash

# Remember the pid after each command, keep a list of them.
# pidlist=foo could also go on a line of its own, but I
# find this more readable because I like tabular layouts.
sleep 10 & pidlist="$!"
sleep 10 & pidlist="$pidlist $!"
sleep 10 & pidlist="$pidlist $!"
sleep 10 & pidlist="$pidlist $!"
sleep 10 & pidlist="$pidlist $!"
sleep 10 & pidlist="$pidlist $!"
false    & pidlist="$pidlist $!"

echo $pidlist

# $pidlist intentionally unquoted so every pid in it expands to a
# parameter of its own. Note that $i doesn't have to be the PID of
# the process that finished, it's just important that the loop runs
# as many times as there are PIDs.
for i in $pidlist; do
    # Wait for a single process in the pid list to finish,
    # check if it failed,
    if ! wait -n $pidlist; then
        # If it did, kill the lot. pipe kill's stderr
        # to /dev/null so it doesn't complain about
        # already-finished processes not existing
        # anymore.
        kill $pidlist 2>/dev/null

        # then exit with a non-zero status.
        exit 1
    fi
done
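
Applied to the commands from the question, the same pattern might look like the sketch below (the someorg/image* names are placeholders for your actual images, which the question elides as */*):

#!/bin/bash

# Placeholder image names -- substitute your own.
docker pull someorg/image1 & pidlist="$!"
docker pull someorg/image2 & pidlist="$pidlist $!"
docker pull someorg/image3 & pidlist="$pidlist $!"
composer install -n        & pidlist="$pidlist $!"

# Collect the subprocesses one by one; on the first failure,
# kill whatever is still running and exit non-zero.
for i in $pidlist; do
    if ! wait -n $pidlist; then
        kill $pidlist 2>/dev/null
        exit 1
    fi
done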
Wintermute
    Two notes: `wait -n` requires `bash` 4.3 or later, and you can use `jobs -p` instead of maintaining `pidlist` manually. – chepner Sep 16 '15 at 11:09
  • 1
    @chepner Ah, thanks. I'll make a note of the version requirement. I don't think `jobs -p` is a good idea, though. If a subprocess exits with an error, it potentially exits quite quickly -- possibly before `jobs -p` is run. This means that erroring processes may, depending on a race condition, not end up in the list of processes to monitor, and their exit status may not be checked. – Wintermute Sep 16 '15 at 12:06
  • 3
    Why is such a basic command so miserable at its job? I'm trying to make a simple build script, step A, B, then C -- these should run concurrently, and the whole thing should come crashing down at the first whiff of failure. Instead, now I have an npm build routine that *never* fails, and now I have to rewrite it in Gulp or something, as the above example works if I want to make an external build.sh script, but is definitely a miserable solution. I was expecting something like `wait --all` (I'd call the current behavior `wait --any`)... I can't believe the complexity here. – ChaseMoskal Mar 02 '17 at 19:08

If the return value of a command is 0, it indicates success; any other value indicates an error. So you can create a function and call it before each command. (This only works if you remove the &, i.e. run the commands sequentially.)

# Run the given command; if it fails, exit the whole
# script with a non-zero status.
valid() {
    if "$@"; then
        return
    else
        exit 1
    fi
}

valid docker pull */* 
valid docker pull */* 
valid docker pull */* 
valid docker pull */* 
valid docker pull */* 
valid docker pull */*
valid docker pull */* 
valid composer install -n 

Another alternative would be to put

set -e

at the beginning of your script.

This will cause the shell to exit immediately if a simple command exits with a nonzero exit value.
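
A minimal sketch of that behavior (note that, as the comments below point out, set -e does not catch failures of backgrounded commands until wait collects them):

#!/bin/bash
set -e

echo "this runs"
false               # non-zero exit status: the script aborts here with status 1
echo "never reached"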


Extra note:

If you have your own Docker Registry, you don't need to pull the images in parallel. Docker Registry 2.0, which works with Docker 1.7.0 and above, downloads image layers in parallel, which makes pulls much faster, so you don't have to pull all your images simultaneously.

Source

mackatozis
  • Doesn't work. `valid` is executed in a subshell, and `exit 1` only leaves that subshell. Replace the `docker pull` commands with `valid sleep 10` to see that you need a `wait` at the end, and put a `valid false` in there to see that once you have the `wait`, the `valid` doesn't help you. – Wintermute Sep 16 '15 at 09:30
  • Well, `set -e` is a thought, but unfortunately the spawning does not actually fail, so this will still run all commands and only exit when `wait` returns with an error code after all subprocesses are collected. – Wintermute Sep 16 '15 at 09:51
  • Either you'd have `|| exit 1` after the spawning of the background shell and it would never trigger, or it would be inside the background shell and only leave that background shell, so it'd have no effect on all the other background shells. I don't see how it could work. – Wintermute Sep 16 '15 at 09:56