Exit a bash script if an error occurs in it or any of the background jobs it creates

Question

Background

I'm working on a bash script to automate the process of building half a dozen projects that live in the same directory. Each project has two scripts to run in order to build it:

npm install
npm run build

The first line will fetch all of the dependencies from npm. Since this step takes the longest, and since the projects can fetch their dependencies simultaneously, I'm using a background job to fetch everything in parallel (i.e., npm install &).

The second line will use those dependencies to build the project. Since this must happen after all the Step 1s finish, I'm running the wait command in between. See code snippet below.

The Question

I would like to have my script exit as soon as an error occurs in any of the background jobs, or the npm run build step that happens afterward.

I'm using set -e, but it does not apply to background jobs, so if one project fails to install its dependencies, everything else keeps going.
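A minimal sketch of that behavior, using `false` as a stand-in for a failing `npm install` (the function name `demo_background_failure` is made up for illustration). The failing background job does not trigger `set -e`, and a bare `wait` with no arguments returns 0 once all jobs have finished, so the failure goes unnoticed:

```bash
#!/usr/bin/env bash
set -e

# Hypothetical helper: starts a failing background job and reports
# what a bare `wait` returns afterwards.
demo_background_failure() {
  false &   # stand-in for a failing `npm install &`; starting it returns 0
  wait      # no operands: waits for all background jobs, exit status is 0
  echo "bare wait returned $?"
}

demo_background_failure
```

The script runs to the end even though the background job failed, which is why the exit status of each job has to be collected individually.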

Here is a simplified example of how my script looks right now.

build.sh

set -e

DIR=$PWD

for dir in ./projects/**/
do
    echo -e "\033[4;32mInstalling $dir\033[0m"
    cd $dir
    npm install & # takes a while, so do this in parallel
    cd $DIR
done

wait # continue once the background jobs are completed

for dir in ./projects/**/
do
    cd $dir
    echo -e "\033[4;32mBuilding $dir\033[0m"
    npm run build # Some projects use other projects, so build them in series
    cd $DIR
    echo -e "\n"
done

Again, I don't want the script to continue doing anything if an error occurs at any point. This applies to both the parent and the background jobs. Is this possible?

BTW, `echo -e` is bad form: the POSIX specification for `echo` requires `echo -e` to print `-e` on its output, so you're literally writing code that will only work the way you intend on a *non*-compliant shell. Use `printf '%b\n' "string with escape sequences"` to print content in a fully portable way. — Charles Duffy, Mar 30 '16 at 18:04
See the POSIX spec for `echo` at http://pubs.opengroup.org/onlinepubs/009604599/utilities/echo.html, noting the APPLICATION USAGE section. — Charles Duffy, Mar 30 '16 at 18:04
Also, re: best practices for color, see BashFAQ #37: http://mywiki.wooledge.org/BashFAQ/037 — Charles Duffy, Mar 30 '16 at 18:05
(btw, `cd $dir` is buggy if your directory name contains whitespace or glob characters that could expand to match anything; use `cd "$dir"` for safety -- and consider scoping it inside a subshell so you don't need to `cd` back to the original directory in your parent shell; whereas subshells usually carry a performance penalty, if you're executing an external program as the only action in that subdirectory you can eliminate that penalty by using `exec` to skip the fork() process on the execution, so the fork you pay for the subshell is balanced out). — Charles Duffy, Mar 30 '16 at 18:10
(...and while I'm kibitzing -- see fourth paragraph of http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html; all-caps environment variable names are reserved for variables with meaning to the system or shell; since shell variables use the same namespace as environment variables, that convention applies there too. Consider something like `oldDir=$PWD` -- it's more descriptive and avoids colliding with the all-caps reserved namespace). — Charles Duffy, Mar 30 '16 at 18:14

Charles Duffy · Accepted Answer · 2016-03-30T19:42:25.017

11

Collect the PIDs of the background jobs, then use wait to collect the exit status of each one, exiting the first time any of those statuses is nonzero.

install_pids=( )
for dir in ./projects/**/; do
  (cd "$dir" && exec npm install) & install_pids+=( $! )
done
for pid in "${install_pids[@]}"; do
  wait "$pid" || exit
done

The above, while simple, has a caveat: If an item late in the list exits nonzero prior to items earlier in the list, this won't be observed until that point in the list is polled. To work around this caveat, you can repeatedly iterate through the entire list:

install_pids=( )
for dir in ./projects/**/; do
  (cd "$dir" && exec npm install) & install_pids+=( $! )
done
while (( ${#install_pids[@]} )); do
  for pid_idx in "${!install_pids[@]}"; do
    pid=${install_pids[$pid_idx]}
    if ! kill -0 "$pid" 2>/dev/null; then # kill -0 checks for process existence
      # we know this pid has exited; retrieve its exit status
      wait "$pid" || exit
      unset "install_pids[$pid_idx]"
    fi
  done
  sleep 1 # in bash, consider a shorter non-integer interval, e.g. 0.2
done

However, because this polls, it incurs extra overhead. This can be avoided by trapping SIGCHLD and referring to jobs -n (to get a list of jobs whose status has changed since the previous poll) when the trap is triggered.
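On bash 4.3 or newer, another way to avoid polling is `wait -n`, which blocks until any one background job finishes and returns that job's exit status. This is a different technique from the SIGCHLD trap described above, and it is not available in bash 3.2. A minimal sketch, in which the function name `run_parallel_fail_fast` and its one-command-string-per-argument interface are made up for illustration:

```bash
#!/usr/bin/env bash
# Sketch only: requires bash 4.3+ for `wait -n`.

# Hypothetical helper: runs each argument as a command string in the
# background and returns as soon as any of them exits nonzero.
run_parallel_fail_fast() {
  local pids=() cmd status
  for cmd in "$@"; do
    bash -c "$cmd" & pids+=( "$!" )
  done

  local remaining=${#pids[@]}
  while (( remaining > 0 )); do
    if wait -n; then
      # one job finished successfully; keep waiting for the rest
      remaining=$(( remaining - 1 ))
    else
      status=$?
      # stop the jobs that are still running; errors for PIDs that
      # have already exited are discarded
      kill "${pids[@]}" 2>/dev/null
      return "$status"
    fi
  done
}
```

For example, `run_parallel_fail_fast 'sleep 1; exit 3' 'sleep 10'` returns 3 after about one second instead of waiting for the second command. Note that `wait -n` reacts to any background job of the shell, so this sketch assumes no other jobs are running when the function is called.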

edited Mar 30 '16 at 19:42

answered Mar 30 '16 at 18:02

Charles Duffy


The critical detail here is that you need to wait for each process in turn. You can't just use `wait` to wait for them all. You don't get the return code if you do that. – Etan Reisner Mar 30 '16 at 18:28
This `exit`s immediately. That may or may not be what the OP wants (though I don't believe it will make a difference in this case). – Etan Reisner Mar 30 '16 at 18:28
@CharlesDuffy This still has the script waiting for the exit status of every background job. Is there a way to bail out early if the first one has a non-zero exit status? – Danny Delott Mar 30 '16 at 19:05
@DannyDelott, it actually does bail out early if one earlier in the PID list exits nonzero; the case where it doesn't bail out early is if it's one *late* in the list that exits nonzero early. That can be polled for, yes. – Charles Duffy Mar 30 '16 at 19:30
@DannyDelott, ...I've added another version that incurs polling overhead to exit early when possible. – Charles Duffy Mar 30 '16 at 19:32
Thanks for the help so far @CharlesDuffy; it still seems to continue running other background processes when one of them fails. I restricted the number of projects to just 2 and forced Project A to fail, however Project B continues to execute. When I run it with `set -x` it looks like it never evaluates the ` || exit` clause. – Danny Delott Mar 30 '16 at 20:12
I'm running GNU bash, version 3.2.57(1)-release if that helps. – Danny Delott Mar 30 '16 at 20:13
@DannyDelott, correct, it doesn't currently force the others to exit -- if you want to do that, it's the obvious change: `wait "$pid" || { kill "${install_pids[@]}"; exit 1; }`. (I took the qualifier "in the script" to mean that you only wanted to exit the script, not that you wanted to force its children to be interrupted). – Charles Duffy Mar 30 '16 at 20:15
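
Putting the change from that last comment into the first wait loop gives something like the following sketch. The function name `wait_all_or_kill` is made up for illustration; it takes the collected PIDs as arguments and uses only features available in bash 3.2:

```bash
#!/usr/bin/env bash

# Hypothetical helper: waits for each PID in turn. On the first nonzero
# exit status it kills the other jobs and returns that status.
wait_all_or_kill() {
  local pid status
  for pid in "$@"; do
    wait "$pid" || {
      status=$?
      # signal every job in the list; errors for PIDs that have
      # already exited are discarded
      kill "$@" 2>/dev/null
      return "$status"
    }
  done
}

# Usage, following the loop from the answer:
#   install_pids=( )
#   for dir in ./projects/**/; do
#     (cd "$dir" && exec npm install) & install_pids+=( "$!" )
#   done
#   wait_all_or_kill "${install_pids[@]}" || exit
```

Like the first loop in the answer, this only notices a failure when it reaches the failing PID in the list, so a job late in the list that fails early is still reported late.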

SaintHax · Answer 2 · 2016-03-30T19:50:11.357

1

Bash isn't made for parallel processing such as this. To accomplish what you want, I had to write a function library. I'd suggest seeking a language more readily suited to this if possible.

The problem with looping through the pids, such as this...

#!/bin/bash
pids=()
f() {
   sleep $1
   echo "no good"
   false
}

t() {
   sleep $1
   echo "good"
   true
}

t 3 &
pids+=( "$!" )   # append a new element; pids+=$! would concatenate onto pids[0]

f 1 &
pids+=( "$!" )

t 2 &
pids+=( "$!" )
for p in "${pids[@]}"; do
   wait "$p" || echo failed
done

The problem is that "wait" blocks on the first pid. If another pid fails before the first one finishes, the loop does not notice until it reaches that pid. The code above shows this problem on bash v4.2.46: f fails after 1 second, but "failed" is not printed until t 3 has finished, about 2 seconds later.

edited Mar 30 '16 at 19:50

answered Mar 30 '16 at 19:19

SaintHax

See the amendment to my answer -- the OP's goal *is* solvable using only shell builtins. – Charles Duffy Mar 30 '16 at 19:33
BTW, is this an answer at all? I don't see anything here that shows a proposed solution. – Charles Duffy Mar 30 '16 at 19:36
I would have added a comment to your answer, but S.O. demands a 50 rep for that? Your edit is great-- the kill -0 nested loop is genius. – SaintHax Mar 30 '16 at 19:54
