Bash does not exit automatically when a background process exits with non-zero status, even if set -e (set -o errexit) is active. If you want your program to exit when a background process fails then you'll need to explicitly detect the failure.
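The behaviour described above can be seen with a small experiment. This is only a sketch: the subshell that exits with status 3 and the names bg_pid and bg_status are made up for illustration.

```bash
#!/bin/bash
set -o errexit

( exit 3 ) &        # background job that fails almost immediately
bg_pid=$!

sleep 1             # the job has failed by now, yet errexit has not fired
echo 'still running after the background failure'

# The failure only becomes visible when the job is waited for. Testing
# 'wait' inside an 'if' keeps errexit from firing on its non-zero status.
if wait "$bg_pid"; then
    bg_status=0
else
    bg_status=$?
fi
echo "background job exit status: $bg_status"
```

The script reaches both echo commands, and the second one reports status 3.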
If you've got Bash 4.3 (released in 2014) or later then you can do it in the code in the question by replacing
wait
with
while wait -n; do
    :
done
- The -n option to wait was introduced in Bash 4.3. It causes wait to wait for the next background process to exit, and returns its status.
- The loop runs until wait -n returns non-zero status, either because a background process exited with non-zero status or because all background processes have already exited with zero status (in which case wait -n has nothing left to wait for and returns 127).
- See ProcessManagement - Greg's Wiki for more information about wait -n, and general information about handling background processes in Bash. That page says to run set -m in programs that use wait -n. I haven't found that to be necessary, but that may be because I am using a much later version of Bash. YMMV.
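Because the status of a while loop is not the status of the wait -n that ended it, a program that needs to know which of the two cases occurred has to capture the status inside the loop. The following is a minimal sketch of one way to do that, assuming Bash 4.3 or later. The function name wait_for_first_failure and the two sample background jobs are hypothetical.

```bash
#!/bin/bash
# Requires Bash 4.3 or later for 'wait -n'.

# Hypothetical helper: wait for background jobs one at a time and return
# early with the exit status of the first one that fails.
function wait_for_first_failure
{
    local status

    while true; do
        wait -n
        status=$?
        case $status in
            0)   ;;                     # a job succeeded; keep waiting
            127) return 0 ;;            # no unwaited-for jobs remain
            *)   return "$status" ;;    # a job failed
        esac
    done
}

# Sample jobs, made up for illustration
( sleep 1; exit 0 ) &
( sleep 2; exit 5 ) &

if wait_for_first_failure; then
    result=0
else
    result=$?
fi
echo "first failure status: $result"
```

One limitation of this sketch: a background job that itself exits with status 127 (for example, because a command was not found) is indistinguishable from the "no jobs left" case.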
Although the while wait ... loop allows the program to continue as soon as a background process fails (e.g. so it can call exit), it may leave other background processes still running, even after the main program exits. That could lead to unwanted processing and/or unexpected output to the terminal. You might want to kill any remaining background processes after the while wait ... loop terminates. One way to do that is:
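jobs_output=$(jobs)
while IFS= read -r line; do
    # Skip lines without a job number (e.g. the single empty line read
    # when there are no jobs)
    [[ $line == *\[*\]* ]] || continue
    jobnum=${line#*\[}
    jobnum=${jobnum%%\]*}
    kill "%$jobnum"
done <<<"$jobs_output"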
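To see what the two parameter expansions in that loop do, here they are applied to a single line. The line is a made-up sample of the kind of output that jobs produces, not captured output.

```bash
# A made-up sample of one line of 'jobs' output
line='[2]+  Running                 sleep 100 &'

jobnum=${line#*\[}      # remove everything up to and including the first '['
jobnum=${jobnum%%\]*}   # remove everything from the first ']' onward

echo "job number: $jobnum"    # prints "job number: 2"
```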
With versions of Bash older than 4.3 my preferred option for managing background processes is to use the jobs command in a polling loop.
This is a Shellcheck-clean, modified version of your program that demonstrates the technique:
#! /bin/bash -p

function status
{
    local -r exit_st=$1
    local -r error_1=$2

    if (( exit_st == 0 )); then
        printf '[INFO] - %s\n' "$error_1" >&2
    else
        printf '[ERROR] - %s\n' "$error_1" >&2
        exit 1
    fi
}

function abc
{
    local -r val1=$1
    local -r val2=$2

    run_sql_command "$val1" "$val2"
    status "$?" 'SQL command step'
}
# Wait for background processes (specified by PIDs given as function
# arguments) to complete.
# If any background process completes with non-zero exit status, return
# immediately (without waiting for any other background processes) using the
# failed process's exit status as the return status.
function wait_for_pids
{
    local -r bgpids=( "$@" )

    # Use a sparse array ('is_active_pid') indexed by PID values to maintain
    # a set of background processes that are still active
    local pid is_active_pid=()
    for pid in "${bgpids[@]}"; do
        is_active_pid[pid]=1
    done

    local jobs_output old_active_pids=()
    while (( ${#is_active_pid[*]} > 0 )); do
        # Get a list of PIDs of background processes that are still active
        jobs_output=$(jobs -pr)
        IFS=$'\n' read -r -d '' -a active_pids <<<"$jobs_output"

        old_active_pids=( "${!is_active_pid[@]}" )

        # Update the set of still active background PIDs
        is_active_pid=()
        for pid in ${active_pids[@]+"${active_pids[@]}"}; do
            is_active_pid[pid]=1
        done

        # Find processes that are no longer active (i.e. they have exited)
        # and check their exit statuses
        for pid in "${old_active_pids[@]}"; do
            if (( ! ${is_active_pid[pid]-0} )); then
                wait "$pid" || return "$?"
            fi
        done

        sleep 1
    done
}
# Kill all background processes that are running, and exit the program
# with the exit status provided as an argument
function kill_running_jobs_and_exit
{
    local -r exit_status=$1

    local jobs_output line jobnum
    jobs_output=$(jobs -r)
    while IFS= read -r line; do
        [[ $line == *\[*\]* ]] || continue
        jobnum=${line#*\[}
        jobnum=${jobnum%%\]*}
        # Kill by job number instead of PID because killing by PID is
        # subject to race conditions that may cause the wrong process to be
        # killed
        kill "%$jobnum"
        printf '[INFO] - Killed: %s\n' "$line" >&2
    done <<<"$jobs_output"

    exit "$exit_status"
}

bgpids=()
abc cmd1 cmd2 &
bgpids+=( "$!" )
abc cmd3 cmd4 &
bgpids+=( "$!" )

wait_for_pids "${bgpids[@]}" || kill_running_jobs_and_exit "$?"

echo 'hi'
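The script calls run_sql_command, which it does not define. To try the script without a database, a stand-in along the following lines could be defined near the top. This is only a sketch for testing: the one-second sleep and the rule that an argument of 'fail' makes the command fail are assumptions made up for illustration, and it is not necessarily the function that was used for the original tests.

```bash
#!/bin/bash
# Hypothetical stand-in for run_sql_command, for testing only.
function run_sql_command
{
    local -r arg1=$1
    local -r arg2=$2

    sleep 1     # pretend to do some work

    # Made-up rule: fail when either argument is the word 'fail'
    if [[ $arg1 == fail || $arg2 == fail ]]; then
        return 1
    fi
    return 0
}
```

With this in place, changing abc cmd3 cmd4 & to abc cmd3 fail & exercises the failure path.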
- Several of the changes are minor ones to fix Shellcheck warnings or to follow standard or best practices (e.g. sending diagnostic output to standard error and using printf instead of echo).
- One significant change is that an array, bgpids, is used to keep a list of PIDs of background processes.
- Another significant change is the addition of two new functions: wait_for_pids and kill_running_jobs_and_exit.
- The final significant change is replacing wait with wait_for_pids "${bgpids[@]}" || kill_running_jobs_and_exit "$?".
- Replace run_sql_command "$val1" "$val2" with whatever is appropriate for you. I wrote and used a function called run_sql_command for testing.
- I've used a polling interval of one second (sleep 1). Something different might be better for you (e.g. sleep 10 or, if your sleep supports floating point arguments, sleep 0.1).
- See the Sparse Arrays section of BashGuide/Arrays - Greg's Wiki for information about how the is_active_pid array is used.
- ${active_pids[@]+"${active_pids[@]}"} is used instead of "${active_pids[@]}" to work around a bug in older versions of Bash that caused it to mishandle empty arrays when set -o nounset (set -u) is in effect. See bash empty array expansion with 'set -u'.
- I tested the code with Bash version 3.2. It should work with all later versions of Bash too.
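The two array idioms mentioned in the notes above can be tried in isolation. The following is a small sketch, not part of the program: the PID values are invented, and the variable names members, count, has_4242, has_17, empty and n exist only for this illustration.

```bash
#!/bin/bash
set -o nounset

# Sparse array used as a set: the index is a (made-up) PID and the value
# is just a flag
is_active_pid=()
for pid in 4242 17 90001; do
    is_active_pid[pid]=1
done

unset 'is_active_pid[17]'           # remove one member from the set

members="${!is_active_pid[*]}"      # remaining indices, in ascending order
count=${#is_active_pid[@]}          # number of remaining members

# Membership tests; the '-0' default avoids a nounset error for absent PIDs
if (( ${is_active_pid[4242]-0} )); then has_4242=yes; else has_4242=no; fi
if (( ${is_active_pid[17]-0} )); then has_17=yes; else has_17=no; fi

# Expansion of a possibly empty array that is safe under nounset, even in
# old versions of Bash
empty=()
n=0
for item in ${empty[@]+"${empty[@]}"}; do
    n=$(( n + 1 ))
done

echo "members: $members, count: $count, 4242: $has_4242, 17: $has_17, n: $n"
```

After removing PID 17 the set holds 4242 and 90001, and the loop over the empty array runs zero times.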