
I have a bash script to test how a server performs under load.

num=1
if [ $# -gt 0 ]; then
    num=$1
fi
for i in {1 .. $num}; do
    (while true; do
        { time curl --silent 'http://localhost'; } 2>&1 | grep real
    done) &
done

wait

When I hit Ctrl-C, the main process exits, but the background loops keep running. How do I make them all exit? Or is there a better way of spawning a configurable number of logic loops executing in parallel?

ykaganovich

6 Answers


Here's a simpler solution -- just add the following line at the top of your script:

trap "kill 0" SIGINT

Killing 0 sends the signal to all processes in the current process group.
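
For example, here's a minimal self-contained sketch (toy sleep workers stand in for the question's curl loop); run it, hit Ctrl-C, and the script and all three workers exit together:

trap "kill 0" SIGINT

for i in 1 2 3; do
    (while true; do sleep 1; echo "worker $i"; done) &
done

wait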

Russell Davis
  • That sounds nice and clean, but I don't understand how process groups are managed. Is it guaranteed that all the background processes I'm spawning, and no other ones, are in the same process group as the script? – ykaganovich Dec 05 '11 at 19:12
  • Yes, that's the default behavior for process groups. Unless you wrote code that explicitly makes a system call to change one of the processes' groups, you'll be fine. – Russell Davis Dec 05 '11 at 23:29
  • @RussellDavis This is so clean, and works very well. I had to add traps to all the shell scripts I spawned from the master script to make this work. – nograpes Nov 04 '12 at 22:10
  • Any particular reason not to also trap SIGTERM and EXIT like this answer has? http://stackoverflow.com/a/2173421/179583 – natevw Mar 06 '13 at 18:44
  • @natevw It shouldn't hurt anything. The question specifically asked about exiting via Ctrl-C, for which SIGINT is sufficient. – Russell Davis Mar 06 '13 at 19:19
  • @natevw, trapping `SIGTERM` will [segfault Bash 4.3](https://bugs.launchpad.net/ubuntu/+source/bash/+bug/1337827), which allows trap recursion. The obvious workaround is to de-register traps in the handler :) I posted [an example](http://stackoverflow.com/a/28333938/804678) in an answer to a similar question. – skozin Feb 05 '15 at 00:23
  • Beware, `kill 0` will sometimes kill "parent" processes (in the case that they executed the script without creating a process group). – Ivan Kozik Nov 30 '17 at 10:06
  • I'm curious how you found this out. I can't find it in the man page for kill. – jmrah Dec 07 '17 at 19:29
  • @jmrah, it's in man kill(2) (the manual for the system call): "If pid equals 0, then sig is sent to every process in the process group of the calling process." – Michał Góral Mar 31 '23 at 10:02

One way to kill subshells, but not self:

kill $(jobs -p)
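
For example, combined with a trap, this gives a variant of the accepted answer that only signals the script's own background jobs, leaving anything outside its job table alone (a sketch):

trap 'kill $(jobs -p)' SIGINT
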
Craig McQueen

Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far, as they kill all child processes.

If you just want to make sure one specific child process (and its own children) is tidied up, then a better solution is to kill by process group (PGID) using the sub-process's PID, like so:

set -m                      # enable job control, so the child gets its own process group
./some_child_script.sh &    # the child you want to be able to tidy up later
some_pid=$!

kill -- -${some_pid}        # negative PID: signal the entire process group

Firstly, the set -m command enables job control (if it isn't enabled already). This is important: otherwise all commands, sub-shells etc. are assigned to the same process group as your parent script (unlike when you run the commands manually in a terminal), and kill will just give a "no such process" error. It needs to be called before you run the background command you wish to manage as a group (or just call it at script start if you have several).

Secondly, note that the argument to kill is negative; this indicates that you want to kill an entire process group. By default the process group ID is the same as the PID of the first command in the group, so we can get it by simply adding a minus sign in front of the PID we fetched with $!. If you need to get the process group ID in a more complex case, you will need to use ps -o pgid= -p ${some_pid}, then add the minus sign to that (see the sketch below).

Lastly, note the use of the explicit end-of-options marker --. This is important, as otherwise the process group argument would be treated as an option (a signal specification), and kill would complain that it doesn't have enough arguments. You only need this if the process group argument is the first one you wish to terminate.
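
For the more complex case, here is a sketch of the explicit lookup (ps -o pgid= is POSIX; the tr strips the whitespace that some ps implementations pad the column with):

pgid=$(ps -o pgid= -p "${some_pid}" | tr -d ' ')
kill -- "-${pgid}"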

Here is a simplified example of a background timeout process, and how to cleanup as much as possible:

#!/bin/bash
# Use the overkill method in case we're terminated ourselves
trap 'kill $(jobs -p | xargs)' SIGINT SIGHUP SIGTERM EXIT

# Setup a simple timeout command (an echo)
set -m
{ sleep 3600; echo "Operation took longer than an hour"; } &
timeout_pid=$!

# Run our actual operation here
do_something

# Cancel our timeout
kill -- -${timeout_pid} >/dev/null 2>&1
wait -- ${timeout_pid} >/dev/null 2>&1    # reap the job (wait takes a PID, not a group)
printf '' 2>&1

This should cleanly handle cancelling this simplistic timeout in all reasonable cases; the only case that can't be handled is the script being terminated immediately (kill -9), as it won't get a chance to clean up.

I've also added a wait followed by a no-op (printf '') to suppress the "Terminated" messages that the kill command can cause. It's a bit of a hack, but it has been reliable enough in my experience.

Haravikk
  • The wait + printf method didn't work for me (using the Bourne shell), but adding `set +m` just after `kill` suppressed the "terminated" message. – Watcom Oct 04 '21 at 19:25

You need to use job control, which, unfortunately, is a bit complicated. If these are the only background jobs that you expect will be running, you can run a command like this one:

jobs \
  | perl -ne 'print "$1\n" if m/^\[(\d+)\][+-]? +Running/;' \
  | while read -r ; do kill %"$REPLY" ; done

jobs prints a list of all active jobs (running jobs, plus recently finished or terminated jobs), in a format like this:

[1]   Running                 sleep 10 &
[2]   Running                 sleep 10 &
[3]   Running                 sleep 10 &
[4]   Running                 sleep 10 &
[5]   Running                 sleep 10 &
[6]   Running                 sleep 10 &
[7]   Running                 sleep 10 &
[8]   Running                 sleep 10 &
[9]-  Running                 sleep 10 &
[10]+  Running                 sleep 10 &

(Those are jobs that I launched by running for i in {1..10} ; do sleep 10 & done.)

perl -ne ... is me using Perl to extract the job numbers of the running jobs; you can obviously use a different tool if you prefer. You may need to modify this script if your jobs has a different output format; but the above output is from Bash on Cygwin, so it's very likely identical to yours.

read -r reads a "raw" line from standard input, and saves it into the variable $REPLY. kill %"$REPLY" will be something like kill %1, which "kills" (sends a termination signal, SIGTERM by default, to) job number 1. (Not to be confused with kill 1, which would kill process number 1.) Together, while read -r ; do kill %"$REPLY" ; done goes through each job number printed by the Perl script, and kills it.

By the way, your for i in {1 .. $num} won't do what you expect, since brace expansion is handled before parameter expansion, so what you have is equivalent to for i in "{1" .. "$num}". (And you can't have white-space inside the brace expansion, anyway.) Unfortunately, I don't know of a clean alternative; I think you have to do something like for i in $(bash -c "echo {1..$num}"), or else switch to an arithmetic for-loop or whatnot.

Also by the way, you don't need to wrap your while-loop in parentheses; & already causes the job to be run in a subshell.
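
Putting both of those tips together, a corrected version of the question's script might look like this (a sketch; the curl command is copied from the question unchanged, using an arithmetic for-loop in place of the broken brace expansion):

num=1
if [ $# -gt 0 ]; then
    num=$1
fi
for ((i = 1; i <= num; i++)); do
    while true; do
        { time curl --silent 'http://localhost'; } 2>&1 | grep real
    done &
done

wait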

ruakh
  • Thanks for the tips, and especially thanks for the btw tips. I am not a bash expert, so I write it by googling. – ykaganovich Dec 02 '11 at 23:54
  • You're welcome! I know exactly what you mean. I'm not a Bash expert, either, and I was in the same boat as you until about a year or so ago, when I found the Bash reference manual (linked to in my answer). It's totally changed my life, or at least the Bash part of it. :-P – ruakh Dec 03 '11 at 00:09

While this is not an answer, I would just like to point out something which invalidates the selected one: using jobs or kill 0 might have unexpected results; in my case it killed unintended processes, which is not an option for me.

It has been highlighted in some of the answers, but I am afraid not with enough stress, or it has not been taken into account:

"Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes)."

"If these are the only background jobs that you expect will be running, you can run a command like this one:"


Here's my eventual solution. I'm keeping track of the subshell process IDs using an array variable, and trapping the Ctrl-C signal to kill them.

declare -a subs #array of subshell pids

function kill_subs() {
    for pid in ${subs[@]}; do
        kill $pid
    done
    exit 0 
}

num=1
if [ $# -gt 0 ]; then
    num=$1
fi

for ((i=0;i < $num; i++)); do
    while true; do
       { time curl --silent 'http://localhost'; } 2>&1 | grep real
    done &

    subs[$i]=$! #grab the pid of the subshell 
done

trap kill_subs 1 2 15

wait
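
Usage, assuming the script is saved as loadtest.sh (a hypothetical name):

./loadtest.sh 4    # spawn 4 parallel curl loops; Ctrl-C (or SIGHUP/SIGTERM) kills them all
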
ykaganovich