Conventional practice for signal handling/child processes

Question

EDIT: If managing child processes for a shell script is really purely a matter of "opinion"......no wonder there are so many terrible shell scripts. Thanks for continuing that.

I'm having trouble understanding how SIGTERM is conventionally handled in relation to child processes in Linux.

I am writing a command line utility in Bash.

It looks like

command1
command2
command3

Very simple, right?

However, if my program is sent a SIGTERM signal, the Bash script will end but the current child process (e.g. command2) will continue running.

But with some more code, I can write my program like this

trap 'jobs -p | xargs -r kill' TERM

command1 &
wait
command2 &
wait
command3 &
wait

That will propagate SIGTERM to the currently running child process. I haven't often seen Bash scripts written like that, but that's what it would take.

Should I:

Write my program in the second style each time I create a child process?
Or expect users to launch my program in a process group if they want to send SIGTERM?

What are the best practices/conventions for process management responsibilities with respect to SIGTERM for children?

An exit trap like this: `trap 'kill -- -$$' EXIT` will kill the child processes if the script exits due to `SIGTERM`. Including the current command, and any commands launched in the background with `cmd &`. — dan, Feb 05 '22 at 07:05
@dan that only works if the process is the group leader. `bash -ec "echo begin; bash -c \"trap 'kill -- -\\$\\$' EXIT\"; echo end"` yields an error. Regardless if that works, is that something that my script should assume responsibility for? — Paul Draper, Feb 05 '22 at 16:08
I wouldn't write anything in the second way. Why implement additional counter-intuitive logic in your script? Is it *actually needed* or are you trying to predict and accommodate a user who wants to do a kill all action? After all, a user could very well choose to just kill the script while letting the current subcommand finish without interruption, and your second solution removes this option. — Marco Bonelli, Feb 05 '22 at 20:05
`An exit trap like this: trap 'kill -- -$$' EXIT will kill the child processes` Yes, but it will not wait for the children. You get messages after your prompt is printed and terminal output is messy. `That will propagate SIGTERM to the currently running child process` Yes, but not to children of children. So `( echo stuff; command1; ) &` will not kill command1, only kill the subshell. `Should I:` That's up to _you_ and _your_ script. How do you want it to work? || I recommend https://unix.stackexchange.com/a/609300/209955 which kills everything. — KamilCuk, Feb 05 '22 at 22:54
"That's up to you and your script. How do you want it to work?" That's..........the question. What is the convention for process management? — Paul Draper, Feb 06 '22 at 20:54

Paul Draper · Answer 1 · 2022-02-06T21:25:03.293

tl;dr

The first way.

If a process starts a child process and waits for it to finish (the example), nothing special is necessary.

If a process starts a child process and may prematurely terminate it, it should start that child in a new process group and send signals to the group.

Details

Oddly for how often this applies (like, every shell script), I can't find a good answer about convention/best practice.

Some deduction:

Creating and signaling process groups is very common. In particular, interactive shells do this. So (unless it takes extra steps to prevent it) a process's children can receive SIGINT signals at any time, in very normal circumstances.

In the interest of supporting as few paradigms as possible, it seems to make sense to rely on that always.

That means the first style is okay, and the burden of process management is placed on processes that deliberately terminate their children during regular operation (which is relatively less common).

See also "Case study: timeout" below for further evidence.
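To make the deduction above concrete, here is a minimal Python sketch (not part of the original reasoning) showing that a child inherits its parent's process group by default, and only leaves it when the group is changed explicitly. The helper name `child_pgid` is made up for illustration, and `preexec_fn=os.setpgrp` is just one of several ways to request a new group.

```python
import os
import subprocess
import sys

def child_pgid(**popen_kwargs):
    """Spawn a short-lived child and return the process group ID it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpgid(0))"],
        capture_output=True, text=True, check=True, **popen_kwargs,
    )
    return int(result.stdout)

own_pgid = os.getpgid(0)

# Default: the child stays in our group, so a signal sent to the
# group (for example Ctrl+C from an interactive shell) reaches it too.
inherited = child_pgid()

# Explicit: the child calls setpgrp() before exec and leads its own group.
separate = child_pgid(preexec_fn=os.setpgrp)

print("default child shares our group:", inherited == own_pgid)
print("setpgrp child shares our group:", separate == own_pgid)
```

The first case is why the plain `command1; command2; command3` style works when the caller signals the group: every child is already a member.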

How to do it

The question was asked from the perspective of a vanilla callee program, but this answer raises a follow-up: how does one start a process in a new process group (in the non-vanilla case where the caller wishes to interrupt the process prematurely)?

This is easy in some languages and difficult in others. I've created a utility run-pgrp to assist in the latter case.

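#!/usr/bin/env python3
# Run the command in a new process group, and forward signals.
import os
import signal
import sys

pid = os.fork()
if not pid:
    # Child: become the leader of a new process group, then exec the command.
    os.setpgid(0, 0)
    try:
        os.execvp(sys.argv[1], sys.argv[1:])
    except OSError as e:
        print(f"run-pgrp: {sys.argv[1]}: {e}", file=sys.stderr)
        os._exit(127)

# Parent: set the group here too, so it exists before any signal is forwarded.
try:
    os.setpgid(pid, pid)
except OSError:
    pass  # the child has already exec'd or exited

def receive_signal(sig, frame):
    # Forward the signal to every process in the child's group.
    try:
        os.killpg(pid, sig)
    except ProcessLookupError:
        pass  # the group is already gone
signal.signal(signal.SIGINT, receive_signal)
signal.signal(signal.SIGTERM, receive_signal)

_, status = os.waitpid(pid, 0)
# status is an encoded wait status, not an exit code, so decode it first.
if os.WIFSIGNALED(status):
    sys.exit(128 + os.WTERMSIG(status))
sys.exit(os.WEXITSTATUS(status))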

The caller can use that to wrap any process that it may prematurely terminate.

Node.js example:

const childProcess = require("child_process");
(async () => {
  // Named `child` so that it does not shadow Node's global `process`.
  const child = childProcess.spawn(
    "run-pgrp",
    ["bash", "-c", "echo start; sleep 600; echo done"],
    { stdio: "inherit" }
  );
  /* leaves orphaned process
  const child = childProcess.spawn(
    "bash",
    ["-c", "echo start; sleep 600; echo done"],
    { stdio: "inherit" }
  );
  */
  await new Promise(res => setTimeout(res, /* 1s */ 1000));
  child.kill(); // sends SIGTERM by default
  if (child.exitCode == null) {
    await new Promise(res => child.on("exit", res));
  }
})();

At the end of this program, the sleep process is terminated. If the command is invoked directly, without run-pgrp, the sleep process continues to run.
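For comparison, this is roughly what the "easy" case looks like in a language that exposes process groups directly. It is a sketch under a few assumptions: it uses `sh` rather than `bash` for portability, and it creates the group with `preexec_fn=os.setpgrp`, which is only one of the options Python offers. No wrapper is involved, because the caller signals the child's group itself.

```python
import os
import signal
import subprocess
import time

# Start the shell as the leader of a new process group.
# Its process group ID is then equal to its own PID.
child = subprocess.Popen(
    ["sh", "-c", "echo start; sleep 600; echo done"],
    preexec_fn=os.setpgrp,
)
time.sleep(1)

# Signal the whole group, so the shell and its `sleep` child
# both receive SIGTERM, not just the shell.
os.killpg(child.pid, signal.SIGTERM)
child.wait()

# A negative return code -N means the child was killed by signal N.
print("shell return code:", child.returncode)
```

Sending the signal with `child.terminate()` instead would reach only the shell, which is the orphaned-`sleep` situation from the commented-out Node.js variant.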

Case study: timeout

The GNU timeout utility is a program that may terminate its child process.

Notably, it runs the child in a new process group. This supports the conclusion that potential interruptions should be preceded by creating a new process group.

Interestingly, however, timeout puts itself in that process group as well. This avoids the complexity of forwarding signals, but it causes some strange behavior: https://unix.stackexchange.com/a/57692/56781

For example, in an interactive shell, run

bash -c "echo start; timeout 600 sleep 600; echo done"

Try to interrupt this (Ctrl+C). It doesn't respond, because timeout never gets the signal!

In contrast, my run-pgrp utility keeps itself in the original process group and forwards SIGINT/SIGTERM to the child group.
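The group change that timeout makes can be observed directly. The following sketch assumes the `timeout` on the PATH is the GNU coreutils one and that it is run without `--foreground`; other implementations may behave differently. The variable names are illustrative.

```python
import os
import shutil
import subprocess
import sys

# A tiny child that reports the process group it ends up in.
report_pgid = [sys.executable, "-c", "import os; print(os.getpgid(0))"]

own_pgid = os.getpgid(0)

if shutil.which("timeout"):  # assumption: GNU coreutils timeout
    result = subprocess.run(
        ["timeout", "10"] + report_pgid,
        capture_output=True, text=True, check=True,
    )
    wrapped = int(result.stdout)
    # timeout moved itself (and so its child) out of our process group,
    # which is why a terminal Ctrl+C aimed at our group never reaches it.
    print("child under timeout shares our group:", wrapped == own_pgid)
else:
    print("timeout not found; nothing to compare")
```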
