11

I have a server process (launched from systemd) which can launch an update process. The update process daemonizes itself and then (in theory) kills the server with SIGTERM. My problem is that the SIGTERM propagates to the update process and its children.

For debugging purposes, the update process just sleeps, and I send the kill by hand.

Sample PS output before the kill:

    1  1869  1869  1869 ?           -1 Ss       0   0:00 /usr/local/bin/state_controller --start
 1869  1873  1869  1869 ?           -1 Sl       0   0:00  \_ ProcessWebController --start
 1869  1886  1869  1869 ?           -1 Z        0   0:00  \_ [UpdateSystem] <defunct>
    1  1900  1900  1900 ?           -1 Ss       0   0:00 /bin/bash /usr/local/bin/UpdateSystem refork /var/ttm/update.bin
 1900  1905  1900  1900 ?           -1 S        0   0:00  \_ sleep 10000

Note that UpdateSystem is in a separate PGID and SID (the TPGID is -1 for every process shown). (The <defunct> process is a result of the daemonization, and is not (I think) a problem.)

UpdateSystem is a bash script (although I can easily make it a C program if that will help). After the daemonization code taken from https://stackoverflow.com/a/29107686/771073, the interesting bit is:

#############################################
trap "echo Ignoring SIGTERM" SIGTERM
sleep 10000
echo Awoken from sleep - presumably by the SIGTERM
exit 0

When I kill 1869 (which sends SIGTERM to the state_controller server process), my logfile contains:

Terminating
Ignoring SIGTERM
Awoken from sleep - presumably by the SIGTERM

I really want to prevent SIGTERM being sent to the sleep process.
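
One way to confirm that the sleep itself was signalled (and not only the script) is to look at its exit status. The following is a minimal sketch, assuming bash; `describe_status` is a hypothetical helper, and a throwaway background sleep stands in for the real updater.

```shell
# Hypothetical helper: translate an exit status into a readable cause.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # By shell convention, a status above 128 means "killed by signal (status - 128)".
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
}

# Demonstration: start a sleep, send it SIGTERM, and collect its status.
sleep 10000 &
child=$!
kill -TERM "$child"
demo_status=0
wait "$child" || demo_status=$?
describe_status "$demo_status"   # prints: killed by SIGTERM
```

In the script above, recording `$?` right after `sleep 10000` and passing it to the helper would show whether the sleep ended because of a signal.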


(Actually, what I really want is to stop it being sent to apt-get upgrade, which stops the system via the moral equivalent of systemctl stop ttm.service. The ExecStop is specified as /bin/kill $MAINPID, just in case that changes anyone's answer.)

This question is similar: "Can't detach child process when main process is started from systemd". However, the accepted answer there (use KillMode=process) doesn't work well for me, because I want to kill some of the child processes, just not the update process.

Community

5 Answers

7

We were having exactly the same problem. What we ended up doing is launching the update process in a transient scope unit, with its own cgroup, using systemd-run:

systemd-run --unit=my_system_upgrade --scope --slice=my_system_upgrade_slice setsid nohup start-the-upgrade &> /tmp/some-logs.log &

That way, the update process will run in a different cgroup and will not be terminated. Additionally, we use setsid + nohup to make sure the process has its own group and session and that the parent process is the init process.
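
To check that the upgrade really ended up outside the service's control group, one can compare the cgroup membership of the two PIDs. This is only a sketch: `same_cgroup` and the `PROC_ROOT` override are hypothetical names introduced here, not part of systemd.

```shell
# PROC_ROOT is a hypothetical override so the helper can be tried on a fake tree.
PROC_ROOT=${PROC_ROOT:-/proc}

# Succeeds (status 0) when both PIDs list identical cgroup membership.
same_cgroup() {
    [ "$(cat "$PROC_ROOT/$1/cgroup")" = "$(cat "$PROC_ROOT/$2/cgroup")" ]
}
```

With the PIDs from the question's ps output, `same_cgroup 1869 1900 && echo "updater is still in the service cgroup"` would flag an updater that systemd will still kill along with the service.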

tsauerwein
5

A completely different approach is for the upgrade process to remove itself from the service group by updating the /sys/fs/cgroup/systemd filesystem. Specifically in bash:

echo $$ > /sys/fs/cgroup/systemd/tasks

A process belongs to exactly one control group per hierarchy. Writing its PID to the root tasks file moves it into the root control group, which removes it from the service's control group.

  • Some distros have switched to _cgroupsv2_ only. These lack the `tasks` file, and `cgroup.procs` should be used instead. See https://man.archlinux.org/man/cgroups.7#Creating_cgroups_and_moving_processes – Dawer Feb 04 '22 at 20:24
  • Some distros use a 'unified' cgroup hierarchy. systemd on Ubuntu 18.04 kills all child processes listed in `/sys/fs/cgroup/unified/system.slice/myservice.service/cgroup.procs`. This in bash works: `echo $$ > /sys/fs/cgroup/{systemd,unified}/cgroup.procs` – seamaner Sep 16 '22 at 10:42
  • @seamaner: bash: /sys/fs/cgroup/{systemd,unified}/cgroup.procs: ambiguous redirect – gentooise Nov 22 '22 at 15:17
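
Putting the answer and the comments together, a loop avoids the ambiguous redirect and copes with both the `tasks` and `cgroup.procs` layouts. This is a sketch under assumptions: `move_to_root_cgroup` and the `CGROUP_ROOT` override are hypothetical names, the candidate directories are the ones mentioned in this thread, and writing to the real files requires root.

```shell
# CGROUP_ROOT is a hypothetical override; the default is the usual mount point.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup}

# Move PID $1 into the root control group of every hierarchy that is present.
# Returns 0 if at least one write succeeded, 1 otherwise.
move_to_root_cgroup() {
    pid=$1
    moved=1
    for dir in "$CGROUP_ROOT/systemd" "$CGROUP_ROOT/unified" "$CGROUP_ROOT"; do
        # Prefer cgroup.procs; fall back to tasks where only that file exists.
        for file in cgroup.procs tasks; do
            if [ -w "$dir/$file" ]; then
                echo "$pid" > "$dir/$file" && moved=0
                break
            fi
        done
    done
    return "$moved"
}
```

In the update script this would be called as `move_to_root_cgroup $$` before starting the long-running work.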
3

The approach we have decided to take is to launch the update process in a separate (single-shot) service. As such, it automatically belongs to a separate control group, so killing the main service doesn't kill it.

There is a wrinkle to this though. The package installs ttm.service and ttm.template.update.service. To run the updater, we copy ttm.template.update.service to ttm.update.service, run systemctl daemon-reload, and then run systemctl start ttm.update.service. Why the copy? Because when the updater installs a new version of ttm.template.update.service, it will forcibly terminate any processes running as that service. KillMode=none appears to offer a way round that, but although it appears to work, a subsequent call to apt-get yields a nasty error about dpkg having been interrupted.
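
The copy, reload and start steps could be scripted roughly as follows. This is a sketch, not the exact script we run: `start_updater`, `UNIT_DIR` and `SYSTEMCTL` are hypothetical names, and the default unit directory is an assumption, since it depends on where the package installs its unit files.

```shell
# UNIT_DIR and SYSTEMCTL are hypothetical knobs; the unit directory is an assumption.
UNIT_DIR=${UNIT_DIR:-/lib/systemd/system}
SYSTEMCTL=${SYSTEMCTL:-systemctl}

start_updater() {
    # Copy the template so that replacing it during the upgrade
    # does not affect the unit that is actually running.
    cp "$UNIT_DIR/ttm.template.update.service" "$UNIT_DIR/ttm.update.service" || return 1
    # Make systemd pick up the newly created unit file, then start it.
    "$SYSTEMCTL" daemon-reload || return 1
    "$SYSTEMCTL" start ttm.update.service
}
```

Setting `SYSTEMCTL=echo` gives a dry run that only prints the systemctl arguments.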

1

Are you sure it is not systemd sending the TERM signal to the child process?

Depending on the service type, if your main process dies, systemd will do a cleanup and terminate all the child processes under the same cgroup.

This is defined by the KillMode= property, which defaults to control-group. You could set it to "none" or "process". https://www.freedesktop.org/software/systemd/man/systemd.kill.html

Umut
0

I have the same situation as you.

The upgrade process is a child of the parent process, and the parent process is started by a service.

The main point is not the cgroup but MAINPID.

If you use PIDFile= to specify the MAINPID and the service has Type=forking, the problem is solved.

[Service]
Type=forking
PIDFile=/run/test.pid
Sam Young