Docker - init, zombies - why does it matter?

Question

I read this article: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

To set some context: the article is about the problem of zombie processes in containers, and it tries to convince us that this is a real problem.

Generally, I have mixed feelings. Why does it matter? After all, even if there are zombies in a container, the host OS is able to release/kill them. We know that, from the host OS's point of view, a process in a container is a normal process (in general, a process in a container is a normal process with some namespaces and cgroups).

Moreover, we can also find advice that, in order to avoid the zombie problem, we should use bash -c .... Why? Maybe a better option is to use --init?

Can someone explain these things, please?

Matt · Accepted Answer · 2020-02-26T20:41:20.100

For a brief but useful explanation of what an init process gives you, look at tini which is what Docker uses when you specify --init

Using Tini has several benefits:

It protects you from software that accidentally creates zombie processes, which can (over time!) starve your entire system for PIDs (and make it unusable).

It ensures that the default signal handlers work for the software you run in your Docker image. For example, with Tini, SIGTERM properly terminates your process even if you didn't explicitly install a signal handler for it.

Both of these issues affect containers. A process in a container is still a process on the host, so it takes up a PID on the host. Whatever you run in a container is PID 1, which means it has to install a handler for a signal in order to receive that signal.

Bash happens to have a process reaper included, so running a command under bash -c can protect against zombies. Bash won't handle signals by default as PID 1 unless you trap them.

Zombies

The first thing to understand is that an init process doesn't magically remove zombies. A (normal) init is designed to reap zombies once the parent process that failed to wait on them has exited and the zombies are left hanging around. The init process then becomes the zombies' parent and they can be cleaned up.

Next, a container is a cgroup of processes running in their own PID namespace. This cgroup is cleaned up when the container is stopped. Any zombies that are in a container are removed on stop. They don't reach the host's init.

Third is the different ways containers are used. Most run one main process and nothing else. If there is another process spawned it is usually a child of that main process. So until the parent exits, the zombie will exist. Then see point 2 (the zombies will be cleared on container exit).

Running a Node.js, Go or Java app server in a container tends not to rely heavily on forking or spawning of processes.

Running something like a Jenkins worker that spawns large numbers of ad hoc jobs involving shells can leave a lot more zombies behind, but the worker is ephemeral, so it exits regularly and cleans up.

Running a Jenkins master that also spawns jobs is different: the container may hang around for a long time and accumulate zombie processes. This is the type of workload that could present a problem without a zombie reaper.
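To make the zombie state described at the start of this section concrete, here is a minimal Python sketch (Linux only, since it reads /proc). It is only an illustration, not something tini or Docker runs: the helper names `process_state` and `zombie_demo` are made up for this example. A child exits, the parent delays calling wait, and the child shows up with state `Z` until it is reaped.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # The state field comes right after the parenthesised command name.
        return f.read().rsplit(")", 1)[1].split()[0]


def zombie_demo():
    """Create a zombie, observe it, then reap it.

    Returns (state_before_reaping, entry_gone_after_reaping).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: we have not waited on the child yet, so once it has exited
    # it stays in the process table as a zombie (state "Z").
    deadline = time.monotonic() + 5
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)

    # Reaping: waiting on the child lets the kernel drop its entry.
    os.waitpid(pid, 0)
    return state, not os.path.exists(f"/proc/{pid}")
```

If the parent never called `os.waitpid` and kept running, the entry would stay until the parent exited and the zombie was re-parented to something that does reap, which is the job an init process takes on.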

Signals

The other role an init process can provide is to install signal handlers so that signals sent from the host can be passed on to the container process. PID 1 is a bit special: the process has to install a handler for a signal in order to receive it.
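As a rough sketch of what "install handlers and pass signals on" means, the following Python function behaves like a very small init: it starts one child, forwards SIGINT and SIGTERM to it, and reaps whatever exits. This is not how tini is implemented (tini is written in C and handles more cases); `run_as_init` is a hypothetical name used only for illustration, and it needs Python 3.9+ for `os.waitstatus_to_exitcode`.

```python
import os
import signal
import subprocess
import sys


def run_as_init(argv):
    """Run argv as a child, forward signals to it, and reap exited children.

    Returns the main child's exit code (negative signal number if it was
    killed by a signal).
    """
    child = subprocess.Popen(argv)

    def forward(signum, frame):
        # Pass the signal straight on; the child may already be gone.
        try:
            os.kill(child.pid, signum)
        except ProcessLookupError:
            pass

    # Installing handlers is what lets a PID 1 process receive these
    # signals at all.
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, forward)

    # Reap every child that exits. When this runs as PID 1, that includes
    # orphans that were re-parented to us.
    while True:
        try:
            pid, status = os.wait()
        except ChildProcessError:
            return 1  # no children left and the main child was never seen
        if pid == child.pid:
            return os.waitstatus_to_exitcode(status)
```

Because the child is no longer PID 1, the forwarded SIGTERM gets its default action even though the child installed no handler, which is why a plain `sleep` stops promptly under an init.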

If you can install a SIGINT and SIGTERM signal handler in your PID 1 process then an init process doesn't add much here.
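For an application that is itself PID 1, installing those handlers can be as small as the sketch below. It uses only Python's standard `signal` module; the function names and the `state` dictionary are assumptions made for this example, and the sleep is a stand-in for real work.

```python
import os
import signal
import time


def install_handlers():
    """Install SIGINT/SIGTERM handlers that request a clean shutdown."""
    state = {"stop": False}

    def request_stop(signum, frame):
        # Only record the request; the main loop does the actual cleanup.
        state["stop"] = True

    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, request_stop)
    return state


def serve(state):
    """Stand-in main loop: work until a shutdown has been requested."""
    while not state["stop"]:
        time.sleep(0.1)  # placeholder for real work
    return 0  # exit code after a clean shutdown
```

With handlers like these in place, `docker stop` and ctrl-c reach the process directly, so the signal-forwarding half of an init is not needed.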

When to use an init

When you want to run more than 1 service in a container

Multiple processes should be run under an init process. When the container starts, the init manages how they should be launched and what is required for the container to actually be "running" for the service it represents. When the container stops, it manages how that is passed on to each process. You may want a more traditional init system though; s6 via s6-overlay provides a number of useful container features for multi-process management.

When you run a single process that spawns a lot of child processes

Especially when processes are children of children or beyond. The CI worker (like Jenkins) example is the first that comes to mind, where Java spawns commands or shells that spawn other commands.

When you can't add signal handlers to the process running as PID 1.

sleep is a simple example of this. A docker run busybox sleep 60 can't be interrupted with ctrl-c or stopped; it will be killed after the default 10 second docker stop timeout. docker run --init busybox sleep 60 works as expected.
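The stop behaviour described here (SIGTERM, wait for the timeout, then SIGKILL) can be sketched in Python as follows. This is not Docker's code, only the same sequence applied to an ordinary child process; `stop_process` is a hypothetical helper name, and the 10 second default simply mirrors the docker stop timeout mentioned above.

```python
import signal
import subprocess
import sys


def stop_process(proc, timeout=10.0):
    """Ask proc to exit with SIGTERM, then force it with SIGKILL.

    Returns the exit code; a negative value is the number of the signal
    that ended the process.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        # Give the process a grace period to exit on its own.
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Still running: SIGKILL cannot be caught or ignored.
        proc.kill()
        return proc.wait()
```

A process that does not react to SIGTERM (as `sleep` running as PID 1 does not) ends with the SIGKILL result after the full timeout, while one that does react returns almost immediately.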

Whenever

tini is pretty minimal overhead and widely used, so why not use --init most of the time?

For more details see this GitHub comment, in which the creator of tini answers the "why?" question.

Could you explain the cause of the `10 s.` timeout when sending the stop signal to `sleep 60` without `--init`? After all, `sleep` should be `PID=1`, so it receives this signal (so there is no need for `init` to forward it to children) — Spring fancy, Mar 10 '18 at 15:32
PID 1 is treated as a special process by the Linux kernel and will only receive a signal if a signal handler for it is added. Docker sends a `SIGTERM` on `docker stop`, then waits for a timeout period for the process to exit, which defaults to 10 seconds. Then a `SIGKILL` is sent if the process still exists. `SIGKILL` is not something a process listens for; that's the kernel removing a process. `sleep` doesn't install any signal handlers so after the timeout, it's killed. — Matt, Mar 10 '18 at 20:49
So, how does it work? `sleep` is not able to receive signals at all, however `sleep` in combination with `--init` is able to receive signals. How? — Spring fancy, Mar 10 '18 at 23:39
The init process becomes PID 1 and adds its own signal handlers. The `init` process chooses what to do with those signals, normally that is to forward them straight on. Once `sleep` is not PID 1 it will receive _all_ signals sent to it, no matter what signal handlers `sleep` has implemented in code. — Matt, Mar 10 '18 at 23:51
Hmm, thanks to this, `init` in Linux can't be killed using signals. So `SIGTERM` is passed: `Host -> PID 1 (init) -> PID <> 1 (sleep)`. As I understand it, the default handler of `SIGTERM` installed by Linux in `sleep` exits this process? What happens when some process installs a handler which ignores `SIGTERM`? For example, a handler with an empty body? — Spring fancy, Mar 11 '18 at 09:11
[here's an example](https://github.com/deployable/node-docker-demo-app/blob/90b4cb689eed32bb42e8229d65df0eeac7f9bd28/index.js) of signal handlers in a nodejs app. If the signal handling code doesn't do anything, then nothing happens. — Matt, Mar 11 '18 at 10:38
A docker init process would differ in that it would try to send the signal to its children before exiting itself. A normal init process wouldn't ever try to exit unless the system was shutting down. — Matt, Mar 11 '18 at 10:39
Is there not some logic which: 1. Sends SIGTERM. 2. Waits 10s. 3. Checks if the process exited. 4. If not, sends SIGKILL (= don't ask, simply kill it). What do you think? In other words, I don't understand where the 10s timeout fits in. — Spring fancy, Mar 11 '18 at 11:55
Yes, that's what docker does. That's what I was trying to describe in my second comment about `sleep`. `docker stop -t N` adjusts the number of seconds docker waits. — Matt, Mar 11 '18 at 20:18

score 3 · Answer 2 · answered Mar 07 '18 at 23:18

I referenced that article in "Use of Supervisor in docker"

Since Sept. 2016 and docker 1.12, docker run --init helps fight zombie processes by adding an init process.

That typically solves the following issue:

We can't use docker start as we need to pass things like port mappings, and env vars. So we use docker run.
But when upstart sends SIGINT to the docker run client process, the container doesn't die, just the client does. Then when upstart goes to start it back up, it's already running, and the port mapping fails.

Or this issue:

Docker seems to hang when spawning child processes inside executed scripts.

Basically, you want a docker container to kill all sub-processes, in order to clean up the resources (ports, file handles, ...) used by said sub-processes.

VonC

That first issue is more a problem with using `docker run` under upstart (or any service manager). The lack of adding a signal handler to the PID 1 process just highlights why running a container like that is a bad idea! – Matt Mar 08 '18 at 04:22
Does it mean that we should always add the `--init` flag to `docker run`? – Spring fancy Mar 08 '18 at 19:53
If you have more than one process, yes – VonC Mar 08 '18 at 20:09
I don't understand you exactly. Do you suggest that a parent process in a docker container doesn't kill its children? – Spring fancy Mar 08 '18 at 20:51
@Springfancy not always, hence the init: see https://stackoverflow.com/a/39593409/6309 – VonC Mar 08 '18 at 20:53
@VonC ok, I get it. Moreover, I can see that processes started using `docker exec` are not managed by `--init`. In other words, zombies generated by a process executed by `docker exec` are not reaped by `--init` (only zombies generated by the process run by `docker run` are). It is worth noting that `exec` gives us another tree of processes in the same namespace as the process run by `docker run`. However, the process executed by `docker run` is very important and is in some sense more important than one executed by `docker exec`. To sum up, `docker exec` and `--init` have nothing in common. Am I right? – Spring fancy Mar 10 '18 at 15:15
@Springfancy Yes. `docker exec` is for debug only, and is not managed by --init at all. – VonC Mar 11 '18 at 00:09
@VonC, of course `docker exec` is for debug purposes. However, it is often used to execute an additional process in a container (where the container is kept alive by some infinite command). We can think about it as a forest of processes. If we use a container according to recommendations (one process per container) then we have one tree of processes (no forest). In the case of one tree and the `--init` option, everything is all right. However, in the case of a forest of processes (exploiting `exec` for non-debug purposes) `--init` can't help, because `init` **only** supervises the tree of processes that is passed to the `run` command – Spring fancy Mar 11 '18 at 09:07
@Springfancy Yes, you are correct. `--init` won't apply to processes started in the context of `docker exec`. – VonC Mar 11 '18 at 11:28
@VonC and generally, what about my reasoning? – Spring fancy Mar 11 '18 at 11:53
@Springfancy your reasoning is sound: `docker exec` and `--init` have nothing in common. – VonC Mar 11 '18 at 11:55
Do you agree that using `exec` as a way of executing a process in a container (for non-debug purposes) does not fit containers? – Spring fancy Mar 11 '18 at 12:07
@Springfancy yes, as I was saying 3 years ago in https://stackoverflow.com/a/33221192/6309, pointing to https://github.com/moby/moby/issues/9299#issuecomment-64177898 – VonC Mar 11 '18 at 12:09