
I have a Docker container that runs bash at PID1 which in turn runs a long-running (complex) service that sometimes produces zombie processes parented to the bash at PID1. These zombies are seemingly never reaped.

I'm trying to reproduce this issue in a minimal container so that I can test mitigations, such as using a proper init as PID1 rather than bash.

However, I have been unable to reproduce the zombie processes. The bash at PID1 seems to reap children, even those it inherited from another process.

Here is what I tried:

docker run -d ubuntu:14.04 bash -c \
  'bash -c "start-stop-daemon --background --start --pidfile /tmp/sleep.pid --exec /bin/sleep -- 30; sleep 300"'

My expectation was that start-stop-daemon would double-fork to create a process parented to the bash at PID1, then exec into sleep 30, and when the sleep exits I expected the process to remain as a zombie. The sleep 300 simulates a long-running service.

However, bash reaps the process, and I can observe that by running strace on the bash process (from the host machine running docker):

$ sudo strace -p 2051
strace: Process 2051 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9
wait4(-1,

I am running docker 1.11.1-rc1, though I have the same experience with docker 1.9.

$ docker --version
Docker version 1.11.1-rc1, build c90c70c
$ uname -r
4.4.8-boot2docker

Given that strace shows bash reaping (orphaned) children, is bash a suitable PID1 in a docker container? What else might be causing the zombies I'm seeing in the more complex container? How can I reproduce?

Edit:

I managed to attach strace to a bash PID1 on one of the live containers exhibiting the problem.

Process 20381 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11185
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11191
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11203
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11155
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11151
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11152
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11154
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11332
...

Not sure exactly what all those exiting processes are, but none of the PIDs match those of the few defunct zombie processes that were shown by docker exec $id ps aux | grep defunct.

Maybe the trick is to catch it in action and see what wait4() returns on a process that remains a zombie...

Patrick
  • Would the test case mentioned in http://stackoverflow.com/a/39593409/6309 help see zombie processes? – VonC Sep 20 '16 at 11:55
  • I'm no longer working on this issue, but I believe it was resolved by upgrading the Linux kernel, suggesting the cause was a kernel bug and/or a bug in the interaction between docker and the kernel. – Patrick Sep 24 '16 at 02:55

3 Answers


I also wanted to verify whether my Jenkins slave containers can generate zombies.

Since my images run the scl binary, which in turn starts the Java JNLP client, I ran the following in the Jenkins slave Groovy script console:

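// Start a background sleep via bash and disown it; when bash exits, the sleep is orphaned and reparented to PID 1
def process = new ProcessBuilder("bash", '-c', 'sleep 10 </dev/null &>/dev/null & disown').redirectErrorStream(true).start()
println process.inputStream.text
// List processes (re-run after ~10 s to see the exited sleep as <defunct> if PID 1 does not reap it)
println "ps -ef".execute().text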

Zombies were generated. That was with scl ending up as PID 1.

Then I looked at your question and decided to try out bash. My first attempt was changing ENTRYPOINT to this:

bash -c "/usr/bin/scl enable rh-ror42 -- /usr/local/bin/run-jnlp-client $1 $2" --

Then, looking at the ps output, I realized that PID 1 was not bash; it was still the scl binary. Finally, I changed the command to:

bash -c "/usr/bin/scl enable rh-ror42 -- /usr/local/bin/run-jnlp-client $1 $2 ; ls" --

That is, I added an arbitrary second command after the scl command. And voilà: bash became PID 1, and no more zombies were generated.

Looking at your example, I see that you run bash -c with more than one command, so in your test bed you are running something like my last command. In your work containers, however, you probably run bash -c with only one command, and it appears bash is clever enough to effectively exec that command directly. So in the work containers that generate zombies, bash is probably not actually PID 1, contrary to what you expect.

Perhaps you can run ps -ef inside your existing work containers to verify whether my guess is correct.

akostadinov
  • @gzerone, hi, I'm not sure which trick exactly you are asking about. And it's nothing genius, just modifying the container start command and looking at the `ps -ef` output to see what is **actually** running. The big surprise for me was `bash`'s automatic trailing `exec`, which might be handy for less experienced people in some situations but is very unfortunate (read: surprising) in container use cases. – akostadinov Apr 01 '17 at 07:40
  • hi @akostadinov, I mean: how did you know that adding a second command (like ;ls) would make bash PID 1? I searched for a long time and only knew that bash can reap zombie processes in a container, but it didn't work for me until I saw your post here. Thx. – gzerone Apr 02 '17 at 16:47
  • @gzerone Well, some deduction. When I saw the `bash` process replaced by the command I was executing, I figured it was doing an automatic `exec` call for the last command. If bash `exec`s a command, it can no longer run any following commands, so it will not do that when more commands follow. So I thought that if I put another command (whatever it is) after my desired command, that would prevent this from happening. – akostadinov Apr 07 '17 at 08:33
  • hi @akostadinov, what about the SIGTERM unclean shutdown problem? https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ – childno͡.de Jun 14 '17 at 17:02
  • @childno͡.de, if your app is affected by this, then I think you can do with the code provided in the blog + a `wait` call for the children. Your app would be affected if it writes to a persistent storage or you rely on the `TERM` behavior for failover when app is for example scaled down. In many cases you would **not** care though. – akostadinov Jun 26 '17 at 11:25

I hit the same problem while attempting to create a zombie process inside a container with bash as PID 1. It turns out (as you can see from the wait4() calls) that bash actually waits on all of its children in a loop: wait4() with a pid of -1 blocks until any child exits (see man 2 wait).

This means that when an orphan is reparented to bash, bash will correctly wait on it and prevent it from remaining a zombie. It's very strange that all the literature on the internet says otherwise.

Tammer Saleh
  • I don’t think all the literature on the internet says bash doesn’t reap zombie children and wait() on them. :) There are still plenty of reasons that bash is not a good PID1 such as it cannot forward signals to child processes for graceful termination. – ahmet alp balkan Dec 13 '20 at 21:54
  • It can but it is some work: https://gist.github.com/bronger/acce7736141b3fa118b0d47f1a2035ac – Torsten Bronger Dec 14 '20 at 10:34
  • @AhmetB-Google I haven't been able to find anything online saying it waits on all pids - had to `strace` to figure out what was going on. But, yeah, it still totally shouldn't be used instead of a proper `init` (also OMGHI!SUCHAHUGEFAN!!!) – Tammer Saleh Dec 14 '20 at 22:05
  • @Ahmet: bash seems to be, at the very least, overkill as a container PID1 process, because it's big (and so slow) for no reason. But I can't understand why it's not a good choice. It does what's important: reap lingering zombies. And for signal forwarding, I can't see what you can't do using "trap" and "kill". Also, you often need to do some special actions from within the container at its first start, like initializing a database or a persistent volume, or exchanging an SSH key, or whatever. Having an entrypoint which is a bash script is handy for all those reasons. – Stéphane Feb 17 '21 at 22:59
  • But I guess the container way of doing things is to have a complex setup of many simple containers (using Config Maps, Secrets, etc.), and not one fat container with all its logic inside an entrypoint.sh with complex signal handling… – Stéphane Feb 17 '21 at 23:03

To test whether your applications are leaving zombies, you need to ensure that bash is not PID 1 but rather the first child of PID 1.

In another question, How to reap zombie process in docker container with bash, I showed how to create a container whose PID 1 ignores zombie processes: a small program becomes PID 1 and runs bash as its child. Here is the C code that can be used to generate the container:

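#include <stdlib.h>

/* Runs as PID 1 and starts bash as a child. system() executes the command
   via "/bin/sh -c", which is why ps shows "sh -c /bin/bash", and it only
   waits for that one child: orphans reparented to PID 1 are never reaped. */
int main(void) {
    int status = system("/bin/bash");
    return status == -1 ? EXIT_FAILURE : EXIT_SUCCESS;
}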

The code that generates the zombie, and the Dockerfile for the container, can be found in the GitHub repository.

After compiling the program into an image, all you need to do is start the container with docker run -ti --rm image /zombie/ignore, and bash will run as the first child of PID 1. To see this working in practice, check the link to the other question.

root@1bd66ac87f0a:/zombie# ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:17 pts/0    00:00:00 /zombie/ignore
root           7       1  0 11:17 pts/0    00:00:00 sh -c /bin/bash
jordanvrtanoski