
I have a while loop that reads data from a child process using blocking I/O, by redirecting the child's stdout to the parent process through a pipe. Normally, as soon as the child process exits, a blocking read() returns, since the pipe being read from is closed by the child process.
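
The read loop looks roughly like this (simplified; `drain_child_output` and `pipe_read_fd` are stand-in names, not my actual code):

```c
#include <unistd.h>

/* Read from the pipe until every write end is closed, i.e. until
 * read() returns 0 (EOF). pipe_read_fd is the read end of the pipe
 * whose write end is connected to the child's stdout. */
static void drain_child_output(int pipe_read_fd)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(pipe_read_fd, buf, sizeof buf)) > 0)  /* blocks */
        write(STDOUT_FILENO, buf, (size_t)n);
    /* n == 0: EOF; n == -1: check errno */
}
```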

Now I have a case where the read() call does not return even though the child process has finished. The child process ends up in a zombie state: the operating system is waiting for my code to reap it, but my code is instead blocked on the read() call.

The child process itself does not have any child processes running at the time of the hang, and I do not see any file descriptors listed in /proc/<child process PID>/fd. The child process did, however, fork two daemon processes, whose purpose seems to be to monitor the child process (the child process is a proprietary application I do not have any control over, so it is hard to say for sure).

When run from a terminal, the child process I try to read() from exits automatically, and in turn the daemon processes it forked terminate as well.

Linux version is 4.19.2.

What could be the reason for read() not returning in this case?

Follow-up: How to prevent read() from hanging in this situation?

Ton van den Heuvel
  • Why not close the pipe in the child process when it is killed, using a signal handler? – Bumsik Kim Nov 30 '18 at 10:55
  • @BumsikKim No process is being killed here, the child process should simply exit. I will edit the question to clarify that. – Ton van den Heuvel Nov 30 '18 at 10:57
  • Set up a signal handler for `SIGCHLD`, and reap the child in there by calling `wait()` or `waitpid()` (a sketch follows these comments). – alk Nov 30 '18 at 11:00
  • @alk, thanks for the pointer, I'll look into it. Still, I do not understand why the `read()` will not simply return after the child process exits. – Ton van den Heuvel Nov 30 '18 at 11:02
  • "*after the child exits*" it didn't (completely). As you mention, it's still in zombie state. – alk Nov 30 '18 at 11:03
  • @alk, ok, I see. By the way, I am redirecting stdout of the child process (that is the pipe I read from). So my question really is why is the write end of that pipe on the child side not closed automatically before it enters a zombie state? – Ton van den Heuvel Nov 30 '18 at 11:06
  • A zombie process has already made an `exit(2)` syscall, so it can do nothing at all. All of its resources have been deallocated; it only fills a slot in the process table so the parent's `wait(2)` call can be satisfied correctly. – Luis Colorado Dec 05 '18 at 09:38
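
For reference, a minimal sketch of the `SIGCHLD` approach alk suggests above (assumptions: the handler is installed before forking, and `SA_RESTART` is deliberately omitted so a blocked `read()` fails with `EINTR` when the signal arrives):

```c
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>

static void on_sigchld(int sig)
{
    (void)sig;
    int saved_errno = errno;
    /* WNOHANG: reap every exited child without blocking. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

static int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART: a blocked read() returns -1/EINTR */
    return sigaction(SIGCHLD, &sa, NULL);
}
```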

2 Answers


The child process did, however, fork two daemon processes ... What could be the reason for read() not returning in this case?

The forked daemon processes still have the pipe's write-end file descriptor open when the child terminates. Hence the read call never returns 0.

Those daemon processes should close all inherited file descriptors and open their own files for logging.
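
A sketch of the failure mode the answer describes; the `pause()` grandchild stands in for a daemon that never closes its inherited stdout:

```c
#include <unistd.h>

/* Demonstration only: the grandchild inherits the pipe's write end
 * through the duplicated stdout and keeps it open, so the parent's
 * read() never returns 0, even after the child exits (and becomes
 * a zombie, since nothing reaps it here). */
int main(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return 1;

    if (fork() == 0) {                    /* child */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);      /* stdout -> pipe write end */
        close(fds[1]);
        if (fork() == 0)                  /* "daemon" grandchild */
            pause();                      /* holds the write end forever */
        write(STDOUT_FILENO, "child output\n", 13);
        _exit(0);                         /* child is gone, write end is not */
    }

    close(fds[1]);                        /* parent keeps the read end */
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)  /* blocks after the data */
        write(STDOUT_FILENO, buf, (size_t)n);
    return 0;                             /* never reached */
}
```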

Maxim Egorushkin
  • Aaah, of course... Unfortunately, I do not control the daemon processes, so I'll have to go the `SIGCHLD` route. – Ton van den Heuvel Nov 30 '18 at 11:13
  • @TonvandenHeuvel If you control the code invoking `fork`, then that is the place to close file descriptors: right after `fork`. – Maxim Egorushkin Nov 30 '18 at 11:23
  • Yes, I know that, but in this case I do not want to close standard out of the child process; I want to read from it. – Ton van den Heuvel Nov 30 '18 at 11:33
  • @TonvandenHeuvel Your child process forks and now you have two processes. You only need to close the file descriptors in the child of the child. They remain open in the child. – Maxim Egorushkin Nov 30 '18 at 11:41
  • I'm confused. By the file descriptors in the child of the child, do you mean the file descriptors in the daemon processes forked by the child process? Unfortunately, I have no control over those processes. What I now do is create a pipe for the output redirection (with `O_CLOEXEC` set on the file descriptors), then close the write end in the parent process, and then do `dup2(write-end-fd, stdout-fd)` in the child process right before `execvp()` (sketched after these comments). That should do it, I think, since `O_CLOEXEC` should take care of closing the pipe file descriptors. (1/2) – Ton van den Heuvel Nov 30 '18 at 13:57
  • Somehow it should be possible to control the file descriptors in the daemon processes as well, since I can do `child_process | tee output.log` in the terminal, and it exits nicely... (2/2) – Ton van den Heuvel Nov 30 '18 at 14:00
  • @TonvandenHeuvel: You might like to have a look at `O_CLOEXEC` here: http://man7.org/linux/man-pages/man2/open.2.html – alk Nov 30 '18 at 14:59
  • @TonvandenHeuvel You say that the child processes fork daemons. You need to close the file descriptors in the daemon after `fork` but before `exec`. – Maxim Egorushkin Nov 30 '18 at 15:20
  • @alk, thank you, I am already using `O_CLOEXEC`. I understand the problem better now; if I do not redirect output, the child process exits nicely. If I redirect output to the parent process, the child process does not exit. The file descriptors of the pipe responsible for redirecting stdout to the parent are created with the `O_CLOEXEC` flag, and then `stdout` in the child process is redirected to the write end of the pipe using `dup2()`. If I use `dup3()` with `O_CLOEXEC` instead, the child process exits too, but of course I don't see any output in the parent. – Ton van den Heuvel Nov 30 '18 at 16:07
  • @MaximEgorushkin, to clarify; I have no control over the child process and the daemons it forks. I only have control over the parent application. – Ton van den Heuvel Nov 30 '18 at 16:18
  • @TonvandenHeuvel You have control over `fork` since you can override it with your own version. But once `exec` is called you have no control. – Maxim Egorushkin Sep 19 '19 at 22:07
  • @MaximEgorushkin, you are suggesting overriding fork by preloading a shared library with a custom fork() implementation before starting the child process...? To clarify, the child process I am forking is a program I do not have any source code for, only a binary. In any case, the problem has since been resolved; see the follow-up question linked in this question. – Ton van den Heuvel Sep 20 '19 at 05:03
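
For reference, a sketch of the redirection described in the comments above: `pipe2()` with `O_CLOEXEC`, and the write end duplicated onto stdout just before `execvp()`. Note that `dup2()` clears the close-on-exec flag on the duplicate, which is how the exec'ed child, and anything it forks, ends up holding the pipe's write end open:

```c
#define _GNU_SOURCE   /* for pipe2() */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch, not the asker's exact code: spawn argv with its stdout
 * redirected into a pipe; returns the child's pid and stores the
 * pipe's read end in *out_fd. */
static pid_t spawn_with_redirect(char *const argv[], int *out_fd)
{
    int fds[2];
    if (pipe2(fds, O_CLOEXEC) == -1)      /* both ends close-on-exec */
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child */
        dup2(fds[1], STDOUT_FILENO);      /* the duplicate is NOT close-on-exec */
        execvp(argv[0], argv);
        _exit(127);
    }
    close(fds[1]);                        /* parent keeps only the read end */
    *out_fd = fds[0];
    return pid;
}
```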

A possible reason (the most common) for read(2) blocking on a pipe with a dead child is that the parent has not closed the writing side of the pipe, so there is still an open (for writing) descriptor for that pipe. Close the writing side of the pipe in the parent process before reading from it. The child is dead (you said zombie), so it cannot be the process holding the writing side of the pipe open. And don't forget to wait(2) for the child in the parent, or you'll get a system full of zombies :)

Remember, you have to do two closes in your code:

  • One in the parent process, to close the writing side of the pipe, leaving the parent process with only a reading descriptor.

  • One in the child process (just before exec(2)ing), to close the reading side of the pipe, leaving the child process with only a writing descriptor.

In case you want to use the pipe(2) to send information to the child instead, swap reading and writing in the two points above.
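
Put together, a sketch of both closes for the parent-reads-from-child direction (`ls` is an arbitrary example command):

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }

    if (pid == 0) {                        /* child */
        close(fds[0]);                     /* close the reading side ... */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);                     /* ... keep only the dup on stdout */
        execlp("ls", "ls", (char *)NULL);  /* arbitrary example command */
        _exit(127);
    }

    close(fds[1]);                         /* close the writing side in the parent */

    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);

    close(fds[0]);
    waitpid(pid, NULL, 0);                 /* reap the child: no zombies */
    return 0;
}
```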

Luis Colorado
  • Thank you for the suggestions; those actions were taken in the parent process, though. The reading side was not closed in the child process; since `O_CLOEXEC` was set on the file descriptor, this is not needed. The reason the parent was blocking is that the daemon processes forked by the child process still had file descriptors open to the pipe used for redirecting stdout from the child process to the parent process. See also the follow-up question link in this question. – Ton van den Heuvel Dec 05 '18 at 10:07
  • Well, that can be another possibility; I don't remember reading that you were building deep hierarchies of processes. Indeed, I was aware of a parent and a zombie child only. A zombie can do nothing on a system. It's a dead process. It is never executed and has no resources attached to it, be they file descriptors or anything else. – Luis Colorado Dec 05 '18 at 10:10
  • It's right there in the question: "The child process did, however, fork two daemon processes, whose purpose seems to be to monitor the child process (the child process is a proprietary application I do not have any control over, so it is hard to say for sure)." – Ton van den Heuvel Dec 06 '18 at 09:33