
I am starting a process using execv and letting it write to a file. At the same time I start a thread that monitors the file, using stat.st_size, so that its size does not exceed a certain limit. Now, when the limit is hit, I call waitpid for the child process from that thread, but the call fails and the process I started in the background becomes a zombie. When I do the stop using the same waitpid call from the main thread, the process is killed without becoming a zombie. Any ideas?

Edit: waitpid returns -1 and errno is 10. This is on a Linux platform.
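For comparison, here is a minimal sketch of the setup described above. It relies on assumptions the question does not state. The child's stdout is redirected into the monitored file. The monitor thread polls every 10 ms. "Stopping" the child means sending SIGKILL and then calling waitpid from the monitor thread. The names `run_with_size_limit`, `monitor_thread` and `struct monitor_args` are hypothetical and are not the asker's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the asker's actual code. */
struct monitor_args {
    pid_t pid;         /* child started with fork() + execv() */
    const char *path;  /* file the child writes to */
    off_t limit;       /* maximum allowed size in bytes */
    int status;        /* wait status filled in by the monitor thread */
    int ok;            /* 1 if the monitor thread reaped the child */
};

static void *monitor_thread(void *p)
{
    struct monitor_args *a = p;
    struct timespec tick = { 0, 10 * 1000 * 1000 };  /* poll every 10 ms */
    struct stat st;

    for (;;) {
        /* Reap the child if it has already exited on its own. */
        pid_t r = waitpid(a->pid, &a->status, WNOHANG);
        if (r == a->pid) {
            a->ok = 1;
            return NULL;
        }
        if (r == -1)
            return NULL;  /* e.g. ECHILD; errno holds the reason */

        if (stat(a->path, &st) == 0 && st.st_size >= a->limit)
            break;        /* limit reached: stop the child */
        nanosleep(&tick, NULL);
    }
    kill(a->pid, SIGKILL);
    /* On Linux >= 2.4 a non-main thread may wait for a child forked by
     * another thread of the same process. */
    a->ok = (waitpid(a->pid, &a->status, 0) == a->pid);
    return NULL;
}

/* Runs argv[0] (a full path, as execv() requires) with stdout redirected
 * to path; returns the child's wait status, or -1 on failure. */
int run_with_size_limit(const char *path, char *const argv[], off_t limit)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        dup2(fd, STDOUT_FILENO);  /* child's stdout goes to the file */
        close(fd);
        execv(argv[0], argv);
        _exit(127);               /* only reached if execv() failed */
    }
    close(fd);

    struct monitor_args a = { pid, path, limit, 0, 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, monitor_thread, &a) != 0) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }
    pthread_join(t, NULL);
    return a.ok ? a.status : -1;
}
```

Run it with argv `{"/bin/sh", "-c", "while :; do echo x; done", NULL}` and a 4096-byte limit. The returned status should satisfy `WIFSIGNALED` with `WTERMSIG` equal to `SIGKILL`. This gives a small reproduction to compare against the failing program.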

abligh
user1295872
  • "but this throws an error ..." and that error would be... what? Include error codes and all related messaging *verbatim* in your posted question. May as well also include the platform info. – WhozCraig May 04 '15 at 14:35
  • waitpid returned -1 with errno set to 10. The errno seems to indicate that the child process does not exist. But that does not seem to be the case, since I am able to see the process with ps ax. The OS is Linux. – user1295872 May 04 '15 at 14:40
  • And that would be in a *comment*. That info belongs *in your posted question.* Regardless, perhaps the section on "Linux Notes" [in the documentation of `waitpid`](http://linux.die.net/man/2/waitpid) may be related. – WhozCraig May 04 '15 at 14:54
  • Thanks for pointing that out. So if I use the option __WALL, I should be able to wait on child processes created by the main thread. I edited the question to include the error information. – user1295872 May 04 '15 at 15:10
  • A quick look at /usr/include/asm-generic/errno-base.h shows 10 is ECHILD (no child processes). (You can convert errno to a string with strerror_r(3).) Look at the waitpid(2) man page for more information. – RTLinuxSW May 04 '15 at 17:00
  • Also please note: "The exec() family of functions replaces the current process image with a new process image. " How are you starting the child process exactly? – hookenz May 04 '15 at 20:11
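RTLinuxSW's suggestion about strerror_r can be applied as a small wrapper that turns the errno from a failed waitpid into readable text. This sketch uses the XSI-style `strerror_r`, which returns an `int`. glibc declares that version when `_GNU_SOURCE` is not defined; with `_GNU_SOURCE`, the GNU variant returning `char *` is used instead. `waitpid_verbose` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L  /* no _GNU_SOURCE: XSI strerror_r() returning int */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: calls waitpid() and, on failure, writes a readable
 * description such as "waitpid(1234): errno 10 (No child processes)". */
pid_t waitpid_verbose(pid_t pid, int *status, int options,
                      char *msg, size_t msglen)
{
    pid_t r = waitpid(pid, status, options);
    if (r == -1) {
        int err = errno;                 /* save before other calls */
        char text[128];
        if (strerror_r(err, text, sizeof text) != 0)
            snprintf(text, sizeof text, "unknown error");
        snprintf(msg, msglen, "waitpid(%ld): errno %d (%s)",
                 (long)pid, err, text);
        errno = err;                     /* let the caller inspect it too */
    } else if (msglen > 0) {
        msg[0] = '\0';                   /* success: empty message */
    }
    return r;
}
```

As an example, call `waitpid_verbose(-1, &st, WNOHANG, msg, sizeof msg)` in a process that has no children. On Linux with glibc, this should leave something like `waitpid(-1): errno 10 (No child processes)` in `msg`.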

1 Answer


This is difficult to debug without code, but errno 10 is ECHILD.

Per the man page, this is returned as follows:

ECHILD (for waitpid() or waitid()) The process specified by pid (waitpid()) or idtype and id (waitid()) does not exist or is not a child of the calling process. (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)

In short, the pid you are specifying is not a child of the process calling waitpid() (or is no longer one, for example because it has terminated and has already been reaped).
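A pid can stop being a child of the caller once it has already been reaped. After one waitpid has collected the zombie, a second call for the same pid fails with ECHILD. Here is a minimal sketch of that case; the function name `errno_after_double_reap` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno from a second waitpid() on a child
 * that has already been reaped, 0 if that call unexpectedly succeeded,
 * or -1 if fork() or the first waitpid() failed. */
int errno_after_double_reap(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);                        /* child exits immediately */

    int status;
    if (waitpid(pid, &status, 0) != pid) /* first wait reaps the zombie */
        return -1;
    if (waitpid(pid, &status, 0) == -1)  /* pid is no longer our child */
        return errno;
    return 0;
}
```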

Note the parenthetical section:

  • "This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN" - if the disposition of SIGCHLD has been set to SIG_IGN, terminated children are reaped automatically and never go through the zombie state, so waitpid cannot collect them: it blocks until all children have terminated and then fails with ECHILD.

  • "See also the Linux Notes section about threads." - In Linux, threads are essentially processes. Modern Linux allows one thread to wait for children of other threads, provided they are in the same thread group (broadly, the same process). Before Linux 2.4 this was not the case. See the documentation on __WNOTHREAD for details.
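The Linux Notes behaviour can be checked directly. In this sketch, the calling thread forks a child and a second thread waits for it twice. The first wait uses `__WNOTHREAD`, which restricts the wait to the waiting thread's own children, so it is expected to fail with ECHILD. The second wait uses no flags and is expected to reap the child on Linux 2.4 and later. The struct and function names are hypothetical. The fallback value for `__WNOTHREAD` is the one defined in `<linux/wait.h>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __WNOTHREAD
#define __WNOTHREAD 0x20000000   /* value from <linux/wait.h> */
#endif

/* Results gathered by the hypothetical waiter thread below. */
struct wait_result {
    pid_t pid;           /* child forked by the calling thread */
    int nothread_errno;  /* errno from waitpid(..., __WNOTHREAD) */
    pid_t reaped;        /* return value of a plain waitpid() */
    int status;          /* wait status from the plain waitpid() */
};

static void *waiter(void *p)
{
    struct wait_result *w = p;
    /* Only this thread's own children are eligible; the child belongs
     * to the thread that forked it, so this fails with ECHILD. */
    if (waitpid(w->pid, &w->status, __WNOTHREAD) == -1)
        w->nothread_errno = errno;
    /* Default since Linux 2.4: children of any thread in the same
     * thread group are eligible, so this reaps the child. */
    w->reaped = waitpid(w->pid, &w->status, 0);
    return NULL;
}

/* Forks in the calling thread, waits in a second thread. Returns 0 on success. */
int wait_from_other_thread(struct wait_result *w)
{
    w->nothread_errno = 0;
    w->reaped = -1;
    w->pid = fork();
    if (w->pid == -1)
        return -1;
    if (w->pid == 0)
        _exit(7);                /* child exits with a recognisable code */

    pthread_t t;
    if (pthread_create(&t, NULL, waiter, w) != 0) {
        waitpid(w->pid, NULL, 0);
        return -1;
    }
    pthread_join(t, NULL);
    return 0;
}
```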

I'm guessing the thread thing is a red herring, and the problem is actually the signal handler, as this accords with your statement 'the process is killed without becoming a zombie.'
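The SIG_IGN effect from the first bullet can also be reproduced in isolation. This sketch temporarily ignores SIGCHLD, forks a child that exits at once, and calls waitpid on it. Per the waitpid(2) notes, that call should block until the child terminates and then fail with ECHILD. The previous disposition is restored before returning. The function name `errno_with_sigchld_ignored` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: with SIGCHLD ignored, the kernel reaps the child
 * itself, so waitpid() on it fails with ECHILD. Returns the errno of that
 * waitpid() call, 0 if it succeeded, or -1 on setup failure. */
int errno_with_sigchld_ignored(void)
{
    struct sigaction ign, old;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    ign.sa_handler = SIG_IGN;
    if (sigaction(SIGCHLD, &ign, &old) == -1)
        return -1;

    int result = 0;
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                     /* child exits immediately */
    if (pid == -1)
        result = -1;
    else if (waitpid(pid, NULL, 0) == -1)
        result = errno;               /* captured before sigaction() below */

    sigaction(SIGCHLD, &old, NULL);   /* restore the previous disposition */
    return result;
}
```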

abligh
  • I am not setting up a signal handler for SIGCHLD, so that should not be the issue. I am using a 2.6 kernel, which should implicitly allow my thread to wait on the process created from the main thread. But that is not happening. I set the __WALL and __WCLONE options for waitpid to no avail. I will keep probing and get back to you. Thanks for the suggestions though. – user1295872 May 05 '15 at 05:22
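SIGCHLD can end up ignored even if your own code never installs a handler. signal(7) notes that ignored dispositions are inherited across fork() and left unchanged by execve(). A launcher that ignores SIGCHLD therefore passes that on, and a library could also change it. A quick runtime check can query the current disposition without changing it. It is sketched here with the hypothetical name `sigchld_auto_reaps`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical diagnostic: returns 1 if terminated children will be reaped
 * automatically (SIGCHLD ignored or SA_NOCLDWAIT set), 0 if not, -1 on error.
 * Passing NULL as the new action only queries the current disposition. */
int sigchld_auto_reaps(void)
{
    struct sigaction cur;
    if (sigaction(SIGCHLD, NULL, &cur) == -1)
        return -1;
    if (cur.sa_flags & SA_NOCLDWAIT)
        return 1;
    /* sa_handler is only meaningful when SA_SIGINFO is not set. */
    if (!(cur.sa_flags & SA_SIGINFO) && cur.sa_handler == SIG_IGN)
        return 1;
    return 0;
}
```

Calling this early in main and again just before the failing waitpid would show whether the disposition changes at some point.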