
I am using a pool of processes. This logic seems to have been working until recently.

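```python
import multiprocessing

pool = multiprocessing.Pool(processes=2 * multiprocessing.cpu_count())
for p in processArgs:
    pool.apply_async(myFunc, p)  # p is a tuple of positional arguments for myFunc

pool.close()  # no more tasks will be submitted
pool.join()   # wait for the worker processes to exit
```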

Now, when this runs, all the child processes end up in a defunct state, and the parent process appears not to have finished `pool.join()`. I base this on (a) the defunct processes still being there and (b) the logic that comes after `pool.join()` apparently never executing.
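A minimal sketch that reproduces this kind of hang without blocking forever. It assumes, hypothetically, that one task calls `sys.exit()` (the cause identified further down); `my_func` and `run_all` are hypothetical names, and it forces the `"fork"` start method, so it is POSIX-only. Keeping the `AsyncResult` handles and calling `get(timeout=...)` turns the silent hang into a detectable timeout. `pool.terminate()` is then used instead of `close()`, because `join()` after `close()` would keep waiting for the task that never reports back.

```python
import multiprocessing
import sys


def my_func(x):
    # Hypothetical task: exits the worker process for negative input,
    # the way a stray sys.exit() call inside a task would.
    if x < 0:
        sys.exit(1)
    return x * x


def run_all(process_args, timeout=2.0):
    # Assumption: a POSIX system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    pool = ctx.Pool(processes=2)
    handles = [pool.apply_async(my_func, (a,)) for a in process_args]
    statuses = []
    for h in handles:
        try:
            statuses.append(("ok", h.get(timeout=timeout)))
        except multiprocessing.TimeoutError:
            # No result arrived in time; here the worker died mid-task.
            statuses.append(("no result", None))
    if any(status == "no result" for status, _ in statuses):
        pool.terminate()  # close() + join() would keep waiting for the lost task
    else:
        pool.close()
    pool.join()
    return statuses


if __name__ == "__main__":
    print(run_all([1, 2, -1]))
```

With these inputs, `run_all([1, 2, -1])` should return two `"ok"` entries and one `"no result"` entry after roughly the two-second timeout, which is an arbitrary value chosen for the demo.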

The defunct processes are zombies, right? This means they have died but haven't been reaped, correct?
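For reference, a process listed with state Z (`<defunct>`) has exited, but its parent has not yet collected its exit status with `wait()`/`waitpid()`. A minimal Linux-only sketch (hypothetical helpers `proc_state` and `zombie_then_reap`, reading the state letter from `/proc/<pid>/stat`) shows a forked child sitting in the Z state until the parent calls `os.waitpid()`:

```python
import os
import time


def proc_state(pid):
    # Linux-only: /proc/<pid>/stat looks like "pid (comm) state ...".
    # comm may contain spaces or ")", so split after the last ")".
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]


def zombie_then_reap(timeout=5.0):
    """Fork a child that exits at once, observe its state, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with status 7
    # Parent: deliberately do not wait yet, so the exited child stays a zombie.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.05)
        state = proc_state(pid)
    # Reaping: waitpid() collects the exit status and removes the zombie.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(zombie_then_reap())
```

The state letter is the same one `ps -el` shows in its S column; once `waitpid()` returns, the child's entry is gone from the process table.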

vmayer
  • [Does `ps -el` say they are zombies?](https://www.debian-administration.org/article/261/Finding_zombie_processes) Are you making sure that the pool has finished executing for all items? – Colonel Thirty Two Oct 23 '15 at 13:03
  • It does show they are zombies (the Z is present). – vmayer Oct 23 '15 at 13:07
  • Do I need to be making sure of exit before doing the pool.close() or pool.join()? To be honest, I copied this logic from an example and presumed the pool would've finished for all items before allowing the close. – vmayer Oct 23 '15 at 13:09
  • @vmayer Just to clarify, are you saying that in your real code, whatever comes after `pool.join()` is never running? – dano Oct 23 '15 at 13:20
  • @dano: yes, it seems so. I have logic that writes a file, and that file doesn't get written in any case. – vmayer Oct 23 '15 at 13:22
  • @vmayer Have you confirmed that `pool.join()` itself is hanging? Can you create a minimal example that actually reproduces that behavior and edit it into your question? – dano Oct 23 '15 at 13:23
  • Okay, it looks like what is likely happening is a call to sys.exit(1) from one of the processes. Does this make the process un-reapable? – vmayer Oct 23 '15 at 17:58
  • Apparently, os._exit() is the right way to exit from a child process: http://stackoverflow.com/questions/19747371/python-exit-commands-why-so-many-and-when-should-each-be-used – vmayer Oct 24 '15 at 11:41

1 Answer


Although I don't fully understand it, the problem was that one of my child processes was calling `sys.exit()`. That is not the right way to exit from a child process; `os._exit()` should be called instead: stackoverflow.com/questions/19747371
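One likely mechanism: in CPython's `multiprocessing/pool.py`, each task runs inside a `try`/`except Exception` block, and a caught exception is sent back to the parent as the task's result. `sys.exit()` raises `SystemExit`, which derives from `BaseException` rather than `Exception`, so it escapes that handler, the worker process exits, and no result is ever sent for that task; `pool.close()` plus `pool.join()` then waits for a result that never comes. The same reasoning suggests that `os._exit()` inside a pool task would also leave the task unreported, so raising an ordinary exception (or returning an error value) is probably safer there than exiting at all. The sketch below imitates only that `try`/`except` with a hypothetical `run_task`; it is not the real pool code.

```python
import sys


def run_task(func, *args):
    # Hypothetical, simplified stand-in for the try/except a pool worker
    # wraps around each task: ordinary exceptions become a reported result.
    try:
        return ("ok", func(*args))
    except Exception as exc:
        return ("error", repr(exc))


def raises_value_error():
    raise ValueError("bad input")


def calls_sys_exit():
    sys.exit(1)  # raises SystemExit, which is not an Exception subclass


if __name__ == "__main__":
    print(run_task(raises_value_error))  # ('error', "ValueError('bad input')")
    try:
        run_task(calls_sys_exit)
    except SystemExit as exc:
        print("SystemExit escaped the wrapper with code", exc.code)
```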

`sys.exit()` performs cleanup before exiting, whereas `os._exit()` ends the process immediately. Perhaps the child got stuck performing some of that cleanup while the parent process was waiting to join it.
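The difference between the two calls can be seen directly: `sys.exit()` raises `SystemExit`, so `finally` blocks and `with` statements unwind (and, if the exception reaches the top level, normal interpreter shutdown such as `atexit` handlers and buffer flushing follows), while `os._exit()` ends the process at once and skips all of that. A POSIX-only sketch using `os.fork()` and a pipe; `exit_in_child` is a hypothetical helper, and the "cleanup" is just a `finally` block that writes to the pipe.

```python
import os
import sys


def exit_in_child(use_os_exit):
    """Fork a child that exits inside try/finally.

    Returns (cleanup_ran, exit_status) as observed by the parent.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(r)
        code = 1
        try:
            try:
                if use_os_exit:
                    os._exit(3)  # terminates at once; the finally below never runs
                sys.exit(3)      # raises SystemExit, so the finally below runs
            finally:
                os.write(w, b"cleanup ran")
        except SystemExit as exc:
            code = exc.code
        finally:
            os._exit(code)  # never unwind into the caller's code in the child
    os.close(w)  # parent: close the write end so read() sees EOF
    with os.fdopen(r, "rb") as f:
        output = f.read()
    _, status = os.waitpid(pid, 0)
    return output == b"cleanup ran", os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("sys.exit:", exit_in_child(use_os_exit=False))
    print("os._exit:", exit_in_child(use_os_exit=True))
```

Both children exit with status 3, but only the `sys.exit()` child runs its `finally` block.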

Feel free to provide additional explanation, and I can edit this answer.

vmayer