
I have the following situation, where I create a pool in a for loop (I know it's not very elegant, but I have to do it this way for pickling reasons). Assume that pathos.multiprocessing is equivalent to Python's multiprocessing library (it is, up to some details that are not relevant to this problem). This is the code I want to execute:

self.pool = pathos.multiprocessing.ProcessingPool(number_processes)

for i in range(5):

    all_responses = self.pool.map(wrapper_singlerun, range(self.no_of_restarts))

    self.pool._clear()

Now my problem: the loop successfully runs the first iteration. However, at the second iteration, the algorithm suddenly stops (it does not finish the pool.map operation). I suspected that zombie processes were being generated, or that the process was somehow switched. Below you will find everything I have tried so far.

import gc
import multiprocessing

import pathos.multiprocessing

for i in range(5):
    pool = pathos.multiprocessing.ProcessingPool(number_processes)

    all_responses = pool.map(wrapper_singlerun, range(self.no_of_restarts))

    pool._clear()
    gc.collect()

    for p in multiprocessing.active_children():
        p.terminate()
        gc.collect()

    print("We have so many active children: ", multiprocessing.active_children())  # Returns []

The above code works perfectly well on my Mac. However, when I run it on a cluster with the following specs, it hangs after the first iteration:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"

This is the link to pathos' multiprocessing library file:

DaveTheAl
  • Hi, I'm the `pathos` author. `pathos.multiprocessing` provides an enhanced `Pool` in two ways: (1) better serialization, and (2) initialization. If you are just looking for the better serialization, I suggest you use `pathos.pools._ProcessPool`, which has the exact interface and specifications of `multiprocessing`, but with better serialization. If you are looking for any of the other `pathos` features, then you should know that `pathos` pools are kept unless explicitly destroyed. You have to do a `pool._clear` or `pool.restart` on the pool you are using above to kill (or restart) the pool (a minimal version of this pattern is sketched after these comments). – Mike McKerns Jul 06 '18 at 13:28
  • 1.) Thanks for the great library! :) 2.) I tried `pool._clear()` in the above example (I put it right before `pool = None` and kept everything else the same). However, it tells me that `._clear()` is not a member function of `pathos.multiprocessing.Pool`. I also tried to call `._clear()` on `pathos.pools._ProcessPool`, leading to the same error. – DaveTheAl Jul 06 '18 at 15:19
  • Turns out I had to use `pathos.multiprocessing.ProcessingPool`. However, this still does not resolve the problem on the Ubuntu machine :/ – DaveTheAl Jul 06 '18 at 19:21
  • The interface in `pathos.multiprocessing` is deprecated... the preferred interface is `pathos.pools.ProcessPool`. That pool should have a `_clear` and a `restart` method. Note this is the same object as `pathos.multiprocessing.ProcessPool` and `pathos.multiprocessing.ProcessingPool`... both of which I've left hanging around for backward compatibility. – Mike McKerns Jul 06 '18 at 20:06
  • It's hard to debug what you are seeing, as you haven't provided code that other people can run and that demonstrates the error. Can you do that? As is, it's hard to tell what you are trying to do. – Mike McKerns Jul 06 '18 at 20:11
  • Unfortunately, the code is quite complex and a few hundred lines long; the above lines are the essence. I have worked a little more on the issue, and it seems like it must be an issue with the operating system. I suspect the cluster I use has some kind of lock that denies a process the ability to spawn too many processes at once. I am not quite sure how I can work around that, though... :/ – DaveTheAl Jul 06 '18 at 20:15
  • If that is the case, you should be able to test the theory by limiting the number of processes in your Pool, coupled with doing a `clear` to shut down each `Pool` in your loop. That will limit the number of active processes. You could try `ProcessPool(1)` or something like that. – Mike McKerns Jul 06 '18 at 20:36
  • @MikeMcKerns I did this, and indeed, the problem still persists. Any idea what I could do now? – DaveTheAl Jul 09 '18 at 20:26
  • I'd watch a memory monitor for memory leakage, and also try to independently serialize the objects (directly with `dill`; a quick round-trip check is sketched below). That should tell you whether it's a serialization issue, a memory issue, or a memory issue from serialization. – Mike McKerns Jul 09 '18 at 20:55
  • The process can be destroyed from within itself. You need to check the data and change the flags, checking the results of each successive run before proceeding to the next step. Termination (conditional) cannot be made from the process number; internal flags are used for this. – dsgdfg Jul 16 '18 at 08:11
  • Error logging and process counters should always be used in processes. Consecutive calls do not increase performance, only the execution priority of your process. I think the friends who wrote the module missed this point. – dsgdfg Jul 16 '18 at 08:15
  • @DaveTheAl did you ever resolve the issue with the hang? I believe I've run into the same issue. – wmcnally Jul 08 '20 at 19:21
  • @wmcnally unfortunately not :/ I remember I just switched computing machines; it was dependent on that. I can imagine that this is some issue with the Python version (3.6 vs. 3.7 vs. 3.x...). Apart from that, not many ideas. – DaveTheAl Jul 08 '20 at 23:03
  • @DaveTheAl I resolved it using the multiprocessing package: `from multiprocessing import get_context`. In your for loop use: `with get_context("spawn").Pool() as p` (sketched below). – wmcnally Aug 18 '20 at 15:13
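
To make the advice in these comments concrete, here is a minimal sketch of the clear/restart pattern Mike McKerns describes, using the preferred `pathos.pools.ProcessPool` interface. The worker function and the pool/loop sizes are placeholders, not taken from the question:

    from pathos.pools import ProcessPool

    def wrapper_singlerun(restart_index):
        # stand-in for the real worker; anything dill can serialize works here
        return restart_index * restart_index

    if __name__ == '__main__':
        for i in range(5):
            pool = ProcessPool(2)  # keep this small (even 1) to test the process-limit theory
            all_responses = pool.map(wrapper_singlerun, range(4))
            pool._clear()          # explicitly destroy the cached pool so the next iteration starts fresh
            print(i, all_responses)

Since `pathos` keeps pools around unless explicitly destroyed, each `ProcessPool(2)` call would otherwise hand back the same cached pool object; the explicit `_clear()` is what prevents the second iteration from reusing a dead pool.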
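A quick way to run the serialization check suggested above is to round-trip the target callable through `dill` directly; the function here is again a stand-in:

    import dill

    def wrapper_singlerun(restart_index):
        return restart_index * restart_index

    # If either call raises, the hang is likely a serialization problem
    # rather than a pool-lifecycle problem.
    payload = dill.dumps(wrapper_singlerun)
    restored = dill.loads(payload)
    print(restored(3))  # -> 9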
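And a minimal sketch of the workaround wmcnally reports, using the standard library's spawn context so every iteration gets freshly started processes (the worker and sizes are placeholders):

    from multiprocessing import get_context

    def wrapper_singlerun(restart_index):
        return restart_index * restart_index

    if __name__ == '__main__':
        for i in range(5):
            # "spawn" starts clean interpreter processes instead of forking;
            # the with-block tears the pool down at the end of every iteration.
            with get_context("spawn").Pool(2) as p:
                all_responses = p.map(wrapper_singlerun, range(4))
            print(i, all_responses)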

1 Answer


I am assuming that you are calling this via some function, which is not the correct way to use this.

You need to wrap it with:

if __name__ == '__main__':
    for i in range(5):
        pool = pathos.multiprocessing.Pool(number_processes)

        all_responses = pool.map(wrapper_singlerun, range(self.no_of_restarts))

If you don't, it will keep creating copies of itself and pushing them onto the stack, which will ultimately fill the stack and block everything. The reason it works on Mac is that it has fork, while Windows does not have it.
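
For reference, a self-contained version of the pattern this answer describes might look like the following. The worker is defined at module level so child processes can find it; the names and sizes are illustrative, assuming `pathos.multiprocessing.Pool` behaves like the standard `multiprocessing.Pool` (it is the `multiprocessing`-style interface with better serialization):

    import pathos.multiprocessing

    def wrapper_singlerun(restart_index):
        # module-level worker: importable by child processes on non-fork platforms
        return restart_index * restart_index

    if __name__ == '__main__':
        number_processes = 2
        for i in range(5):
            pool = pathos.multiprocessing.Pool(number_processes)
            all_responses = pool.map(wrapper_singlerun, range(4))
            pool.close()  # shut each pool down before the next iteration
            pool.join()
            print(i, all_responses)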

dilkash
  • 1.) So the difference is `__name__ == "__main__"`? Unfortunately it is not something I can apply, as my script is merely a module for another library that controls how my functions are executed. 2.) The other platform on which it does not run is not Windows, but Ubuntu (as noted at the very end). – DaveTheAl Jul 06 '18 at 15:16
  • Your main script, the one you're calling first, should be wrapped inside an `if __name__ == '__main__'` clause. – Raf Jul 16 '18 at 08:22