I'm pretty new to multiprocessing in Python and I'm trying to understand how to use Pool properly. I have some code which looks like this:

import numpy as np
from multiprocessing.dummy import Pool as ThreadPool
... 


pool = ThreadPool(15)

arb = np.arange(0,len(np.concatenate((P,V),axis=0)),1)

F = pool.map(tttt,arb)

pool.close()
pool.join()

NT = 1000

for test in range(0,NT):
    (P,V) = Dynamics(P,V,F)

    pool = ThreadPool(15)

    F = pool.map(tttt,arb)

    pool.close()
    pool.join() 

...

tttt and Dynamics are two functions that are previously defined. I want to use Pool to be able to calculate a lot of values simultaneously using tttt, but I also want to update the values that I use for those calculations (tttt depends on P and V, although not explicitly).

Do I have to create and close the pool twice as I am doing right now or can I do it just once?

S -
2 Answers


Simple answer

It seems you would like to use a pool of processes on each iteration of a for loop. You've made things more complicated than you need to for a plain Pool.map, and your calls to .join() and .close() suggest you'd rather be using Pool.map_async. Here's a simple example:

import numpy as np
from multiprocessing import Pool
from time import sleep

def print_square(x):
    sleep(.01)
    print(x**2)

if __name__=='__main__':

    for k in range(10):
        pool = Pool(3)
        arb = np.arange(0,10)
        pool.map_async(print_square,arb)
        pool.close()
        pool.join()

General remarks

  1. You should generally include a minimal, complete, verifiable example. Your example couldn't be run. Worse, it contained lots of extraneous domain-specific code (e.g. P, V, Dynamics) which discourages others from trying to run your example.

  2. State what the observed behavior of your code is (e.g. wrong output, run time error, etc.) and the desired behavior.

  3. It's confusing to import Pool as ThreadPool, since threads and processes are different and yet have very similar APIs.

rjonnal

You don't have to instantiate multiple pools. You could call pool.map(...) several times before calling pool.close(). Only call pool.close() when you have no more tasks to submit to the pool.

The caveat though is that pool.map(...) is a blocking call - it doesn't return until all of the tasks submitted to the pool are complete. This could be inefficient for your purposes - while the pool is working you might want to do background work like collecting/submitting more tasks. You could instead use pool.map_async(...) but your code will get a bit more complicated.
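A minimal sketch of reusing one pool across iterations, as described above. The `worker` function is a stand-in for the question's `tttt` (which would depend on P and V); the loop body is where a state update like `Dynamics(P, V, F)` would go:

```python
from multiprocessing.dummy import Pool as ThreadPool

def worker(i):
    # Placeholder computation standing in for tttt
    return i * i

pool = ThreadPool(4)
results = None
for step in range(3):
    # ... update shared state here, e.g. (P, V) = Dynamics(P, V, F) ...
    results = pool.map(worker, range(5))  # blocks until all tasks finish
pool.close()  # no more tasks will be submitted
pool.join()   # wait for the workers to shut down
print(results)  # [0, 1, 4, 9, 16]
```

Because `pool.map` blocks, each iteration sees the results of the previous one before submitting new work, which is exactly the update-then-recompute pattern in the question.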

Side note: I would be careful with your naming. A multiprocessing.Pool is not a pool of threads. It's a pool of child processes. Thread pools and process pools are different beasts with their own considerations.

In fact, you might discover that you actually would like a home-rolled thread pool if your processing is mostly in numpy: as a C extension, numpy releases the GIL during heavy computation, so there is little worry about interpreter-level contention, and threading means less IPC overhead and lower system resource requirements.
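One way to get such a thread pool without rolling it by hand is `concurrent.futures.ThreadPoolExecutor` (a sketch, not part of the original answer; `heavy` is a hypothetical stand-in for a numpy-heavy worker):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def heavy(row):
    # numpy releases the GIL inside dot, so threads can overlap this work
    return float(np.dot(row, row))

data = np.arange(12.0).reshape(4, 3)
with ThreadPoolExecutor(max_workers=4) as ex:
    # ex.map preserves input order, like Pool.map
    sums = list(ex.map(heavy, data))
print(sums)  # [5.0, 50.0, 149.0, 302.0]
```

Because the threads share the interpreter's memory, `data` is not pickled or copied to child processes, which is the IPC saving the answer refers to.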

Jeremy Brown