2

I am running a time-consuming program many times over. I have access to a cluster where I can request 504 processors, but its support service is, let's say, slow, so I turn to you, SO. I am using a very simple application as follows:

import multiprocessing

def function(data):
    data = complicated_function_I_was_given(data)
    # 'unique_id' stands in for an output filename unique to each work item
    with open('unique_id', 'w') as f:
        f.write(data)

pool = multiprocessing.Pool(504)
pool.map(function, data_iterator)

Now, although I can see the processes start (complicated_function_I_was_given writes a bunch of scratch files, but with unique names, so I am sure there is no clash), the whole run seems really slow. I expect some items in data_iterator to be processed almost immediately, while others will take days, yet after one day nothing has been produced. Could it be that multiprocessing.Pool() has a limit? Or that it doesn't distribute the processes over different nodes (I know each node has 12 cores)? I am using Python 2.6.5.
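A small check along these lines (the whoami helper is just for this test, not part of my real program) would at least show how many worker processes the pool actually starts:

import multiprocessing

def whoami(x):
    # each worker reports its own process name
    return multiprocessing.current_process().name

if __name__ == '__main__':
    pool = multiprocessing.Pool(504)
    names = set(pool.map(whoami, range(10000)))
    print len(names)  # number of distinct workers that picked up work
    pool.close()
    pool.join()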

Zenon

2 Answers

4

Or that it doesn't distribute the processes over different nodes (I know each node has 12 cores)? I am using Python 2.6.5.

I think this is your problem: unless your cluster architecture is very unusual and all the processors appear to be on the same logical machine, multiprocessing will only have access to the local cores. You probably need to use a different parallelisation library.

See also the answers to this question.
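For what it's worth, here is a minimal sketch of the mpi4py route (not tested on your cluster; data_iterator, complicated_function_I_was_given and the output names are placeholders taken from your question):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id: 0 .. size-1
size = comm.Get_size()   # total number of processes across all nodes

data_items = list(data_iterator)   # assumes every rank can build the same list

# each rank handles every size-th item, so the work is spread over the nodes
for i in range(rank, len(data_items), size):
    result = complicated_function_I_was_given(data_items[i])
    with open('output_%d' % i, 'w') as f:
        f.write(result)

You would launch it with something like mpirun -n 504 python script.py, with the exact invocation depending on your cluster's scheduler.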

James
  • Thanks for the link, I think you are right. I don't know how I could have missed that question! Now to play with mpi4py, then. – Zenon Feb 27 '12 at 00:09
1

You might try scaling the work with one of Python's many parallel libraries; I have not heard of work being spread over that many processors with multiprocessing alone.
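If you do stay with multiprocessing, one thing to try on a single node is sizing the pool to that node's cores and streaming results as they finish. A rough sketch (the output naming here is my own illustration, not from the question):

import multiprocessing

def function(data):
    return complicated_function_I_was_given(data)   # placeholder from the question

if __name__ == '__main__':
    # a Pool can only use the cores of the machine it runs on (about 12 per node here)
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    for i, result in enumerate(pool.imap_unordered(function, data_iterator)):
        # results arrive as soon as each item finishes, so quick items show up early
        with open('output_%d' % i, 'w') as f:
            f.write(result)
    pool.close()
    pool.join()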

bluemoon