I am running a time-consuming program many times over. I have access to a cluster where I can request 504 processors, but customer support is, let's say, slow, so I turn to you, SO. I am using a very simple application, as follows:
import multiprocessing

def function(data):
    data = complicated_function_I_was_given(data)
    with open('unique_id', 'w') as f:
        f.write(data)

pool = multiprocessing.Pool(504)
pool.map(function, data_iterator)
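To be clear, 'unique_id' above is a placeholder; in the real code every task writes to its own file, so the outputs cannot clash. It looks roughly like this (uuid4 is just an illustration, my actual naming scheme is different):

import multiprocessing
import uuid

def function(data):
    data = complicated_function_I_was_given(data)
    # each task gets its own output file, so there is no clash
    with open('output_%s.txt' % uuid.uuid4().hex, 'w') as f:
        f.write(data)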
Now, although I can see the processes start (complicated_function_I_was_given writes a bunch of scratch files, but with unique names, so I am sure there is no clash), everything seems really slow. I expect some of the items in data_iterator to be processed almost immediately (some will take days, but others should finish quickly), yet after one day nothing has been produced. Could it be that multiprocessing.Pool() has a limit on the number of processes? Or that it does not distribute the processes over different nodes (I know each node has 12 cores)? I am using Python 2.6.5.