
I'm sorry if this is too simple for some people, but I still don't get the trick with Python's multiprocessing. I've read
http://docs.python.org/dev/library/multiprocessing
http://pymotw.com/2/multiprocessing/basics.html and many other tutorials and examples that Google gives me... many of them from here too.

Well, my situation is that I have to compute many numpy matrices and I need to store them in a single numpy matrix afterwards. Let's say I want to use 20 cores (or that I can use 20 cores), but I haven't managed to use the pool resource successfully, since it keeps the processes alive until the pool "dies". So I thought of doing something like this:

from multiprocessing import Process, Queue
import numpy as np

def f(q, i):
    # each worker puts one matrix on the shared queue
    q.put(np.zeros((4, 4)))

if __name__ == '__main__':
    q = Queue()
    for i in range(30):
        p = Process(target=f, args=(q, i))
        p.start()
        p.join()
    result = q.get()
    while not q.empty():
        result += q.get()
    print result

But then it looks like the processes don't run in parallel but rather sequentially (please correct me if I'm wrong), and I don't know whether they die after they finish their computation (so that, with more than 20 processes, the ones that did their part leave the core free for another process). Also, for a very large number of iterations (let's say 100,000), storing all those matrices (which may be really big too) in a queue will use a lot of memory, rendering the code useless, since the idea is to add the result of each iteration to the final result as soon as it's available, e.g. using a lock (and its acquire() and release() methods). But if this code isn't doing parallel processing, the lock is useless too...

I hope somebody can help me.

Thanks in advance!

Carlos

1 Answer


You are correct, they are executing sequentially in your example.

p.join() causes the current thread to block until p has finished executing. You'll either want to join your processes individually outside of your for loop (e.g., by storing them in a list and then iterating over it, as in the sketch below) or use something like multiprocessing.Pool and apply_async with a callback. That will also let you add each value to your result directly rather than keeping all the objects around.
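A minimal sketch of the first approach, reusing the f and Queue from your question (note that the parent drains the queue before joining, since joining a process that still has buffered items on a queue can block):

from multiprocessing import Process, Queue
import numpy as np

def f(q, i):
    q.put(np.zeros((4, 4)))

if __name__ == '__main__':
    q = Queue()
    # start all 30 processes first so they actually run concurrently
    processes = [Process(target=f, args=(q, i)) for i in range(30)]
    for p in processes:
        p.start()
    # drain the queue, then join the (now finished) workers
    result = np.zeros((4, 4))
    for _ in range(30):
        result += q.get()
    for p in processes:
        p.join()
    print result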

With a Pool and a callback, for example:

from multiprocessing import Pool
import numpy as np

def f(i):
    return i * np.identity(4)

if __name__ == '__main__':
    p = Pool(5)
    result = np.zeros((4, 4))

    def adder(value):
        # the callback runs in the parent process as each task
        # finishes, so it can accumulate straight into result
        global result
        result += value

    for i in range(30):
        p.apply_async(f, args=(i,), callback=adder)
    p.close()
    p.join()
    print result

Closing and then joining the pool at the end ensures that the pool's processes have completed and that the result object is finished being computed. You could also investigate using Pool.imap (or imap_unordered) as a solution to your problem. That particular solution would look something like this:

if __name__ == '__main__':
    p = Pool(5)
    result = np.zeros((4, 4))

    # pull results lazily as workers finish, in whatever order
    im = p.imap_unordered(f, range(30), chunksize=5)

    for x in im:
        result += x

    print result

This is cleaner for your specific situation, but may not be for whatever you are ultimately trying to do.

As to storing all of your varied results: if I understand your question correctly, you can just add them into a result in the callback method (as above) or item-at-a-time using imap/imap_unordered (which still queues the results, but you'll clear them as they come in). Then nothing needs to be stored for longer than it takes to add it to the result.

David H. Clements
  • Thanks for your answer! I understand the first solution better, and I find the callback extremely helpful, since imap_unordered seems to store all the results, and that's what I'd like to avoid in order not to eat memory. As for the pool, I'm not sure (because of what I read about the maxtasksperchild attribute) whether, if I have "x" processors, "3x" processes will run, since the first "x" processes don't die. I'm also not sure whether the memory allocated for the first "x" processes is freed after the callback. I ask in order not to just "block" my PC when using many more and bigger matrices – Carlos Jan 06 '12 at 10:35
  • Oh! I think now I get it: the workers are alive as long as the pool is alive, but as soon as they finish one task they free the resources and then take the next task and do the computation... Is that it? – Carlos Jan 06 '12 at 12:38
  • Yup, that's about it. I wouldn't worry too much about `Pool` or finding a substitute unless you actually have profiling data indicating that it is a problem. There are optimizations you can do (e.g. the maxtasksperchild setting, sketched after these comments), but until you have demonstrated that there is a problem in your real system, most of them aren't worth the trouble. – David H. Clements Jan 06 '12 at 16:28
  • I just noticed that the processes are like a duplicate of their parent, using the same amount of memory. When working with a lot of data (images as matrices or vectors) this is real trouble, so I guess I should work with threads. Maybe you have an idea of how to do the same as before but with threads? I'll read the documentation and examples anyway. Thanks again, David. – Carlos Jan 06 '12 at 18:53
  • No worries, hope it all helps ^_^ There are a few different solutions to the memory problem. The multiprocessing module is mostly for dealing with CPU-bound problems, while threading works best for IO-bound problems where IO might block. I'd suggest getting everything set up using this method and then seeing if it is something you need to optimize, then opening up another question to get a little more feedback on it. – David H. Clements Jan 06 '12 at 20:08
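For reference, a minimal sketch of the maxtasksperchild setting mentioned in the comments above (whoami is just an illustrative helper): with maxtasksperchild set, each worker process exits after that many tasks and the pool replaces it with a fresh one, so any memory the old worker accumulated is returned to the OS.

from multiprocessing import Pool
import os

def whoami(i):
    # report which worker process handled each task
    return (i, os.getpid())

if __name__ == '__main__':
    # each of the 5 workers is replaced after 2 tasks, so more
    # than 5 distinct PIDs will show up across the 30 tasks
    p = Pool(5, maxtasksperchild=2)
    for i, pid in p.imap_unordered(whoami, range(30)):
        print i, pid
    p.close()
    p.join()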