
I have found many similar questions but no answer. For a simple array there is multiprocessing.Array. For a sparse matrix or any other arbitrary object there is manager.Namespace. So I tried the code below:

from scipy import sparse
from multiprocessing import Pool
import multiprocessing
import functools

def myfunc(x, ns):
    # row x times A times column x, fetched through the manager proxy
    return ns.A[x, :] * ns.A * ns.A[:, x]

manager = multiprocessing.Manager()
Global = manager.Namespace()
pool = Pool()
Global.A = sparse.rand(10000, 10000, density=0.5, format='csr')
myfunc2 = functools.partial(myfunc, ns=Global)
r = pool.map(myfunc2, range(100))

The code works, but it is not efficient: only 4 of my 16 workers are actually doing anything. The reason, I guess, is that the manager allows only one worker to access the data at a time. Since the data is read-only, I don't really need a lock. So is there a more efficient way to do this?

P.S. I have seen people talking about copy-on-write fork(). I don't really understand what it is, but it does not work here: if I generate A first and then create the Pool(), each process seems to end up with its own copy of A.

Thank you in advance.

user2727768
    You might want to try using [sharedmem](https://bitbucket.org/cleemesser/numpy-sharedmem/overview) instead of `multiprocessing.Manager`. Out of curiosity -- what OS are you using? – unutbu Nov 05 '13 at 01:30
  • @unutbu Thank you. I am going to take a look at sharedmem. I am running it on a Linux VM on a cluster. – user2727768 Nov 05 '13 at 02:39

1 Answer


A property of a Namespace object is only updated when it is explicitly assigned to. Good explanations are given here.
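For example, a minimal sketch of that behaviour:

from multiprocessing import Manager

manager = Manager()
ns = manager.Namespace()
ns.data = [0, 0, 0]

local = ns.data    # fetches a pickled copy from the manager process
local[0] = 42      # mutates only the local copy
print(ns.data)     # still [0, 0, 0]

ns.data = local    # explicit assignment pushes the update back
print(ns.data)     # [42, 0, 0]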

Edit: And looking at the implementation (in multiprocessing/managers.py), it does not seem to use shared memory. It just pickles objects and sends them to the child when requested. That is probably why it is taking so long.

Are you by any chance creating a pool with more workers than your CPU has cores? (I.e. using the processes argument of the Pool constructor.) This is generally not a good idea.

There are a couple of other things you can try:

  • Write the sparse matrix to a file, and let each worker process read the file. The OS will likely put the file in its buffer cache, so the performance of this might be a lot better than you think (see the first sketch below).
  • A possible improvement is to use a memory-mapped file via the mmap module (see the second sketch below).
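A minimal sketch of the file-based approach, assuming a SciPy recent enough to provide sparse.save_npz/load_npz (added well after this question; the file name A.npz is arbitrary). Each worker loads the matrix once in a Pool initializer, so no pickling happens per task:

from multiprocessing import Pool
from scipy import sparse

def init_worker():
    # runs once in each worker process; loads the matrix into a
    # module-level global so myfunc can use it directly
    global A
    A = sparse.load_npz('A.npz')

def myfunc(x):
    return A[x, :] * A * A[:, x]

if __name__ == '__main__':
    A = sparse.rand(10000, 10000, density=0.5, format='csr')
    sparse.save_npz('A.npz', A)
    pool = Pool(initializer=init_worker)
    r = pool.map(myfunc, range(100))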
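And a sketch of the memory-mapping idea. Rather than calling the mmap module directly, this saves the three flat arrays that make up a CSR matrix and reattaches them with np.load(..., mmap_mode='r'), which memory-maps the files so all workers share the same physical pages (file names and helper names are mine; SciPy should be able to wrap the mapped arrays without copying, though some versions may cast the index arrays):

import numpy as np
from scipy import sparse

def save_csr(A, prefix):
    # a CSR matrix is just three flat arrays; save each one
    np.save(prefix + '_data.npy', A.data)
    np.save(prefix + '_indices.npy', A.indices)
    np.save(prefix + '_indptr.npy', A.indptr)

def load_csr(prefix, shape):
    # mmap_mode='r' maps the files read-only instead of copying them,
    # so every worker shares the same pages via the page cache
    data = np.load(prefix + '_data.npy', mmap_mode='r')
    indices = np.load(prefix + '_indices.npy', mmap_mode='r')
    indptr = np.load(prefix + '_indptr.npy', mmap_mode='r')
    return sparse.csr_matrix((data, indices, indptr), shape=shape)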
Roland Smith
  • Thank you. The number of workers equals the number of cores. Is it because all the workers try to access the shared matrix at the same time and only one gets access? I do not know whether the manager holds a lock. Maybe I should try mmap. – user2727768 Nov 05 '13 at 02:46