I have found many similar questions but no answer. For a simple array there is multiprocessing.Array. For a sparse matrix, or any other arbitrary object, the closest thing I have found is manager.Namespace(). So I tried the code below:
from scipy import sparse
from multiprocessing import Pool
import multiprocessing
import functools

def myfunc(x, ns):
    # Read-only access: row x times A times column x.
    return ns.A[x, :] * ns.A * ns.A[:, x]

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    Global = manager.Namespace()
    Global.A = sparse.rand(10000, 10000, 0.5, 'csr')
    pool = Pool()
    myfunc2 = functools.partial(myfunc, ns=Global)
    r = pool.map(myfunc2, range(100))
The code works, but it is not efficient: only 4 of my 16 workers are actually busy. My guess is that the manager allows only one worker to access the data at a time. Since the data is read-only, I don't really need a lock. Is there a more efficient way to do this?
P.S. I have seen people mention copy-on-write fork(). I don't fully understand it, but it does not work for me: if I generate A first and then create the Pool, each process seems to end up with its own copy of A.
Thank you in advance.