I'm not sure if this title is appropriate for my situation: the reason why I want to share numpy array is that it might be one of the potential solutions to my case, but if you have other solutions that would also be nice.
My task: I need to implement an iterative algorithm with multiprocessing, while each of these processes need to have a copy of data(this data is large, and read-only, and won't change during the iterative algorithm).
I've written some pseudo code to demonstrate my idea:
import multiprocessing
def worker_func(data, args):
# do sth...
return res
def compute(data, process_num, niter):
data
result = []
args = init()
for iter in range(niter):
args_chunk = split_args(args, process_num)
pool = multiprocessing.Pool()
for i in range(process_num):
result.append(pool.apply_async(worker_func,(data, args_chunk[i])))
pool.close()
pool.join()
# aggregate result and update args
for res in result:
args = update_args(res.get())
if __name__ == "__main__":
compute(data, 4, 100)
The problem is in each iteration, I have to pass the data to subprocess, which is very time-consuming.
I've come up with two potential solutions:
- share data among processes (it's ndarray), that's the title of this question.
- Keep subprocess alive, like a daemon process or something...and wait for call. By doing that, I only need to pass the data at the very beginning.
So, is there any way to share a read-only numpy array among process? Or if you have a good implementation of solution 2, it also works.
Thanks in advance.