What I want to do
I have an m x n numpy array A (with m << n) that I want to load on a node where all 20 CPUs can share memory. On each CPU, I want to multiply A by an n x 1 vector v; the vector v differs on each CPU, but the matrix A stays the same.
Constraint
Matrix A is large enough that I cannot afford a separate copy of A per CPU, so I would like to put A in shared node memory. And since A*v is only m x 1, I think I never need to store a matrix of size m x n on each CPU, just the one copy of A in shared memory.
The references below tell me I can use shared memory with the multiprocessing module.
My question
If I have 1 worker per CPU, can each worker simultaneously compute A*v (where v is different for each worker) using the multiprocessing module?
I am concerned that, because each worker accesses the same shared memory at the same time, multiprocessing would copy matrix A into each worker's process, which would cause memory problems for me.
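For concreteness, here is a minimal sketch of the setup I have in mind, assuming Python 3.8+'s multiprocessing.shared_memory (the sizes, the worker function, and the variable names are just illustrative):

```python
import numpy as np
from multiprocessing import Pool, shared_memory

M, N = 4, 1000  # toy sizes; in my real problem m << n and A is large

def worker(args):
    shm_name, shape, dtype, v = args
    # Attach to the existing shared block; this should NOT copy A.
    shm = shared_memory.SharedMemory(name=shm_name)
    A = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    result = A @ v  # m x 1 result, tiny compared to A itself
    shm.close()  # detach; the parent process still owns the block
    return result

if __name__ == "__main__":
    A = np.random.rand(M, N)

    # Put one copy of A into shared memory.
    shm = shared_memory.SharedMemory(create=True, size=A.nbytes)
    A_shared = np.ndarray(A.shape, dtype=A.dtype, buffer=shm.buf)
    A_shared[:] = A

    vs = [np.random.rand(N) for _ in range(20)]  # one v per worker
    with Pool(processes=4) as pool:
        results = pool.map(
            worker, [(shm.name, A.shape, A.dtype, v) for v in vs]
        )

    shm.close()
    shm.unlink()  # release the shared block when done
```

Each worker here receives only the name of the shared block plus its own small vector v, and reconstructs a numpy view of A from the shared buffer; my question is whether this really avoids per-worker copies of A.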
Edit:
The question has been edited to remove the part about ray.io, because I just learned Ray has its own forum; I am asking the Ray version of this question there.
References