I'm using Python 2.7 and NumPy on a Linux machine. I am running a program which involves a time-consuming function `computeGP(level, grid)`, which takes input in the form of a NumPy array `level` and an object `grid`, which is not modified by this function.

My goal is to parallelize `computeGP` (locally, so on different cores) for different `level`s but the same `grid`. Since `grid` stays invariant, this can be done without synchronization hassle using shared memory. I've read a bit about threading in Python and the GIL, and it seems to me that I should go with the `multiprocessing` module rather than `threading`. This and this answer recommend using `multiprocessing.Array` to share data efficiently, while noting that on Unix machines it is the default behaviour that the object is not copied.

My problem is that the object `grid` is not a NumPy array. It is a list of NumPy arrays, because the way my data structure works is that I need to access array (list element) N and then access its row K. Basically the list just fakes pointers to the arrays.
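To illustrate the read-only sharing I have in mind, here is a minimal sketch (with a made-up `grid` and a toy stand-in for `computeGP`): on Linux, `fork()` gives each `multiprocessing` worker a copy-on-write view of the parent's memory, so as long as the workers only read `grid`, no physical copy should be made.

```python
import multiprocessing as mp
import numpy as np

# Hypothetical stand-in for the real grid: a list of NumPy arrays.
grid = [np.arange(4, dtype=np.float64), np.arange(8, dtype=np.float64)]

def computeGP(level):
    # Toy stand-in for the real computeGP: read-only access to the
    # inherited `grid` -- pick list element N, then its entry K.
    n, k = level
    return grid[n][k]

def run(levels):
    # Workers created by fork() inherit `grid` without copying it,
    # as long as they only read from it (copy-on-write).
    pool = mp.Pool(2)
    try:
        return pool.map(computeGP, levels)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    print(run([(0, 1), (1, 3), (1, 7)]))
```

Note that this relies on the `fork` start method, which is the default on Linux; on platforms that spawn fresh interpreters this inheritance trick does not apply.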
So my questions are:

- My understanding is that on Unix machines I can share the object `grid` without any further use of the `multiprocessing` datatypes `Array` (or `Value`). Is that correct?
- Is there a better way to implement this pointer-to-array data structure which can use the more efficient `multiprocessing.Array`?
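One idea I considered (a sketch, with a hypothetical helper name `make_shared_grid`): keep the list-of-arrays structure, but back each list element by its own lock-free `multiprocessing.Array` and wrap it in a NumPy view via `np.frombuffer`, so each element lives in shared memory without merging them into one big array.

```python
import multiprocessing as mp
import numpy as np

def make_shared_grid(arrays):
    # Hypothetical helper: one shared buffer per list element, so the
    # "list of pointers" structure is preserved.
    shared = []
    for a in arrays:
        # lock=False returns a raw ctypes array (no synchronization
        # wrapper), which np.frombuffer can view directly.
        buf = mp.Array('d', a.size, lock=False)
        view = np.frombuffer(buf, dtype=np.float64).reshape(a.shape)
        view[:] = a  # copy the data into shared memory once
        shared.append(view)
    return shared
```

The resulting list of views can be inherited by forked workers just like any other global; the difference is that the underlying buffers are explicitly in shared memory rather than relying on copy-on-write.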
I don't want to assemble one large array containing the smaller ones from the list, because the smaller ones aren't really small either...
Any thoughts welcome!