I'm using Python 2.7 and NumPy on a Linux machine. I am running a program which involves a time-consuming function `computeGP(level, grid)`, which takes input in the form of a NumPy array `level` and an object `grid`, which is not modified by this function.

My goal is to parallelize `computeGP` (locally, so on different cores) for different `level`s but the same `grid`. Since `grid` stays invariant, this can be done without synchronization hassle using shared memory. I've read a bit about threading in Python and the GIL, and it seems to me that I should go with the `multiprocessing` module rather than `threading`. This and this answer recommend using `multiprocessing.Array` to share data efficiently, while noting that on Unix machines it is the default behaviour that the object is not copied.

My problem is that the object `grid` is not a NumPy array. It is a list of NumPy arrays, because the way my data structure works is that I need to access array (list element) N and then access its row K. Basically the list just fakes pointers to the arrays.
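To illustrate the read-only sharing I have in mind, here is a minimal sketch (with a made-up `grid` and a toy stand-in for `computeGP`): on Linux, `fork()` gives each `multiprocessing` worker a copy-on-write view of the parent's memory, so as long as the workers only read `grid`, no physical copy should be made.

```python
import multiprocessing as mp
import numpy as np

# Hypothetical stand-in for the real grid: a list of NumPy arrays.
grid = [np.arange(4, dtype=np.float64), np.arange(8, dtype=np.float64)]

def computeGP(level):
    # Toy stand-in for the real computeGP: read-only access to the
    # inherited `grid` -- pick list element N, then its entry K.
    n, k = level
    return grid[n][k]

def run(levels):
    # Workers created by fork() inherit `grid` without copying it,
    # as long as they only read from it (copy-on-write).
    pool = mp.Pool(2)
    try:
        return pool.map(computeGP, levels)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    print(run([(0, 1), (1, 3), (1, 7)]))
```

Note that this relies on the `fork` start method, which is the default on Linux; on platforms that spawn fresh interpreters this inheritance trick does not apply.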
So my questions are:

- My understanding is that on Unix machines I can share the object `grid` without any further use of the `multiprocessing` datatypes `Array` (or `Value`). Is that correct?
- Is there a better way to implement this pointer-to-array data structure which can use the more efficient `multiprocessing.Array`?
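One idea I considered (a sketch, with a hypothetical helper name `make_shared_grid`): keep the list-of-arrays structure, but back each list element by its own lock-free `multiprocessing.Array` and wrap it in a NumPy view via `np.frombuffer`, so each element lives in shared memory without merging them into one big array.

```python
import multiprocessing as mp
import numpy as np

def make_shared_grid(arrays):
    # Hypothetical helper: one shared buffer per list element, so the
    # "list of pointers" structure is preserved.
    shared = []
    for a in arrays:
        # lock=False returns a raw ctypes array (no synchronization
        # wrapper), which np.frombuffer can view directly.
        buf = mp.Array('d', a.size, lock=False)
        view = np.frombuffer(buf, dtype=np.float64).reshape(a.shape)
        view[:] = a  # copy the data into shared memory once
        shared.append(view)
    return shared
```

The resulting list of views can be inherited by forked workers just like any other global; the difference is that the underlying buffers are explicitly in shared memory rather than relying on copy-on-write.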
I don't want to assemble one large array containing the smaller ones from the list, because the smaller ones aren't really small either...
Any thoughts welcome!