I have a 60GB SciPy array (matrix) that I must share between 5+ `multiprocessing` `Process` objects. I've seen numpy-sharedmem and read this discussion on the SciPy list. There seem to be two approaches: numpy-sharedmem, or using a `multiprocessing.RawArray()` and mapping NumPy `dtype`s to `ctype`s. numpy-sharedmem seems to be the way to go, but I've yet to see a good reference example. I don't need any kind of locks, since the array (actually a matrix) will be read-only. Due to its size, I'd also like to avoid making a copy. It sounds like the correct method is to create the only copy of the array as a sharedmem array and then pass it to the `Process` objects? A couple of specific questions:
1. What's the best way to actually pass the sharedmem handles to sub-`Process()`es? Do I need a queue just to pass one array around? Would a pipe be better? Can I just pass it as an argument to the `Process()` subclass's `__init__` (where I'm assuming it gets pickled)?
2. In the discussion I linked above, there's mention of numpy-sharedmem not being 64-bit-safe? I'm definitely using some structures that aren't 32-bit addressable.
3. Are there tradeoffs to the `RawArray()` approach? Is it slower, or buggier?
4. Do I need any ctype-to-dtype mapping for the numpy-sharedmem method?
5. Does anyone have an example of some open-source code doing this? I'm a very hands-on learner, and it's hard to get this working without a good example to look at.
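For what it's worth, here's a minimal sketch of the `RawArray()` route as I currently understand it. The shape, dtype, and worker function are placeholders I made up for illustration; I'm assuming that passing the `RawArray` directly as a `Process` argument is legitimate and that `np.frombuffer` gives a zero-copy view:

```python
import ctypes
import multiprocessing as mp

import numpy as np


def worker(shared_buf, shape):
    # Re-wrap the shared ctypes buffer as a NumPy array in the child;
    # np.frombuffer creates a view, so no copy of the data is made.
    matrix = np.frombuffer(shared_buf, dtype=np.float64).reshape(shape)
    print(matrix[0, :5].sum())  # read-only access


if __name__ == '__main__':
    shape = (1000, 1000)  # placeholder; the real matrix is ~60GB
    # A single shared, lock-free buffer (no locks needed: access is read-only).
    shared_buf = mp.RawArray(ctypes.c_double, shape[0] * shape[1])
    matrix = np.frombuffer(shared_buf, dtype=np.float64).reshape(shape)
    matrix[:] = np.random.rand(*shape)  # fill the one and only copy

    procs = [mp.Process(target=worker, args=(shared_buf, shape))
             for _ in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

If numpy-sharedmem is the better route, I'd love to see the equivalent of the above written with it.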
If there's any additional info I can provide to help clarify this for others, please comment and I'll add it. Thanks!
This needs to run on Ubuntu Linux and maybe Mac OS, but portability isn't a huge concern.
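Since this only needs to work on Linux, one more idea I've been wondering about (not sure if it's sound, so please correct me): skipping explicit shared memory entirely and relying on `fork()` copy-on-write with a module-level global. A rough sketch with placeholder sizes, assuming the fork start method:

```python
import multiprocessing as mp

import numpy as np

# Placeholder for loading the real 60GB matrix; it would be built once in the
# parent before any worker processes are started.
BIG_MATRIX = np.random.rand(1000, 1000)


def worker(start, stop):
    # The child inherits BIG_MATRIX through fork(); as long as it only reads,
    # the underlying pages stay shared (copy-on-write) rather than duplicated.
    print(BIG_MATRIX[start:stop].sum())


if __name__ == '__main__':
    procs = [mp.Process(target=worker, args=(i * 200, (i + 1) * 200))
             for i in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

My worry with this is whether anything in the children would touch those pages and silently trigger copies of a 60GB buffer.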