I have a situation similar to this question:
Share Large, Read-Only Numpy Array Between Multiprocessing Processes
The one significant difference is that I definitely want the entire array to live in RAM. To restate for clarity:
I would like to share a fairly large numpy array between multiple processes, read-only, and keep the entire array in RAM to get the absolute best performance. A linux-only solution is fine. I'd also like this to work in a production environment, so would rather avoid bringing dependencies on research-oriented packages, or doing anything hacky.
For this scenario, it seems to me that the numpy-sharedmem style approach is overkill. The approaches that still stand out are:
1. As suggested in the other thread, keep the array as a global variable and simply fork(). This seems like it would be as fast as possible. Would multiple processes end up competing over the shared memory pages in any way, or interfering with each other's caching in some way that would introduce overhead versus a single-process scenario? I'm leery of this method, though, due to comments like this one. It may also be inconvenient to use fork() in my multiprocessing environment (I am most likely going with Twisted at this point).

2. While numpy's built-in memory mapping was brought up in the other thread, it was apparently geared toward arrays bigger than main memory. I don't believe I've seen the following possibility discussed on Stack Overflow: why not just place the npy file into a ramdisk and mmap it (mmap_mode='r') for easy and stable read-only in-memory shared numpy arrays?

What are the performance considerations here? Is it much different than the fork() method, or a true shared-memory approach like numpy-sharedmem, for example? How much overhead is incurred by the mmap layer in numpy? When the npy file is placed on the ramdisk, does contiguity matter much? Is cache locality affected? Is there going to be any contention between processes?
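To make option 1 concrete, here is a minimal sketch of what I have in mind. It is an illustration only: BIG is a small stand-in for the real array, child_sum_via_fork is a hypothetical helper name, and it assumes a POSIX system where os.fork() is available. The child reads the parent's array through the pages inherited at fork time and sends a result back over a pipe.

```python
import os
import numpy as np

# Stand-in for the large read-only array (assumption: the real one is far bigger).
BIG = np.arange(1_000_000, dtype=np.float64)

def child_sum_via_fork():
    """Fork a child that reads BIG without copying it, and return its result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: BIG is visible here through pages inherited from the parent.
        os.close(read_fd)
        total = float(BIG.sum())
        os.write(write_fd, repr(total).encode())
        os.close(write_fd)
        os._exit(0)  # skip the parent's cleanup handlers in the child
    # Parent: close the write end so read() sees EOF once the child is done.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return float(data)
```

The array has to exist before the fork, which is part of what makes this awkward to fit into a framework that manages its own processes.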
I am leaning towards option 2 for the stability factor alone, but would like to understand the possible performance differences, and to understand why mmap+ramdisk might be a quick and easy solution for many applications similar or not so similar to mine.
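For concreteness, this is roughly what I mean by option 2. It is a sketch under assumptions, not a tested production setup: it assumes /dev/shm is a writable tmpfs mount (common on Linux, but not guaranteed), and ramdisk_dir, publish_array and open_shared are hypothetical helper names of my own. Only np.save and np.load with mmap_mode='r' are actual numpy calls.

```python
import os
import tempfile
import numpy as np

def ramdisk_dir():
    # Assumption: /dev/shm is a tmpfs (RAM-backed) mount. Fall back to the
    # regular temp directory so the sketch still runs where it is missing.
    shm = "/dev/shm"
    if os.path.isdir(shm) and os.access(shm, os.W_OK):
        return shm
    return tempfile.gettempdir()

def publish_array(arr, name):
    """Write arr once as an .npy file on the ramdisk and return its path."""
    path = os.path.join(ramdisk_dir(), name)
    if not path.endswith(".npy"):
        path += ".npy"  # np.save would append the extension anyway
    np.save(path, arr)
    return path

def open_shared(path):
    """Map the .npy file read-only; each worker process would call this."""
    return np.load(path, mmap_mode="r")
```

One process would call publish_array once at startup; every worker would then call open_shared on the same path and get a read-only numpy.memmap backed by the same file.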