What is the overhead imposed by mmapping a numpy array located on a ramdisk?

Question

I have a situation similar to this question:

Share Large, Read-Only Numpy Array Between Multiprocessing Processes

Except for one significant difference that I definitely would like the entire array to live in RAM. To restate though for clarity:

I would like to share a fairly large numpy array between multiple processes, read-only, and keep the entire array in RAM to get the absolute best performance. A linux-only solution is fine. I'd also like this to work in a production environment, so would rather avoid bringing dependencies on research-oriented packages, or doing anything hacky.

For this scenario, it seems to me that the numpy-sharedmem style approach is overkill. The approaches that still stand out are:

Suggested in the other thread here, would be to keep the array as a global variable and simply fork(). This seems like it would be as fast as possible. Would multiple processes end up competing over the shared memory pages in any way, or interfering with each other's caching in some way which would introduce some overhead versus a single-process scenario?

I'm leery of this method though due to comments like this one. It also may be inconvenient to try to use fork() in my multiprocessing environment (I am most likely going with Twisted at this point).
While numpy's built-in memory mapping was brought up in the other thread, it was apparently geared toward arrays bigger than main-memory. I don't believe I've seen the following possibility discussed on stackoverflow: why not just place the npy file into a ramdisk and mmap it (mmap_mode='r') for easy and stable read-only in-memory shared numpy arrays?

What are the performance considerations here? Is it much different than the fork() method, or a true shared-memory approach like numpy-sharedmem for example? How much overhead is incurred by the mmap layer in numpy? When the npy file is placed on the ramdisk does contiguity matter much? Is cache locality affected? Is there going to be any contention between processes?

I am leaning towards option 2 for the stability factor alone, but would like to understand the possible performance differences, and to understand why mmap+ramdisk might be a quick and easy solution for many applications similar or not so similar to mine.

Shared memory on Linux is already implemented as mmap on tmpfs, so it's fine. — that other guy, Jun 16 '17 at 18:27

Timothy Baldwin · Answer 1 · 2022-09-19T09:26:41.017

0

Using a ramdisk would involve all the overhead involved with accessing filesystem, copying data between the ramdisk and the page cache duplicating the data.

If you use ramfs there will be no overhead accessing the data which will be locked in RAM. However this means other data will not be able to be kept in RAM possibly reducing overall performance.

Using tmpfs has a similar performance to anonymous memory.

edited Sep 19 '22 at 09:26

answered Mar 11 '18 at 00:31

Timothy Baldwin

3,551
1
14
23

What is the overhead imposed by mmapping a numpy array located on a ramdisk?

1 Answers1