
I am trying to implement something like this: I have a very large (~60GB) numpy array stored on disk. I want to read it all (it fits in the machine's RAM) and keep it in memory. No problems so far. What I need is to allow multiple processes, potentially spawned by multiple users, to have read access to that data, since it would not be feasible for each process to allocate 60GB of RAM.
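For concreteness, the loading step itself is just something like this (the file name `data.npy` is only a placeholder):

```python
import numpy as np

# Read the whole ~60GB array from disk into RAM (the machine has enough memory).
data = np.load("data.npy")
```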

The issue is that I cannot spawn all the processes from a single master script that hands them a reference to the data, which would otherwise solve the problem. The question, then, is: how do I load the file into memory and obtain a reference (e.g. the name of a memory-mapped file, or a pointer, to borrow from my C++ background) that I can share between processes, so that the other processes can access the data through it?

I have been looking into the mmap module, but I don't quite understand how to achieve this with it. For example, mmap.mmap(f.fileno(), 0) creates a memory-mapped file with the content of f, right? But in my case I have a .npy file, which includes a header and so on, so I cannot simply map it into memory and then access the data with the usual slicing operations.
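To make it concrete, this is roughly the direction I have been trying (the file name `data.npy` is just a placeholder, and I am not at all sure this is the right approach):

```python
import mmap
import numpy as np

with open("data.npy", "rb") as f:                        # placeholder file name
    version = np.lib.format.read_magic(f)                # check the .npy magic string
    shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(f)  # assumes a v1.0 header
    offset = f.tell()                                    # the raw data starts after the header
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# mm holds the raw bytes of the whole file, header included, so plain
# slicing of mm gives bytes, not array elements; wrapping it by hand:
arr = np.frombuffer(mm, dtype=dtype, offset=offset).reshape(shape)  # assumes C order
```

But even if something like this works inside one process, I still don't see how a process started by another user would get at the same in-memory copy instead of mapping the file from disk on its own.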

I have seen a lot of discussions about doing the opposite with numpy (i.e. memory-mapping a large file so that you don't need to read it all at once), but nothing so far on this, and I'm not sure it can be done. Can it?
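For reference, the discussions I found all do something along these lines, which (as far as I understand) maps the file from disk and reads it lazily, rather than sharing one copy that has already been loaded into RAM:

```python
import numpy as np

# Lazily memory-map the .npy file from disk; numpy takes care of the header.
arr = np.load("data.npy", mmap_mode="r")   # placeholder file name
chunk = arr[0:10]                          # slicing pages the data in from disk
```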

Thanks for any input. I know this probably sounds confusing, but I'm not sure there is a better way to explain it.

powder
  • [Use numpy array in shared memory for multiprocessing](http://stackoverflow.com/q/7894791/2823755) - or some of the other hits for "share a numpy array between processes"? Like [Share Large, Read-Only Numpy Array Between Multiprocessing Processes](http://stackoverflow.com/q/17785275/2823755) – wwii Feb 23 '17 at 02:36
  • [SharedArray 2.0.2](https://pypi.python.org/pypi/SharedArray) at PyPI – wwii Feb 23 '17 at 02:41
