
I want to use multiprocessing where one of the arguments is a very large numpy array. I've researched some other posts that appear to address similar issues:

Large numpy arrays in shared memory for multiprocessing: Is sth wrong with this approach?

Share Large, Read-Only Numpy Array Between Multiprocessing Processes

but, being rather new to Python, I've been having trouble adapting those solutions to this template. Could you help me understand my options for passing X to the functions in a read-only manner? My simplified snippet of code is here:

import multiprocessing as mp
import numpy as np

def funcA(X):
    # do something with X
    print 'funcA OK'

def funcB(X):
    # do something else with X
    print 'funcB OK'

if __name__=='__main__':
    X=np.random.rand(int(5.00e8),)
    funcA(X) # OK
    funcB(X) # OK
    X=np.random.rand(int(2.65e8),)
    P=[]
    P.append(mp.Process(target=funcA,args=(X,))) # OK
    P.append(mp.Process(target=funcB,args=(X,))) # OK
    for p in P:
        p.start()

    for p in P:
        p.join()

    X=np.random.rand(int(2.70e8),)
    P=[]
    P.append(mp.Process(target=funcA,args=(X,))) # FAIL
    P.append(mp.Process(target=funcB,args=(X,))) # FAIL
    for p in P:
        p.start()

    for p in P:
        p.join()

funcA and funcB appear to accept very large numpy arrays when invoked sequentially. However, when they are invoked as separate processes, there appears to be an upper limit on the size of the numpy array that can be passed. How could I best get around this?

Note:

0) I do not wish to modify X; only to read from it;

1) I’m running on 64-bit Windows 7 Professional

cpicke1
  • Is `X` unique in your code, or do you have to call the two methods on other variables? – Roberto Trani Jan 29 '18 at 22:51
  • It happens in Python 2 but not in Python 3 on my machine. – Thomas Weller Jan 29 '18 at 23:02
  • @RobertoTrani I plan to call the methods only on X in conjunction with some other, much smaller variables. My motivation here is that the methods (which are totally independent of one another) each take a long time, and I am trying to save time by splitting the workload across multiple processes. I assumed that because X is read-only it would not be too taxing to set up, but I am finding otherwise. Also, my apologies for misspelling your name below. – cpicke1 Jan 30 '18 at 15:35
  • @cpicke1 Can you check if you are running a 32-bit Python version? Probably you can check this from the task manager. I can't reproduce your problem on my Linux machine; moreover, 2.7e8 floats are around 2GB, and this issue seems related to these ones: [link1](http://grokbase.com/t/python/python-list/102db8s7pb/memoryerror-can-i-use-more) and [link2](https://www.reddit.com/r/learnpython/comments/2g041o/the_python_process_on_my_64_bit_windows_7_machine/). Thus, can we consider the problem related to the Python version you are running? If so we can improve the solution based on mmap. – Roberto Trani Jan 31 '18 at 00:30
  • @RobertoTrani In my Python console, I applied the method [found here](https://intelligea.wordpress.com/2015/08/05/check-if-python-version-is-64-or-32-bit/) and the result was 64. Furthermore, I have retained the installer executable that I took from the Anaconda website; it is Anaconda2-4.1.0-Windows-x86_64.exe. I think I do not have the 32bit Python. I would also like to clarify that I am not having any problem declaring or processing 5e8 floats or larger; it's only when I am trying to do something in parallel with them that the problems arise! – cpicke1 Jan 31 '18 at 14:25
  • @cpicke1, I updated the answer and the code, can you try again, please? I am sorry, but I have no way to test it on a windows machine :( – Roberto Trani Jan 31 '18 at 14:32
  • @RobertoTrani RE: UPDATE2. The code in your update does run successfully. However, the new `dim` limit is somewhere between 5.368e8 and 5.369e8, which is double the “old” `dim` limit of somewhere between 2.684e8 and 2.685e8. The doubling is perhaps due to `dtype=np.float32` in lieu of `dtype=np.float`. I also played around a little with `max_chunk_size`, but this did not appear to circumvent the problem. – cpicke1 Jan 31 '18 at 15:06

1 Answer


The problem is likely in the data transfer to the child processes. When read-only objects must be shared, I prefer to exploit the copy-on-write mechanism used by the underlying OS to manage the memory of the child processes; however, I don't know whether Windows 7 uses this mechanism. When copy-on-write is available, you can access areas of the parent process's memory without copying them into the child process. This trick works only if you access them in a read-only way and if the object is created before the processes are created.

Summing up, a possible solution (at least for Linux machines) is this:

import multiprocessing as mp
import numpy as np

def funcA():
    print "A {}".format(X.shape)
    # do something with the global variable X
    print 'funcA OK'

def funcB():
    print "B {}".format(X.shape)
    # do something else with the global variable X
    print 'funcB OK'

if __name__=='__main__':
    X=np.random.rand(int(5.00e8),)
    funcA() # OK
    funcB() # OK

    X=np.random.rand(int(2.65e8),)
    P=[mp.Process(target=funcA), mp.Process(target=funcB)]
    for p in P:
        p.start()

    for p in P:
        p.join()

    X=np.random.rand(int(2.70e8),)
    P=[mp.Process(target=funcA), mp.Process(target=funcB)]
    for p in P:
        p.start()

    for p in P:
        p.join()
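
Note that this approach relies on the child processes being created with the fork start method, so that the global X defined before the fork is inherited by the children. On Python 3.4+ you can make that assumption explicit with mp.set_start_method; this is only a minimal sketch, and since fork is Unix-only it still will not help on Windows:

import multiprocessing as mp
import numpy as np

def funcA():
    # X is inherited from the parent via copy-on-write, not pickled
    print('funcA sees {}'.format(X.shape))

if __name__=='__main__':
    mp.set_start_method('fork')  # Python 3.4+; raises ValueError on Windows
    X = np.random.rand(int(2.70e8),)
    p = mp.Process(target=funcA)
    p.start()
    p.join()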

UPDATE: after various comments about compatibility problems with Windows, I sketched a new solution based solely on memory maps. Here I create a numpy memory map backed by a temporary file; the map is shared through the file descriptor, so the whole array doesn't have to be copied into the children. I found this solution much faster than using multiprocessing.Array!

UPDATE2: The code below has been updated to avoid memory issues during the randomisation of the memory map.

import multiprocessing as mp
import numpy as np
import tempfile

def funcA(X):
    print "A {}".format(X.shape)
    # do something with X
    print 'funcA OK'

def funcB(X):
    print "B {}".format(X.shape)
    # do something else with X
    print 'funcB OK'

if __name__=='__main__':
    dim = int(2.75e8)
    with tempfile.NamedTemporaryFile(delete=False) as tmpfile:  # default temp dir; '/tmp' does not exist on Windows
        X = np.memmap(tmpfile, shape=dim, dtype=np.float32)  # create the memory map
        # init the map by chunks of size 1e8
        max_chunk_size = int(1e8)
        for start_pos in range(0, dim, max_chunk_size):
            chunk_size = min(dim-start_pos, max_chunk_size)
            X[start_pos:start_pos+chunk_size] = np.random.rand(chunk_size,)
        P=[mp.Process(target=funcA, args=(X,)), mp.Process(target=funcB, args=(X,))]
        for p in P:
            p.start()

        for p in P:
            p.join()
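
If passing the memmap object itself still hits a size limit (pickling it may copy the array data, which would explain the roughly 2 GB ceiling reported in the comments), a variant worth trying is to pass only the file name and the shape to each child and re-open the map there, so that no array data crosses the process boundary. This is a minimal sketch of that idea, which I have not been able to test on Windows:

import multiprocessing as mp
import numpy as np
import tempfile

def funcA(path, shape):
    # re-open the memory map inside the child: only the path and the
    # shape are pickled, never the array data itself
    X = np.memmap(path, shape=shape, dtype=np.float32, mode='r')
    print "A {}".format(X.shape)
    print 'funcA OK'

if __name__=='__main__':
    dim = int(2.75e8)
    with tempfile.NamedTemporaryFile(delete=False) as tmpfile:
        X = np.memmap(tmpfile, shape=dim, dtype=np.float32)
        # ... fill X in chunks as in the code above ...
        X.flush()  # make sure the data is on disk before the children read it
        P=[mp.Process(target=funcA, args=(tmpfile.name, dim))]
        for p in P:
            p.start()
        for p in P:
            p.join()
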
Roberto Trani
  • On my Windows machine this fails in the debug statement `print "A {}".format(X.shape)` – Thomas Weller Jan 29 '18 at 23:05
  • I suppose it happens because you are using Python 3, while this is Python 2 code. I tested the solution on my machine. Moreover, as I said in the answer, this code works only if X is created before the child processes are created (and before the two functions are called). – Roberto Trani Jan 29 '18 at 23:08
  • I switched to a Python 2 interpreter. The error is `NameError: global name 'X' is not defined`, even if I add a `global X` statement. (Python 2.7.11) – Thomas Weller Jan 29 '18 at 23:11
  • Are you defining X before calling funcA()? For instance, as I did in the main block. – Roberto Trani Jan 29 '18 at 23:13
  • I copy-pasted the whole code you gave in the answer – Thomas Weller Jan 29 '18 at 23:14
  • Nevermind, I don't need to get this up and running. I was just interested. I'll go to bed now. – Thomas Weller Jan 29 '18 at 23:15
  • Just to debug it: please, can you tell us if this happens in the parallel code or in the sequential code? It is very strange, because on Linux with Python 2 everything works fine :S – Roberto Trani Jan 29 '18 at 23:17
  • @roberto-traini I receive exactly the same error message that Mr. Weller received, and I too copy-pasted the entire code. This problem occurs in the parallel code. I am using Python 2.7, if that is relevant. I had also attempted to add "global X; X=np.empty(1)" just under the import statements, but the parallel code would only recognise the "initialised" structure of X, and did not recognise my large random X. It seems that even using global variables, I was still not able to get around the problem. – cpicke1 Jan 30 '18 at 14:47
  • Windows Python does not use `fork` and hence cannot use the copy-on-write method. On Windows, creating a new `Process` instance spawns a new *empty* Python, which then loads up the same program as the original interpreter. – torek Jan 30 '18 at 15:51
  • @RobertoTrani RE: UPDATE. This works when X is int. Unfortunately it does not work if X is float. I modified your line to `X=np.memmap(tmpfile, shape=dim, dtype=float)` and I have exactly the same problem as before, and the problem occurs at about the same size as before (somewhere between 2.65e8 and 2.70e8). – cpicke1 Jan 30 '18 at 18:54
  • Since this problem does not seem to occur in Python 3, I am resigned to using Python 3.6 instead of Python 2.7. Thank you for all your kind help with this issue. – cpicke1 Jan 31 '18 at 16:35