9

I want to share a numpy array across multiple processes. The processes only read the data, so I want to avoid making copies. I know how to do it if I can start with a multiprocessing.sharedctypes.RawArray and then create a numpy array using numpy.frombuffer. But what if I am initially given a numpy array? Is there a way to initialize a RawArray with the numpy array's data without copying the data? Or is there another way to share the data across the processes without copying it?
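
For reference, the workflow I mean looks roughly like this (a minimal sketch; the names and sizes are just placeholders):

import ctypes
import numpy as np
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray

raw = RawArray(ctypes.c_double, 100)        # allocate shared memory first
arr = np.frombuffer(raw, dtype=np.float64)  # numpy view onto the shared buffer, no copy

def reader(shared_raw):
    # each child re-wraps the same buffer and only reads it
    view = np.frombuffer(shared_raw, dtype=np.float64)
    print(view.sum())

p = Process(target=reader, args=(raw,))
p.start()
p.join()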

christianbrodbeck
  • I'm not sure if this is possible - the `numpy` array wouldn't have been allocated from shared memory. I don't think you can create a `sharedctypes.RawArray` without actually copying the data into shared memory space. [The docs](https://docs.python.org/2.7/library/multiprocessing.html#module-multiprocessing.sharedctypes) note that you're allowed to just store a pointer to the object in shared memory, but it's likely to be invalid in the second process, since it's pointing to another process' address space. – dano Oct 10 '14 at 15:32
  • @christianmbrodbeck [this answer](http://stackoverflow.com/a/20520295/832621) explains how to use Cython and OpenMP to work on the same array through different processes using shared memory – Saullo G. P. Castro Oct 15 '14 at 06:13
  • Thanks for the pointer @saullo-castro. Since there seem to be some hurdles to trying this solution on OS X (no OpenMP support in Xcode, as I gather), is it a worthwhile solution for performing relatively simple operations on large arrays? Doesn't every prange statement incur the overhead of creating child processes? The dot product in the example does not seem to benefit from it. (I.e. I assume the prange statement would have to be the outermost loop for optimal speed gain?) – christianbrodbeck Oct 18 '14 at 19:36
  • @christianmbrodbeck yes, `prange` creates some overhead which is minimized if you create [static threads](http://docs.cython.org/src/userguide/parallelism.html#cython.parallel.prange). You are right... [in this example](http://stackoverflow.com/a/20520295/832621) `prange` is already at the outermost loop – Saullo G. P. Castro Oct 18 '14 at 20:16
  • Thanks @saullo-castro, assuming my target function is `dot()` from your example, and I am going to call `dot()` many times from Python, is it possible to avoid having the overhead associated with initializing subprocesses in every call? Or would that basically require implementing the outer loop in Cython? – christianbrodbeck Oct 20 '14 at 13:52
  • @christianmbrodbeck in that example only one Python thread is launched, and internally multiple cores are used for this thread... in your question it seems you need many threads accessing data from the same array, is that right? – Saullo G. P. Castro Oct 20 '14 at 14:37
  • @SaulloCastro I could go either way. The reason I use many threads is that I implemented an outer function in Python, and then started optimizing subfunctions in Cython (so the outer function creates those threads). However I could implement the outer function in Cython if that would have clear advantages. – christianbrodbeck Oct 20 '14 at 14:56
  • @christianmbrodbeck there is a big advantage to implementing the outer function in Cython if you are going to call a function like that `dot()` example many times, IF you declare it as a `cdef` function... in this way it has only the overhead of a C function call, avoiding all the overhead that comes with Python's API that you carry when using a `def` function... – Saullo G. P. Castro Oct 20 '14 at 14:59
  • Have you considered using threads instead of processes, e.g. with `multiprocessing.dummy`? Numpy is getting quite good at releasing the GIL (CPython's Global Interpreter Lock) and Cython also has the `nogil` functionality. (A minimal sketch of the thread-based approach follows these comments.) – moarningsun Nov 04 '14 at 09:04
  • Thanks @moarningsun I haven't been using threads because I had quite some Python code in there, but maybe I should do some profiling – christianbrodbeck Nov 05 '14 at 18:49
  • Does this answer your question? [How do I pass large numpy arrays between python subprocesses without saving to disk?](https://stackoverflow.com/questions/5033799/how-do-i-pass-large-numpy-arrays-between-python-subprocesses-without-saving-to-d) – Markus Dutschke Aug 03 '20 at 09:04
  • Check out this https://stackoverflow.com/questions/5033799/how-do-i-pass-large-numpy-arrays-between-python-subprocesses-without-saving-to-d/5036766#5036766 and this https://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing – Markus Dutschke Aug 03 '20 at 09:05
  • @MarkusDutschke as far as I can tell all these involve copying the data – christianbrodbeck Aug 04 '20 at 11:47
  • @christianbrodbeck yes, they do. Sorry - I forgot to point that out. Several statements in the answers to those questions say that what you want is not possible (which is what I wanted to say) :( – Markus Dutschke Aug 04 '20 at 12:07
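
Picking up the thread-based suggestion from the comments above: threads share the parent process's address space, so every worker can read the same numpy array with no copying at all, provided the heavy work releases the GIL (numpy's vectorized operations largely do). A minimal sketch, with the array and worker function made up only for illustration:

import numpy as np
from multiprocessing.dummy import Pool  # thread pool exposing the multiprocessing API

big = np.random.rand(1000, 1000)  # the existing array; never copied

def row_sum(i):
    # every worker thread reads the same array object in place
    return big[i].sum()

with Pool(4) as pool:
    sums = pool.map(row_sum, range(len(big)))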

4 Answers

4

To my knowledge it is not possible to declare memory as shared after it was assigned to a specific process. Similar discussions can be found here and here (more suitable).

Let me quickly sketch the workaround you mentioned (starting with a RawArray and getting a numpy.ndarray reference to it).

import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

# option 1: fixed element type and length
raw_arr = RawArray(ctypes.c_int, 12)
# option 2: size it to match some existing np.ndarray np_arr2
raw_arr = RawArray(
        np.ctypeslib.as_ctypes_type(np_arr2.dtype), np_arr2.size
        )
np_arr = np.frombuffer(raw_arr, dtype=raw_arr._type_)
# np_arr: numpy array backed by shared memory, can be processed by multiprocessing

If you have to start with a numpy.ndarray, you have no choice but to copy the data:

import numpy as np
from multiprocessing.sharedctypes import RawArray

np_arr = np.zeros(shape=(3, 4), dtype=np.ubyte)
# option 1: derive the ctypes type from the array and initialize from it (copies the data)
tmp = np.ctypeslib.as_ctypes(np_arr)
raw_arr = RawArray(tmp._type_, tmp)
# option 2: flatten the array and copy it element-wise
raw_arr = RawArray(np.ctypeslib.as_ctypes_type(np_arr.dtype), np_arr.flatten())

print(raw_arr[:])
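
Once the data has been copied into the RawArray, the shared buffer can be handed to child processes and re-wrapped with np.frombuffer on the other side without any further copying. A rough sketch continuing the snippet above (the worker function is made up; wrap the spawning code in an `if __name__ == '__main__':` block on platforms that use the spawn start method):

from multiprocessing import Process

def worker(shared_raw, shape):
    # re-wrap the shared buffer as an ndarray; this view makes no extra copy
    arr = np.frombuffer(shared_raw, dtype=np.ubyte).reshape(shape)
    print(arr.sum())

p = Process(target=worker, args=(raw_arr, np_arr.shape))
p.start()
p.join()
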
Markus Dutschke
  • There's no need to use `np.frombuffer` here, you can just use `np.asarray` and it will automatically find the appropriate dtype too. – Eric Aug 31 '20 at 12:02
0

I also have some of the same requirements: a) a large numpy array is given, b) it needs to be shared among a bunch of processes, c) it is read-only, etc. For this I have been using something along the lines of:

import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

mynparray = ...  # initialize a large array from a file
shrarr_base_ptr = RawArray(ctypes.c_double, len*rows*cols)
shrarr_ptr = np.frombuffer(shrarr_base_ptr)
shrarr_ptr = mynparray

where in my case, mynparray is 3-D. As for the actual sharing, I used the following style and it works so far.

    inq1 = Queue()
    inq2 = Queue()  
    outq = Queue()
    p1 = Process(target = myfunc1, args=(inq1, outq,))
    p1.start()
    inq1.put((shrarr_ptr, ))
    p2 = Process(target = myfunc2, args=(inq2, outq,))
    p2.start()
    inq2.put((shrarr_ptr,))
    inq1.close()
    inq2.close()
    inq1.join_thread()
    inq2.join_thread()
    ....
Jey
  • 1
    In your line `shrarr_ptr = mynparray` you assign the original numpy array to the name `shrarr_ptr`. When you later do `inq1.put((shrarr_ptr,))` you send the whole numpy array through the `Queue`... – christianbrodbeck Nov 03 '14 at 14:39
  • No, RawArray is from sharedctypes, so the object is created in shared memory and gets inherited. I am not physically sending the whole array. Moreover, sending such large objects through queue will take forever, from my experience. Here is my [inspiration](http://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing) for the above code. – Jey Nov 03 '14 at 16:01
  • 1
    @christianbrodbeck is right, the line `shrarr_ptr = mynparray` is a problem. I think it needs to be `shrarr_ptr[:] = mynparray[:]` so the data is copied over to the new shared memory. – coderforlife Mar 31 '16 at 20:34
  • To address the problem of the data being propagated back to the original numpy array, this is not possible. One solution is to allocate the `RawArray`-based ndarray earlier and use it as the `out` argument of whatever creates the original ndarray (if possible). A sketch of this follows the comments. – coderforlife Mar 31 '16 at 20:37
  • I agree with @christianbrodbeck. This answer is just working with the np.ndarray and has two unnecessary lines of code in it. This can be checked by `print(id(mynparray))` and `print(id(shrarr_base_ptr))`. – Markus Dutschke Aug 03 '20 at 08:45
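
A rough sketch of the out-argument idea mentioned above, with made-up sizes and inputs: allocate the shared buffer first, wrap it as an ndarray, and have the producing computation write directly into that view.

import ctypes
import numpy as np
from multiprocessing.sharedctypes import RawArray

rows, cols = 1000, 1000
raw = RawArray(ctypes.c_double, rows * cols)
shared = np.frombuffer(raw, dtype=np.float64).reshape(rows, cols)

a = np.random.rand(rows, cols)
b = np.random.rand(rows, cols)
# write the result directly into the shared buffer instead of allocating a new array
np.multiply(a, b, out=shared)
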
0

Note that if you plan to work with numpy arrays, you can omit RawArray entirely, and use:

import ctypes
import numpy as np
from multiprocessing.heap import BufferWrapper

def shared_array(shape, dtype):
    dt = np.dtype((dtype, shape))
    wrapper = BufferWrapper(dt.itemsize)
    mem = wrapper.create_memoryview()

    # workaround for bpo-41673 to keep `wrapper` alive
    ct = (ctypes.c_ubyte * dt.itemsize).from_buffer(mem)
    ct._owner = wrapper
    mem = memoryview(ct)

    return np.asarray(mem).view(dt)

The advantage of this approach is that it works in cases where np.ctypeslib.as_ctypes_type would fail.
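
A minimal usage sketch for the helper above (the shape and dtype are chosen arbitrarily):

arr = shared_array((3, 4), np.float64)  # memory comes from multiprocessing's shared heap
arr[...] = np.arange(12, dtype=np.float64).reshape(3, 4)  # fill the shared buffer in place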

Eric
-1

I'm not sure if this copies the data internally, but you could pass the flat array:

>>> import ctypes
>>> import numpy
>>> from multiprocessing.sharedctypes import RawArray
>>> a = numpy.random.randint(1, 10, (4, 4))
>>> a
array([[5, 6, 7, 7],
       [7, 9, 2, 8],
       [3, 4, 6, 4],
       [3, 1, 2, 2]])
>>> b = RawArray(ctypes.c_long, a.flat)
>>> b[:]
[5, 6, 7, 7, 7, 9, 2, 8, 3, 4, 6, 4, 3, 1, 2, 2]
Mike
  • Doesn't work, both judging from memory usage as well as because modifying b does not affect a... – christianbrodbeck Oct 10 '14 at 16:10
  • Memory that was not initially allocated as "shared" cannot be made (at least not easily) shared. You need to copy the data to a shared memory block or use the method by @Jey which makes the array originally based on shared memory and thus always shared. – coderforlife Sep 27 '15 at 23:27