I would like to access an existing NumPy array from a subprocess using the multiprocessing
module, without copying the array into a shared memory object. Apparently, multiprocessing.Array
creates such a shared memory array, but I can't find a way to point it at an existing numpy.ndarray
object. This is critical because the existing array can be quite large (up to a couple of GB), so I definitely need to avoid any copy operations.
Here's what I've tried so far:
import multiprocessing as mp
import numpy as np

def f(x, idx):
    """Dummy function to manipulate an array."""
    x[idx] = 999

a = np.array([1.2, 15.8, 10.3, 7.4, -44.9])
b = mp.Array("d", a)  # apparently this creates a copy of a in b
print("Original array:".rjust(28, " "), a)
f(a, 0)
print("Change a[0] in main process:".rjust(28, " "), a)
p = mp.Pool(1)
p.apply_async(f, args=(b, 4))
print("Change b[4] in subprocess:".rjust(28, " "), np.frombuffer(b.get_obj()))
Ideally, I'd like a and b to refer to the same underlying numbers, but apparently this is not working. Interestingly, b is also not changed by p.apply_async(f, args=(b, 4)). This is probably not related to the original question, but I'd still like to understand why.
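For comparison, the reverse direction can be sketched as follows: allocate the shared buffer first and wrap it in a NumPy view with np.frombuffer, so that both names refer to the same memory. This is only a minimal sketch, and it assumes the data can be created in shared memory from the start, which is not the case for my existing array. make_shared_array is a hypothetical helper name, not part of NumPy or multiprocessing.

```python
import multiprocessing as mp
import numpy as np

def make_shared_array(n):
    """Hypothetical helper: allocate n doubles in shared memory.

    Returns the multiprocessing handle and a NumPy view onto the same buffer.
    """
    handle = mp.Array("d", n)                # shared buffer of n doubles
    view = np.frombuffer(handle.get_obj())   # float64 view, no copy is made
    return handle, view

b, a = make_shared_array(5)
a[:] = [1.2, 15.8, 10.3, 7.4, -44.9]  # fill the shared buffer through the view
a[0] = 999                             # write through the NumPy view
b[4] = -1.0                            # write through the multiprocessing handle
```

Here a write through either name shows up in the other, which is the behaviour I would like to get for an array that already exists.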