
I thought that SharedMemory would keep the values of the target array, but when I actually tried it, it seems it doesn't.

from multiprocessing import Process, Semaphore, shared_memory
import numpy as np
import time
 
dtype_eV = np.dtype({'names': ['idx', 'value', 'size'],
                     'formats': ['int32', 'float64', 'float64']})
 
 
def worker_writer(id, number, a, shm):
    exst_shm = shared_memory.SharedMemory(name=shm)
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=exst_shm.buf)
 
    for i in range(5):
        time.sleep(0.5)
        b['idx'][i] = i
 
def worker_reader(id, number, a, shm):
    exst_shm = shared_memory.SharedMemory(name=shm)
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=exst_shm.buf)
 
    for i in range(5):
        time.sleep(1)
        print(b['idx'][i], b['value'][i])
 
 
if __name__ == "__main__":
    a = np.zeros(5, dtype=dtype_eV)
    a['value'] = 100
    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)  
    c = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
    th1 = Process(target=worker_writer, args=(1, 50000000, a, shm.name))
    th2 = Process(target=worker_reader, args=(2, 50000000, a, shm.name))
 
    th1.start()
    th2.start()
    th1.join()
    th2.join()

'''
result:
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
'''

In the code above, the two processes can share one array (`a`) and access it. But the value that was assigned before sharing (`a['value'] = 100`) is missing. Is that just natural, or is there any way to keep the value even after sharing?

maynull
  • "But the value that was given before sharing(a['value'] = 100) is missing." - what? Why would that value show up? You stored that in `a`, not in shared memory. – user2357112 Mar 01 '21 at 13:36
  • I don't see what the point of `a` was supposed to be. It's clear you expected it to do something useful, but you never read anything you stored there - all you read is metadata like its shape, dtype, and buffer size. – user2357112 Mar 01 '21 at 13:39
  • @user2357112-supports-monica Thank you for the answer. I think I misunderstood the concept. I thought that there has to be an original array and it's shared between processes via SharedMemory. – maynull Mar 01 '21 at 13:43
  • @SergeBallesta: Wrong - the `buffer` *does*, in fact, specify memory for the array to use. (See [here](https://ideone.com/gGMad0) for a demonstration.) That memory isn't the memory where the `100` was written to, though. – user2357112 Mar 01 '21 at 14:50
  • If you only use `a` to calculate `nbytes`, then it's not very useful; use `c` instead. You never pass any of the values contained in `a` to the shm constructor, so how would it have any knowledge of `a['value'] = 100`? – Aaron Mar 01 '21 at 15:15
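As the comments point out, values written to `a` live in `a`'s private buffer and never reach the shared segment; the fix is a one-line copy into the shm-backed view before starting the workers. A minimal standalone sketch of that step, using the question's dtype:

```python
import numpy as np
from multiprocessing import shared_memory

dtype_eV = np.dtype({'names': ['idx', 'value', 'size'],
                     'formats': ['int32', 'float64', 'float64']})

a = np.zeros(5, dtype=dtype_eV)
a['value'] = 100  # written to a's private buffer only

shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
c = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
c[:] = a  # copy the initial values into the shared segment

print(c['value'])  # the 100s are now in shared memory, visible to any
                   # process that attaches by shm.name

shm.close()
shm.unlink()
```

After the copy, the worker processes would read 100.0 instead of 0.0, because the shm buffer (not `a`) is what they attach to.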

2 Answers


Here's an example of how to use shared_memory with numpy. It was pasted together from several of my other answers, but there are a couple of pitfalls to keep in mind with shared_memory:

  • When you create a numpy ndarray from an shm object's buffer, the array doesn't keep the shm alive, so the shm can still be garbage collected. The unfortunate side effect is that the next time you try to access the array, you get a segfault. In another answer I created a quick ndarray subclass that just attaches the shm as an attribute, so a reference sticks around and it doesn't get GC'd.
  • Another pitfall is that on Windows, the OS tracks when to delete the memory rather than giving you control over it. That means that even if you never call unlink, the memory will be deleted once there are no active references to that particular segment (identified by its name). The way to solve this is to keep an shm open in the main process that outlives all child processes. Calling close and unlink at the end holds that reference until the end, and also makes sure you don't leak memory on other platforms.
import numpy as np
import multiprocessing as mp
from multiprocessing.shared_memory import SharedMemory

class SHMArray(np.ndarray): #copied from https://numpy.org/doc/stable/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array
    '''an ndarray subclass that holds on to a ref of shm so it doesn't get garbage collected too early.'''
    def __new__(cls, input_array, shm=None):
        obj = np.asarray(input_array).view(cls)
        obj.shm = shm
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.shm = getattr(obj, 'shm', None)

def child_func(name, shape, dtype):
    shm = SharedMemory(name=name)
    arr = SHMArray(np.ndarray(shape, buffer=shm.buf, dtype=dtype), shm)
    arr[:] += 5
    shm.close() #be sure to cleanup your shm's locally when they're not needed (referring to arr after this will segfault)

if __name__ == "__main__":
    shape = (10,) # 1d array 10 elements long
    dtype = 'f4' # 32 bit floats
    dummy_array = np.ndarray(shape, dtype=dtype) #dummy array to calculate nbytes
    shm = SharedMemory(create=True, size=dummy_array.nbytes)
    arr = np.ndarray(shape, buffer=shm.buf, dtype=dtype) #create the real arr backed by the shm
    arr[:] = 0
    print(arr) #should print arr full of 0's
    p1 = mp.Process(target=child_func, args=(shm.name, shape, dtype))
    p1.start()
    p1.join()
    print(arr) #should print arr full of 5's
    shm.close() #be sure to cleanup your shm's
    shm.unlink() #call unlink when the actual memory can be deleted
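The close/unlink split described in the second bullet can be sketched in a single process, with no children involved: two handles attach to the same named segment, writes through one are visible through the other, every handle calls close, and only the creator calls unlink. This is an illustrative sketch, not part of the answer's original code:

```python
import numpy as np
from multiprocessing.shared_memory import SharedMemory

shape, dtype = (10,), 'f4'
nbytes = np.dtype(dtype).itemsize * 10

creator = SharedMemory(create=True, size=nbytes)
arr = np.ndarray(shape, buffer=creator.buf, dtype=dtype)
arr[:] = 0

# a second handle, opened by name, just as a child process would do it
attached = SharedMemory(name=creator.name)
arr2 = np.ndarray(shape, buffer=attached.buf, dtype=dtype)
arr2[:] += 5

print(arr)  # writes through the second handle are visible through the first

attached.close()   # every handle calls close() when it's done
creator.close()
creator.unlink()   # only the creator calls unlink(), once, at the very end
```

Keeping `creator` open until after `attached.close()` is exactly the "outlives all child processes" rule from the bullet above.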
Aaron
  • Thank you for your answer and other references! I've learned a lot from them! – maynull Mar 02 '21 at 10:06

Alternative, without the dummy array:

import math
# see https://stackoverflow.com/questions/16972501/size-of-data-type-using-numpy/16972612#16972612
s = np.dtype(dtype).itemsize * math.prod(shape)
shm = shared_memory.SharedMemory(create=True, size=s)

Instead of:

dummy_array = np.ndarray(shape, dtype=dtype) #dummy array to calculate nbytes
shm = SharedMemory(create=True, size=dummy_array.nbytes)
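A quick sanity check that the itemsize-based formula gives the same size as `nbytes`, using the shape and dtype from the accepted answer (the concrete values here are just for illustration):

```python
import math
import numpy as np

shape, dtype = (10,), 'f4'  # values from the accepted answer

# size via the formula: bytes per element times number of elements
s = np.dtype(dtype).itemsize * math.prod(shape)

# size via the dummy-array approach
nbytes = np.ndarray(shape, dtype=dtype).nbytes

print(s, nbytes)  # both give 40: ten float32s at 4 bytes each
```

Note that `math.prod` (Python 3.8+) accepts any iterable, so passing `shape` directly works; no `list(...)` wrapper is needed.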
user15972