I’m trying to create a NumPy array that can be accessed by other processes on the same machine extremely fast. After a bunch of research and testing, I decided to try Python 3.8’s shared memory, hoping it would allow sharing large NumPy arrays between processes at sub-millisecond speed.
The implementation looks something like this:
- On the first ipython shell (on the first process):
import time
import numpy as np
from multiprocessing.shared_memory import SharedMemory
arr = np.random.randint(0, 255, (5000, 5000, 4), dtype=np.uint8)
shm = SharedMemory(create=True, size=arr.nbytes, name='test')  # named so the second process can attach to it
shm_arr = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
shm_arr[:] = arr[:]
- On the second ipython shell (on the second process):
import numpy as np
import time
from multiprocessing.shared_memory import SharedMemory
shm = SharedMemory(name='test')
shm_arr = np.ndarray((5000, 5000, 4), dtype=np.uint8, buffer=shm.buf)
shm.close()  # detach only once this process is completely done with shm_arr
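For completeness, once both sides are finished I tear everything down on the first shell like this (as I understand the API, every process calls close() and only the creating process additionally calls unlink()):
del shm_arr   # drop the NumPy view that still references shm.buf before detaching
shm.close()   # detach this process from the shared memory block
shm.unlink()  # creator only: release the underlying block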
But there is a bottleneck here. To populate the shared NumPy array on the first process, I have to do:
shm_arr[:] = arr[:]
which means there is a memory copy from arr to shm_arr. For large arrays this takes a lot of time. For example, the 5000 x 5000 x 4 array above takes about 55 ms for that assignment alone, which in my testing makes the whole approach at best only about 20% faster than serializing the whole array with pickle. Reconstruction on the other side is around 5 ms, which is still not sub-millisecond.
Questions:
- Is there anything I’m doing wrong here?
- Is there any way to create the shared NumPy array on the first process without a memory copy, so it can be much faster?
Thanks for taking a look.