
Let's say I have an existing NumPy array that I don't want to modify, but that I would like to convert to a ctypes array and share among all the worker processes created by multiprocessing later on.

The actual array I want to share has shape 120,000 x 4, which is too large to type out here, so let's pretend it is much smaller and looks like this:

import numpy as np
import multiprocessing as mp
import ctypes

array_from_data = np.array([[275,174,190],
                          [494, 2292, 9103],
                          [10389,284,28],
                          [193,746,293]])

I have read other posts that discuss ctypes arrays and multiprocessing, like this one. However, the answers are not quite what I am looking for, because so far none of them is specifically about converting an existing NumPy array.

My questions are the following:

1) How do I do a simple conversion from an existing NumPy array to a ctypes array?

2) How do I share the array among all the multiprocessing worker processes in a simple fashion?

Thank you in advance.

EDIT: spelling and some clarifications on the actual array

EDIT2: Apparently the OS itself affects how multiprocessing behaves, so I need to specify it: my OS is Windows 10 64-bit.

mathguy
  • Why does it need to be converted to be shared? – Davis Herring May 18 '19 at 22:57
  • The reason the large array needs to be shared among the multiprocessing is to reduce the overhead of passing the same large size array over and over again to a function inside a multiprocessing. The reason for converting is that without conversion, sharing a numpy array is impossible, according to what others said in other posts. – mathguy May 18 '19 at 23:03
  • I don’t see any reason that an `ndarray` (of numeric type) wouldn’t just work, assuming either the `fork` method or access via a global for `forkserver`. – Davis Herring May 19 '19 at 00:29
  • To be honest, two weeks ago was the time I first learnt about multiprocessing, so I am still new to the whole thing and oblivious to snippets of it combined with numpy stuff. If you don't mind, can you show me a simple example of sharing an existing array across all the multiprocessing stuff? Thank you in advance @DavisHerring – mathguy May 19 '19 at 00:42
  • What start method are you using? (Your operating system strongly influences this!) – Davis Herring May 19 '19 at 01:54
  • what's a start method? my os is Windows 10 64 bits, python version is 3.6.8. Do they really matter when it comes to multiprocessing and sharing numpy array? – mathguy May 19 '19 at 04:28
  • Windows doesn’t have the same process-creation capabilities as Unix, and so `multiprocessing` (which, unsurprisingly, is quite sensitive to them) is [trickier and less flexible](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) there. You have to use explicitly shared memory and construct your array therein; you should edit your question to address this case, since it’s quite different from the trivial Unix approach. – Davis Herring May 19 '19 at 12:43

1 Answer

The workaround I found months ago flattens the array into a 1-dimensional array first. I only understand about half of what happens under the hood.

The gist of the solution is to:

1) make a RawArray with the same number of elements and the same element type as the array we are trying to share

2) create a NumPy array that uses the same memory as the RawArray

3) copy the elements into the newly created NumPy array

Workaround:

import ctypes
import multiprocessing as mp

import numpy as np


array_from_data = np.array([[275,174,190],
                          [494, 2292, 9103],
                          [10389,284,28],
                          [193,746,293]])

# Flatten to 1-D in C (row-major) order; keep array_from_data.shape to reshape later.
flattened_array1 = array_from_data.flatten(order='C')
flattened_array2 = np.array([1,0,1,0,1]).astype(bool)
flattened_array3 = np.array([1,0,1,0,-10]).astype(np.float32)

# 1) allocate shared memory, 2) view it as a NumPy array, 3) copy the data in.
array_shared_in_multiprocessing1 = mp.RawArray(ctypes.c_int32,len(flattened_array1))
temp1 = np.frombuffer(array_shared_in_multiprocessing1, dtype=np.int32)
temp1[:] = flattened_array1

# The ctypes type and the NumPy dtype must describe the same element type.
array_shared_in_multiprocessing2 = mp.RawArray(ctypes.c_bool,len(flattened_array2))
temp2 = np.frombuffer(array_shared_in_multiprocessing2, dtype=bool)
temp2[:] = flattened_array2

array_shared_in_multiprocessing3 = mp.RawArray(ctypes.c_float,len(flattened_array3))
temp3 = np.frombuffer(array_shared_in_multiprocessing3, dtype=np.float32)
temp3[:] = flattened_array3
mathguy