I'm a multiprocessing newbie trying to work with numpy. I have a script that segments an image and creates a variable-size block of the image as a numpy array:
np_array = band.ReadAsArray() #band comes from dataset.GetRasterBand(1)
In the past I have processed this array serially with no problems. From a number of posts here, it looks like my best bet is to convert the array to ctypes, slice it, and send the slices to the workers of a multiprocessing.Pool. I have a multiband image that I am segmenting, so the code below lives in main():
import ctypes
import multiprocessing

#Open the dataset...
#Get the first band...
#Grab a small segment of that band (these are huge images) and read it into np_array

#Convert to ctypes to allow multiprocessing
c_pointer = ctypes.POINTER(ctypes.c_byte) #Here I need a dict of gdal-to-ctypes conversions.
shared_array = np_array.ctypes.data_as(c_pointer)
shared_array.reshape(intervalx, intervaly)

def my_func(i, def_param=shared_array):
    #perform my stretch here
    pass

pool = multiprocessing.Pool()
pool.map(my_func, range(10))
print shared_array
The default-parameter pattern in the snippet is adapted from SO - Link.
I understand the need to have the def statement mid-code, since I am passing shared_array in as a default parameter. Is there a better way?
At this point, my code is crashing...hard. What am I missing? Is this not the way to handle this type of parallel processing with a numpy array?
Finally, these are images, so I need to be able to maintain the order of the array. Is that possible, or do I need a lock? If so, should it come from numpy or from multiprocessing?
Any links to info are appreciated; I'm trying to learn how to handle numpy arrays in shared memory with multiprocessing.
P.S. I would rather avoid using the numpy_sharedmem module if possible because I want to limit the number of additional downloads for potential users.