I am a multiprocessing newbie looking to work with NumPy arrays.

I have a script that segments an image and reads a variable-size block of it into a NumPy array:

    np_array = band.ReadAsArray()

In the past I have processed this array serially with no problems. From a number of posts here, it looks like my best bet is to convert the array to ctypes, slice it, and send the slices to the workers of a multiprocessing.Pool. I have a multiband image that I am segmenting, so the code below lives in main():

    from osgeo import gdal
    import ctypes
    import multiprocessing

    # Open the dataset and get the first band.
    # (file_name, xoff, yoff, intervalx, intervaly are set elsewhere)
    dataset = gdal.Open(file_name)
    band = dataset.GetRasterBand(1)

    # Grab a small segment of that band (these are huge images)
    np_array = band.ReadAsArray(xoff, yoff, intervalx, intervaly)

    # Convert to ctypes to allow multiprocessing
    c_pointer = ctypes.POINTER(ctypes.c_byte)  # Here I need a dict of GDAL-to-ctypes conversions.
    shared_array = np_array.ctypes.data_as(c_pointer)
    shared_array.reshape(intervalx, intervaly)

    def my_func(i, def_param=shared_array):
        # perform my stretch here
        pass

    pool = multiprocessing.Pool()
    pool.map(my_func, range(10))

    print(shared_array)

The ctypes pattern above is from SO - Link.

I understand the need to have the def statement mid-code, since I am passing shared_array in as a default parameter. Is there a better way to do this?
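As an aside, for the GDAL-to-ctypes conversion dict I flagged in the comment above, something like this is what I have in mind (a sketch using the standard GDT_* constants from osgeo.gdal; complex types omitted):

    from osgeo import gdal
    import ctypes

    # Sketch: map GDAL band data types to their ctypes equivalents
    GDAL_TO_CTYPES = {
        gdal.GDT_Byte:    ctypes.c_ubyte,
        gdal.GDT_UInt16:  ctypes.c_uint16,
        gdal.GDT_Int16:   ctypes.c_int16,
        gdal.GDT_UInt32:  ctypes.c_uint32,
        gdal.GDT_Int32:   ctypes.c_int32,
        gdal.GDT_Float32: ctypes.c_float,
        gdal.GDT_Float64: ctypes.c_double,
    }

    # e.g. pick the pointer type from the band's declared data type:
    # c_pointer = ctypes.POINTER(GDAL_TO_CTYPES[band.DataType])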

At this point, my code is crashing... hard. What am I missing? Is this not the way to handle this type of parallel processing with a NumPy array?

Finally, these are images, so I need to maintain the order of the array. Is that possible, or do I need to use a lock? If so, should it come from numpy or from multiprocessing?
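For reference, the alternative pattern I keep seeing suggested allocates the shared memory first and then views it through NumPy, instead of pointing ctypes at an existing array. Here is a minimal sketch of my understanding of it; the names (init_worker, my_func) and the toy 10x100 uint8 shape are mine, not from any particular post:

    import ctypes
    import multiprocessing
    import numpy as np

    def init_worker(shared_arr_, shape_):
        # Stash the shared buffer in module globals so the pooled
        # workers can see it without pickling the array itself
        global shared_arr, shape
        shared_arr = shared_arr_
        shape = shape_

    def my_func(i):
        # np.frombuffer views the shared memory in place -- no copy
        arr = np.frombuffer(shared_arr, dtype=np.uint8).reshape(shape)
        arr[i, :] = 255 - arr[i, :]   # placeholder for my stretch, row i

    if __name__ == '__main__':
        shape = (10, 100)
        # RawArray is plain shared memory with no lock, which should
        # be safe here because each worker writes only its own row
        shared_arr = multiprocessing.RawArray(ctypes.c_ubyte,
                                              shape[0] * shape[1])
        np.frombuffer(shared_arr, dtype=np.uint8).reshape(shape)[:] = 128
        pool = multiprocessing.Pool(initializer=init_worker,
                                    initargs=(shared_arr, shape))
        pool.map(my_func, range(shape[0]))
        pool.close()
        pool.join()

If I am reading this right, the rows stay in their original order because each worker mutates a disjoint slice in place, so no lock is needed; multiprocessing.Array (which wraps the same buffer with a lock) would only matter if workers could touch overlapping regions. Is that correct?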

Any links to further information would be appreciated; I am trying to learn how to work with NumPy arrays in shared memory via multiprocessing.

P.S. I would rather avoid using the numpy_sharedmem module if possible because I want to limit the number of additional downloads for potential users.
