I am trying to run a parallel process in python, wherein I have to extract certain polygons from a large array based on some conditions. The large array has 10k+ polygons that are indexed.
In a extract_polygon
function I pass (array, index). Based on index the function has to either return the polygon corresponding to that index or not based on the conditions defined. The array is never changed and is only used for reading the polygon based on the index provided.
Since the array is very large, I am running into out of memory error during parallel processing. how can I avoid that? (In a way, how to effectively use shared array in multiprocessing?)
Below is my sample code:
def extract_polygon(array, index):
try:
islays = ndimage.find_objects(clone==index)
poly = clone[islays[0][0],islays[0][1]]
area = np.count_nonzero(ploy)
minArea = 100
maxArea = 10000
if (area > minArea) and (area < maxArea):
return poly
else:
return None
except:
return None
start = time.time()
pool = mp.Pool(10)
results = pool.starmap(get_objects,[(array, index) for index in indices])
pool.close()
pool.join()
#indices here is a list of all the indexes we have.
Can I use any other library like ray
in this case?