I am reading multiple images into their own numpy arrays. They are all the same size and do not share data, so I figured multiprocessing was the way to go. However, after the images have been loaded there is a long wait on the back end before the code moves on, and I can't work out why.
Here is the function I'm calling:
def image_to_matrix_mp(image_str):
    """
    Reads an image file and stores its pixels in a numpy array. Doing operations on an
    array and then writing the result back to the image is faster than doing them
    directly on the image. This is mapped over a multiprocessing pool so the files are
    read simultaneously.
    :param image_str: filename of the image, relative to INPUT_FILES_DIR
    :return: 2-D numpy array of pixel tuples, indexed as array[x][y]
    """
    pic = Image.open(INPUT_FILES_DIR + image_str)
    image_size = pic.size
    array = np.empty([image_size[0], image_size[1]], dtype=tuple)
    for x in range(image_size[0]):
        if x % 1000 == 0:
            # start_time is a module-level global so the workers can see it
            print("x = %d, --- %s seconds ---" % (x, time.time() - start_time))
        array_x = array[x]
        for y in range(image_size[1]):
            array_x[y] = pic.getpixel((x, y))
    return array
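(For context, a vectorized version of the same read is sketched below. This is not the code being timed, just an assumption-laden alternative: Pillow images convert directly to numpy arrays, which avoids the millions of getpixel() calls. The function name is mine.)

```python
# Sketch of a vectorized alternative to the per-pixel loop above.
# Assumes RGB (3-channel) images; grayscale images would come back 2-D.
import numpy as np
from PIL import Image

def image_to_matrix_vectorized(path):
    """Read an image into a (width, height, channels) uint8 array."""
    with Image.open(path) as pic:
        arr = np.asarray(pic)        # numpy gives shape (height, width, channels)
    return arr.transpose(1, 0, 2)    # swap axes to match the array[x][y] indexing above
```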
and here is how I'm calling it:
def main():
    start_time = time.time()
    p = multiprocessing.Pool(processes=5)
    [land_prov_array, areas_array, regions_array,
     countries_array, sea_prov_array] = p.map(POF.image_to_matrix_mp,
                                              ['land_provinces.bmp',
                                               'land_areas.bmp',
                                               'land_regions.bmp',
                                               'countries.bmp',
                                               'sea_provinces.bmp'])
    p.close()
    width = len(land_prov_array)      # Width dimension of the map
    height = len(land_prov_array[0])  # Height dimension of the map
    print("All images read in --- %s seconds ---" % (time.time() - start_time))
For debugging I print every 1,000th column index along with the elapsed time, followed by the total time the whole read took. This is the tail end of the output:
x = 15000, --- 84.4389169216156 seconds ---
x = 15000, --- 84.94356632232666 seconds ---
x = 15000, --- 85.07920360565186 seconds ---
x = 15000, --- 85.1400408744812 seconds ---
x = 15000, --- 85.99774622917175 seconds ---
x = 16000, --- 89.95117163658142 seconds ---
x = 16000, --- 90.62337279319763 seconds ---
x = 16000, --- 90.62437009811401 seconds ---
x = 16000, --- 90.76798582077026 seconds ---
x = 16000, --- 91.90195274353027 seconds ---
All images read in --- 275.9242513179779 seconds ---
The images are 16,200 x 6,000, and as you can see, all 5 have been read in within the first hundred seconds. However, it takes another 175 seconds for the code to move on, meaning that winding down the multiprocessing takes almost twice as long as the function it ran. Is this normal overhead for multiprocessing, or am I doing something wrong?
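One thing I've read (and am assuming applies here) is that Pool.map has to pickle each worker's return value to ship it back to the parent process, and an object-dtype array of per-pixel tuples pickles element by element, while a plain uint8 array pickles as one raw buffer. This small sketch (sizes scaled way down from my real images) compares the pickled sizes:

```python
# Sketch: compare pickled size of an object-dtype array of pixel tuples
# (what my function returns, since dtype=tuple becomes dtype=object)
# against a plain uint8 array holding the same pixel data.
import pickle
import numpy as np

w, h = 100, 100  # small stand-in for the 16,200 x 6,000 images

obj_arr = np.empty((w, h), dtype=object)
for x in range(w):
    for y in range(h):
        obj_arr[x, y] = (x % 256, y % 256, 0)  # fake (r, g, b) tuple

u8_arr = np.zeros((w, h, 3), dtype=np.uint8)  # same pixel count, raw bytes

obj_bytes = len(pickle.dumps(obj_arr, protocol=pickle.HIGHEST_PROTOCOL))
u8_bytes = len(pickle.dumps(u8_arr, protocol=pickle.HIGHEST_PROTOCOL))
print(obj_bytes, u8_bytes)  # the object array pickles much larger
```

If the gap scales to full-size images, that would explain a long wait after the workers finish.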