
I am reading multiple images into their own numpy arrays. They are all the same size and do not share data, so I figured multiprocessing was the way to go. However, after the images have been read in, there is a long wait for some reason before the code moves on.

Here is the function I'm calling:

def image_to_matrix_mp(image_str):
    """
    Reads in image files and stores them in a numpy array. Doing operations on an array and then writing to the image
    is faster than doing them directly on the image. This uses multiprocessing to read each file simultaneously.
    :param image_str:
    :return: array
    """
    pic = Image.open(INPUT_FILES_DIR + image_str)
    image_size = pic.size
    array = np.empty([image_size[0], image_size[1]], dtype=tuple)
    for x in range(image_size[0]):
        if x % 1000 == 0:
            print("x = %d, --- %s seconds ---" % (x, time.time() - start_time))
        array_x = array[x]
        for y in range(image_size[1]):
            array_x[y] = pic.getpixel((x,y))

    return array

and here is how I'm calling it:

def main():
    start_time = time.time()

    p = multiprocessing.Pool(processes=5)
    [land_prov_array, areas_array, regions_array, countries_array, sea_prov_array] = p.map(POF.image_to_matrix_mp, 
                                                  ['land_provinces.bmp',
                                                   'land_areas.bmp',
                                                   'land_regions.bmp',
                                                   'countries.bmp',
                                                   'sea_provinces.bmp'])
    p.close()


    width = len(land_prov_array)     # Width dimension of the map
    height = len(land_prov_array[0])     # Height dimension of the map

    print("All images read in --- %s seconds ---" % (time.time() - start_time))

For debugging, I print a timestamp at every 1,000th column and then the total time it took to read all the images in. This is the tail end of the output:

x = 15000, --- 84.4389169216156 seconds ---
x = 15000, --- 84.94356632232666 seconds ---
x = 15000, --- 85.07920360565186 seconds ---
x = 15000, --- 85.1400408744812 seconds ---
x = 15000, --- 85.99774622917175 seconds ---
x = 16000, --- 89.95117163658142 seconds ---
x = 16000, --- 90.62337279319763 seconds ---
x = 16000, --- 90.62437009811401 seconds ---
x = 16000, --- 90.76798582077026 seconds ---
x = 16000, --- 91.90195274353027 seconds ---
All images read in --- 275.9242513179779 seconds ---

The images are 16,200 x 6,000 and, as you can see, all 5 have been read in within the first hundred seconds or so. However, it takes another ~175 seconds for the code to move on, meaning that finishing up the multiprocessing takes almost twice as long as the function it actually runs. Is this normal overhead for multiprocessing, or am I doing something wrong?
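
One possibility the timings alone don't show (this is an assumption to be measured, not something confirmed above) is the cost of getting the results back to the parent process: `Pool.map` pickles each worker's return value and unpickles it in the parent, and an object-dtype array holding ~97 million Python tuples is far more expensive to serialize than a plain `uint8` array of the same pixels. A quick stand-alone check of that serialization cost, with the dimensions shrunk so it runs in seconds (the real arrays are 16,200 x 6,000):

import pickle
import time

import numpy as np

# Stand-in for one worker's return value: an object-dtype array of RGB tuples,
# the same layout image_to_matrix_mp builds (dimensions shrunk for a quick test).
w, h = 1620, 600
arr_obj = np.empty([w, h], dtype=tuple)
for x in range(w):
    for y in range(h):
        arr_obj[x, y] = (x % 256, y % 256, 0)

t0 = time.time()
n_bytes = len(pickle.dumps(arr_obj, protocol=pickle.HIGHEST_PROTOCOL))
print("object array of tuples: %.2f s to pickle, %d bytes" % (time.time() - t0, n_bytes))

# The same pixels as a contiguous uint8 array pickle almost instantly.
arr_u8 = np.zeros([w, h, 3], dtype=np.uint8)
t0 = time.time()
n_bytes = len(pickle.dumps(arr_u8, protocol=pickle.HIGHEST_PROTOCOL))
print("plain uint8 array:      %.2f s to pickle, %d bytes" % (time.time() - t0, n_bytes))

If the object-dtype version is dramatically slower to pickle, the extra time after the workers finish is likely dominated by transferring the five large object arrays back through the pool, not by the pool machinery itself.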

  • Is the `start_time` in `image_to_matrix_mp` the same `start_time` declared in `main`? Don't see any `start_time` declared in `image_to_matrix_mp`. – Mario Ishac May 10 '20 at 14:26
  • You should not be calling `getpixel()` for every pixel!!! Just read every other answer on StackOverflow about PIL and use `im=Image.open(XYZ)` then `na=np.array(im)` and you'll have a Numpy array of all the pixels. – Mark Setchell May 10 '20 at 16:28
  • @MarioIshac Yes it is the same – Jerry Ginger May 11 '20 at 16:43
  • @MarkSetchell I tried that originally but that reads everything into a 3D numpy array of size 16,200 x 6,000 x 3 instead of a 2D array of tuples of size 16,200 x 6,000 with every cell being an RGB tuple. The RGB tuple part is important because the rest of the code doesn't work with the 3D array. Do you know if there is an easy and efficient way of converting the 3D numpy array into a 2D array of tuples? Honestly I'm just using these as a sort of id so turning the 3D array into a 2D array of hashes might work for my problem. I'm still curious about the multiprocessing time though. – Jerry Ginger May 11 '20 at 16:47
  • I don't... but questions, and answers, are free on StackOverflow. So you can ask another, tagged with `numpy` and `python` and see if any Numpy masters know how to make an MxNx3 array into an MxN array of tuples. I do know how to make MxNx3 into MxN of 24-bit numbers as in second part here... https://stackoverflow.com/a/59671950/2836621 – Mark Setchell May 11 '20 at 16:56
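
Following the suggestions in the comments (and assuming the same INPUT_FILES_DIR constant and filenames as in the question), here is a rough sketch of reading a whole image into NumPy at once and then collapsing the trailing RGB axis into a single 24-bit integer per pixel, along the lines of the linked answer. The packed values can serve the same "id" role as the RGB tuples. Note that `np.array(im)` is indexed [row, column], i.e. [y, x], so a transpose is needed to match the [x][y] orientation of the `getpixel`-based array.

import numpy as np
from PIL import Image

# Whole-image read: one call instead of a getpixel() per pixel.
im = Image.open(INPUT_FILES_DIR + 'land_provinces.bmp').convert('RGB')
na = np.array(im)                          # shape (height, width, 3), dtype uint8

# Pack R, G, B into one 24-bit integer per pixel: 0xRRGGBB.
ids = ((na[..., 0].astype(np.uint32) << 16)
       | (na[..., 1].astype(np.uint32) << 8)
       | na[..., 2].astype(np.uint32))     # shape (height, width)

# Optional: transpose so ids[x, y] corresponds to pic.getpixel((x, y)).
ids = ids.T                                # shape (width, height)

Being a plain integer array, `ids` also pickles quickly, so returning it from the pool workers would avoid the transfer cost sketched above.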
