I wrote a for loop that goes through a directory of images, resizes each image, and saves it to another directory. The code works, but I'm trying to parallelize it to make it faster.

This is the resize function

import cv2 
import os

def resize_image(img):
    # get the name of the file
    name = os.path.basename(img)
    # read the image
    img = cv2.imread(img)
    # resize and save to new directory
    resize = cv2.resize(img, (700, 700)) 
    resized_image = cv2.imwrite("Resized/"+name, resize)

And here is the for loop that would loop through the images in the directory (takes around 700 seconds to resize all the images in the directory).

SOURCE_DIRECTORY = "Source/"
directory_list = os.listdir(SOURCE_DIRECTORY)

for source_file in directory_list:
    source_path = os.path.join(SOURCE_DIRECTORY, source_file) 
    if os.path.isfile(source_path):
        resize_image(source_path)

In an effort to parallelize the process, I tried using concurrent.futures to map the resize function over the directory listing.

import concurrent.futures

SOURCE_DIRECTORY = "Source/"
directory_list = os.listdir(SOURCE_DIRECTORY)

with concurrent.futures.ProcessPoolExecutor() as executor: 
    executor.map(resize_image, directory_list)

But I instantly get this error.

BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore

How can I parallelize the process of resizing the images? Any help would be appreciated.

FARIS
  • Does [this](https://stackoverflow.com/q/41454049/238704) answer your question? – President James K. Polk Dec 30 '22 at 22:25
  • Unfortunately, it doesn't. Also, I'm open to any other way to parallelize the code. It doesn't have to be using concurrent.futures. – FARIS Dec 30 '22 at 22:33
  • If you're open to using another library, this is really easy to do with Ray. It will manage the process creation/destruction for you, you just need to add a decorator to your parallelizable function and you'll get a future to wait on. See https://towardsdatascience.com/modern-parallel-and-distributed-python-a-quick-tutorial-on-ray-99f8d70369b8#941f – cade Dec 30 '22 at 22:58

1 Answer

Here is a sample skeleton you can use to parallelize the task (using multiprocessing.Pool):

import os
from multiprocessing import Pool

import cv2


def resize_image(file_name):
    # get the name of the file
    name = os.path.basename(file_name)

    # just to be sure the file exists (skip if not necessary):
    if not os.path.exists(file_name):
        return f"{name} does not exist!"

    # read the image (cv2.imread returns None if the file cannot be decoded)
    img = cv2.imread(file_name)
    if img is None:
        return f"{name} could not be read!"

    # resize and save to new directory
    resize = cv2.resize(img, (700, 700))
    cv2.imwrite("Resized/" + name, resize)

    return f"{name} resized."


if __name__ == "__main__":

    SOURCE_DIRECTORY = "Source/"
    directory_list = os.listdir(SOURCE_DIRECTORY)

    filelist = []
    for source_file in directory_list:
        source_path = os.path.join(SOURCE_DIRECTORY, source_file)
        if os.path.isfile(source_path):
            filelist.append(source_path)

    with Pool(4) as pool:  # 4 is number of processes we want to use
        for result in pool.imap_unordered(resize_image, filelist):
            print(result)
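
Since the question started from concurrent.futures, the same skeleton could also be sketched with ProcessPoolExecutor. This is a minimal sketch under stated assumptions, not a tested drop-in: `collect_files`, `main`, and `OUTPUT_DIRECTORY` are names made up for illustration, and the worker count of 4 is simply carried over from the Pool example. It passes full paths to the workers (the attempt in the question passed the bare names returned by os.listdir) and keeps the `if __name__ == "__main__":` guard.

```python
import os
from concurrent.futures import ProcessPoolExecutor

SOURCE_DIRECTORY = "Source/"
OUTPUT_DIRECTORY = "Resized/"  # assumed to be the same "Resized/" folder as above


def collect_files(directory):
    """Return the full paths of the regular files directly inside `directory`."""
    if not os.path.isdir(directory):
        return []
    paths = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isfile(path):
            paths.append(path)
    return paths


def resize_image(file_name):
    # Deferred import: OpenCV is loaded on the first call in each worker process.
    import cv2

    name = os.path.basename(file_name)
    img = cv2.imread(file_name)  # returns None if the file cannot be decoded
    if img is None:
        return f"{name} could not be read!"
    resized = cv2.resize(img, (700, 700))
    written = cv2.imwrite(os.path.join(OUTPUT_DIRECTORY, name), resized)
    return f"{name} resized." if written else f"{name} could not be written!"


def main():
    files = collect_files(SOURCE_DIRECTORY)
    if not files:
        print(f"No files found in {SOURCE_DIRECTORY!r}")
        return
    os.makedirs(OUTPUT_DIRECTORY, exist_ok=True)
    with ProcessPoolExecutor(max_workers=4) as executor:
        # map() yields results in input order; worker exceptions are raised here
        for result in executor.map(resize_image, files):
            print(result)


if __name__ == "__main__":
    main()
```

Unlike `imap_unordered`, `executor.map` returns results in the order of the input list, so the printed messages follow the sorted file names.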
Andrej Kesely
  • I've been running this code for the past 5 minutes and nothing is happening for some reason, no error is showing, and the directory is not being populated with images. I tried interrupting the kernel hoping for an error message to show up, but it just got stuck. I'm not sure what issue here is. – FARIS Dec 30 '22 at 23:19
  • @Faris You should put debug messages in the code (which file is being processed, etc.). Also, I recommend running the program from the terminal, not from a Jupyter notebook or similar. – Andrej Kesely Dec 30 '22 at 23:23
  • I checked the terminal while the code is running, I kept getting this error: AttributeError: Can't get attribute 'resize_image' on Process SpawnPoolWorker-126: Process SpawnPoolWorker-127: I'll try looking more into it – FARIS Dec 30 '22 at 23:36
  • @Faris Do you have `if __name__ == "__main__":` in your script? The error message states Python cannot find the `resize_image` function. – Andrej Kesely Dec 30 '22 at 23:37
  • 1
    Okay everything is working great now. The issue was that Pool does not work with functions not defined in an imported module. See: stackoverflow.com/a/42383397/19681378. Thank you. – FARIS Dec 31 '22 at 13:59
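
The fix described in the last comment can be sketched roughly as follows: move the worker function into its own file so the worker processes can import it by name. The file name `resize_worker.py` and the helper `output_path` are hypothetical, chosen only for illustration.

```python
# resize_worker.py (hypothetical file name), saved next to the notebook
import os


def output_path(file_name, out_dir="Resized"):
    """Build the destination path for a source image."""
    return os.path.join(out_dir, os.path.basename(file_name))


def resize_image(file_name):
    import cv2  # loaded on the first call in each worker process

    img = cv2.imread(file_name)  # None if the file cannot be decoded
    if img is None:
        return f"{file_name} could not be read!"
    cv2.imwrite(output_path(file_name), cv2.resize(img, (700, 700)))
    return f"{file_name} resized."


# In the notebook (or main script), import the function instead of defining it:
#
#     from multiprocessing import Pool
#     from resize_worker import resize_image
#
#     with Pool(4) as pool:
#         for result in pool.imap_unordered(resize_image, filelist):
#             print(result)
```

Functions are pickled by reference, so the workers only need to be able to import `resize_worker` and look up `resize_image` there. That is what fails with `Can't get attribute 'resize_image'` when the function exists only in a notebook cell.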