2

I would like to iterate through a folder of image files in Python and do some work on each one. It starts like this:

for image in os.listdir(imagePath):
    if image.endswith('.jpg'):
         <DO STUFF HERE>

I have a lot of images in that folder and would like to speed it up using multithreading. Each image will have a separate thread. How can I go about doing that?

Jess
  • You might find [this article](https://sebastianraschka.com/Articles/2014_multiprocessing.html) useful. It sounds like you want to use the multiprocessing module. – Jack Moody Nov 28 '18 at 21:51
  • Compute-bound processing often does **not** get faster by multithreading—in fact it may get slower. I second @JackMoody's suggestion about looking into using the `multiprocessing` module instead. – martineau Nov 28 '18 at 22:33

3 Answers

1

As others have said, you probably want to run your code in parallel, which for CPU-bound work in Python is done with multiprocessing rather than multithreading. The easiest way to do this is probably multiprocessing.Pool.map. All you have to do is define a function that processes one file, taking the file name as its argument. Then pass a list of all the files you would like to process to pool.map along with the processing function. Pool.map returns a list of the results:

from multiprocessing import Pool as ProcessPool
import os

def image_processor(image):
    # Do the per-image work here; as a placeholder, the file name is the result.
    results = image
    return results

if __name__ == "__main__":
    desired_file_list = [file_name for file_name in os.listdir("my_directory_path") if file_name.endswith(".jpg")]

    with ProcessPool(processes=8) as pool:
        results = pool.map(image_processor, desired_file_list)

    print(results)

The processes keyword argument controls the number of processes that are spawned.
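One detail worth sketching: os.listdir returns bare file names, so the worker usually needs them joined with the directory before it can open anything. The version below is a minimal sketch under that assumption. The helper names list_jpgs, image_size_in_bytes and process_directory are hypothetical, and the file-size worker is only a stand-in for real image work. Leaving processes as None lets Pool choose a worker count based on the number of CPUs.

```python
import os
import tempfile
from multiprocessing import Pool


def list_jpgs(directory):
    """Return the sorted full paths of the .jpg files in `directory`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".jpg")
    )


def image_size_in_bytes(path):
    """Hypothetical stand-in for real image work: return (path, file size)."""
    return path, os.path.getsize(path)


def process_directory(directory, processes=None):
    """Map the worker over every .jpg file using a pool of processes."""
    paths = list_jpgs(directory)
    with Pool(processes=processes) as pool:
        return pool.map(image_size_in_bytes, paths)


if __name__ == "__main__":
    # Demo on a throwaway directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("one.jpg", "two.jpg", "notes.txt"):
            with open(os.path.join(demo_dir, name), "wb") as handle:
                handle.write(b"placeholder")
        for path, size in process_directory(demo_dir, processes=2):
            print(os.path.basename(path), size)
```

In real use, process_directory would be called with the actual image folder, and image_size_in_bytes would be replaced by whatever function does the per-image work.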

Aaron Arima
0

You could create a class that extends threading.Thread and override its run method to perform the task you want when the condition is met.

Then get all the images with listdir and iterate over them, assigning a new thread to each image. Finally, start each thread. Here is sample code for the above:

import threading
import os

class FileThread(threading.Thread):

    def __init__(self, image):
        threading.Thread.__init__(self)
        self.image = image

    def run(self):
        if self.image.endswith('.jpg'):
            # Do stuff
            pass

# List that will hold all threads.
threadList = []
# List that will hold all images.
images = os.listdir(imagePath)
# Assign each image to a thread.
for image in images:
    threadList.append(FileThread(image))
# Start threads.
for thread in threadList:
    thread.start()
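With a large folder, one thread per image can mean thousands of threads at once. A possible alternative, sketched here rather than taken from the answer, is a bounded pool from concurrent.futures. The names handle_image and process_with_threads are hypothetical, and the file-size worker stands in for the real task. As the comments on the question point out, threads mostly help when the work is I/O-bound.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def handle_image(path):
    """Hypothetical stand-in for the per-image work: return the file size."""
    return os.path.getsize(path)


def process_with_threads(paths, max_workers=8):
    """Run handle_image over `paths` on a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as `paths`.
        return list(executor.map(handle_image, paths))
```

Leaving the with block waits for all submitted work to finish, so no explicit join loop is needed.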

Another way is to use the multiprocessing module and assign each image to a process:

import multiprocessing as mp
import os

# The function that will apply to every image.
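def imageFunc(image, output):
    if image.endswith(".jpg"):
        # Do something, then put the result on the queue.
        # The file name stands in for a real result here.
        output.put(image)

if __name__ == "__main__":
    # An output queue that will hold the results.
    output = mp.Queue()

    # Only .jpg files get a process, so every process puts exactly one result.
    images = [image for image in os.listdir(imagePath) if image.endswith(".jpg")]

    # A list of processes that will perform 'imageFunc' on each image.
    processes = [mp.Process(target=imageFunc, args=(image, output)) for image in images]

    # Starting all the processes...
    for p in processes:
        p.start()

    # ...retrieve one result per process (before joining, so a full queue cannot block the join)...
    result = [output.get() for p in processes]

    # ...and finally wait for them to finish.
    for p in processes:
        p.join()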
Vasilis G.
0

I am thinking something like this:

#! /usr/bin/python3
import os
from multiprocessing import Process

def do_stuff(*args):
    print(*args)

if __name__ == '__main__':
    processes = []
    for f in os.listdir('.'):
        if f[-3:] == 'jpg':
            p = Process(target=do_stuff, args=[f])
            p.start()
            processes.append(p)
    for p in processes:
        p.join()

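Just be careful: args must be a sequence of arguments. If you pass args=f instead of args=[f], the string is unpacked character by character and do_stuff receives each character as a separate argument.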
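The reason is that Process calls the target with args unpacked, roughly target(*args). The sketch below imitates that with an ordinary function call, so no processes are started; it only illustrates the unpacking and is not how Process is implemented internally.

```python
def do_stuff(*args):
    # Return the received arguments so they can be inspected.
    return args


file_name = "a.jpg"

# Equivalent to what args=[f] leads to: one argument, the whole name.
as_list = do_stuff(*[file_name])

# Equivalent to what args=f leads to: one argument per character.
as_string = do_stuff(*file_name)

print(as_list)    # ('a.jpg',)
print(as_string)  # ('a', '.', 'j', 'p', 'g')
```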

EDIT: To pass additional arguments, use a tuple and drop the []:

import os
from multiprocessing import Process

def do_stuff(*args):
    print(*args)

if __name__ == '__main__':
    processes = []
    for f in os.listdir('.'):
        if f[-3:] == 'jpg':
            p = Process(target=do_stuff, args=(f, "hello"))
            p.start()
            processes.append(p)
    for p in processes:
        p.join()
  • Thank you! This is exactly what I wanted – Jess Nov 30 '18 at 21:28
  • Also, what if I have multiple arguments, besides the image? The function I am applying to each image takes in several values. I just put it inside the args list? – Jess Dec 03 '18 at 18:39
  • I once passed args in as a tuple... Possibly args=([f], other_arg) –  Dec 03 '18 at 19:42
  • Yup just tried that and it worked.. drop the [f] and just use f... so here's the tuple I used: p = Process(target=do_stuff, args=(f, "hello")) –  Dec 03 '18 at 21:09
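For the multiple-argument case with a pool instead of one Process per file, Pool.starmap is one option: it unpacks each tuple in the task list into the worker's parameters. This is a minimal sketch, not something from the thread. The run_all helper and the greeting parameter are hypothetical, and do_stuff here returns a string rather than printing so the results can be collected.

```python
from multiprocessing import Pool


def do_stuff(file_name, greeting):
    """Hypothetical worker taking the file name plus one extra argument."""
    return f"{greeting} {file_name}"


def run_all(file_names, greeting, processes=2):
    """Call do_stuff(name, greeting) for every name using a process pool."""
    tasks = [(name, greeting) for name in file_names]
    with Pool(processes=processes) as pool:
        # starmap unpacks each (name, greeting) tuple into do_stuff's parameters.
        return pool.starmap(do_stuff, tasks)


if __name__ == "__main__":
    print(run_all(["a.jpg", "b.jpg"], "hello"))  # ['hello a.jpg', 'hello b.jpg']
```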