
I am writing a pipeline that slices pictures into 256 × 256 tiles. Each tile is then processed with image operations such as left/right flipping, elastic distortion, gamma correction, etc. The operations themselves are not implemented by me but by NumPy, scikit-image, or OpenCV, so the problem cannot be the operations themselves.

My idea is to create a thread pool of 24 threads. Each thread gets an initial batch of images, which it processes independently of the others; after the processing I collect the results and return them. However, my code doesn't seem to utilize the CPU very well.

The implementation of a single worker thread:

from threading import Thread

class ImageWorker(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.tasks = []
        self.result = []
        self.pipeline = get_pipeline()

    def add_task(self, task):
        self.tasks.append(task)

    def run(self):
        while self.tasks:
            task = self.tasks.pop(0)
            for p in self.pipeline:
                self.result.append(p.do(task))

The implementation of the thread pool:

class ImageWorkerPool:
    def __init__(self, num_threads):
        self.workers = []
        self.work_index = 0
        for _ in range(num_threads):
            self.workers.append(ImageWorker())

    def add_task(self, task):
        self.workers[self.work_index].add_task(task)
        self.work_index = (self.work_index + 1) % len(self.workers)

    def start(self):
        for worker in self.workers:
            worker.start()

    def complete_and_return_result(self):
        for worker in self.workers:
            worker.join()
        result = []
        for worker in self.workers:
            result.extend(worker.result)
        return result

And this is how I create and populate the thread pool:

    threadpool = ImageWorkerPool(num_threads=24)
    for _ in tqdm(range(len(tasks)), desc="Augmentation"):
        task = tasks.pop(0)
        threadpool.add_task(task)

    threadpool.start()
    result = threadpool.complete_and_return_result()

I have a very beefy CPU with 24 hardware threads, but they are utilized at 10% at most. What is the problem?

[Screenshot: CPU utilization with multithreading, cores mostly idle at ~10%]

Edit: After changing from multithreading to multiprocessing, this is what the performance looks like. The code finished after 20 seconds, compared to 15 minutes with multithreading. Thanks, @AMC and @quamrana!

[Screenshot: CPU utilization after switching to multiprocessing, all cores busy]

curiouscupcake
  • Why are you not using the builtin `ThreadPoolExecutor`: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor – rdas Mar 10 '20 at 21:40
  • I am not very familiar with multithreading in python, will there be any performance boost with ThreadPoolExecutor? – curiouscupcake Mar 10 '20 at 21:42
  • Have you considered the fact that your app is limited by the I/O and not by the CPU power ? – Emmanuel BERNAT Mar 10 '20 at 21:43
  • All images will be loaded into RAM before being processed and are stored in a list before being distributed to the threads. So I am pretty sure IO is not the bottleneck. – curiouscupcake Mar 10 '20 at 21:44
  • Why would you write your own thread pool if there is already one provided? And it's fair to say that ThreadPoolExecutor will be more stable than your threadpool – rdas Mar 10 '20 at 21:45
  • @rdas I will definitively take a look at ThreadPoolExecutor – curiouscupcake Mar 10 '20 at 21:46
  • 2
    _However my code doesn't seem to utilize the CPU power very well._ That's because you're using threading, no? Why not use multiprocessing instead? – AMC Mar 10 '20 at 21:49
  • 3
    If cpu is the bottle-neck, then Threads are not the way. Consider Multiprocessing. – quamrana Mar 10 '20 at 21:50
  • @AMC This might sound like a stupid question but why is multiprocessing more performant than multithreading? – curiouscupcake Mar 10 '20 at 21:52
  • 1
    @LongNguyen _This might sound like a stupid question but why is multiprocessing more performant than multithreading?_ Don't worry, it's not stupid! I wouldn't say that one has better performance than the other in general, they're not really meant to be used for the same tasks so it's apples to oranges. There are many solid resources on the subject, a popular question here on SO is https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python. – AMC Mar 10 '20 at 21:54
  • 3
    Threading suffers the GIL, whereas Multiprocessing uses real processes which can each run simultaneously on cpu cores. – quamrana Mar 10 '20 at 21:54

1 Answer


This is well explained in many articles. The main culprit is the GIL (Global Interpreter Lock).

Put very briefly: even with multiple CPUs and threads, only one piece of Python bytecode can execute at a time, because executing bytecode requires holding the GIL (a mutex). Threading in Python only makes sense if you use modules written in C that release the GIL, or if most threads are suspended (waiting for I/O).
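A quick way to see the GIL at work (a generic sketch, not the poster's code): a CPU-bound, pure-Python function takes roughly as long on four threads as it does serially, because only one thread can execute bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n=1_000_000):
    # Pure-Python loop: holds the GIL for its entire run.
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()
for _ in range(4):
    busy()
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda _: busy(), range(4)))
threaded = time.perf_counter() - start

# On CPython the two timings come out about the same,
# despite the four worker threads.
print(f"serial: {serial:.2f}s, 4 threads: {threaded:.2f}s")
```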

The solution, as others mentioned, is to use the `multiprocessing` module or another language.
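The fix can be sketched with `multiprocessing.Pool`; the `FlipOp` and `get_pipeline` stubs below are illustrative stand-ins for the poster's actual pipeline, not the real operations:

```python
from multiprocessing import Pool

class FlipOp:
    """Stand-in for one augmentation operation (e.g. a horizontal flip)."""
    def do(self, image):
        return image[::-1]

def get_pipeline():
    # Replace with the real pipeline factory from the question.
    return [FlipOp()]

def augment_one(image):
    # Runs entirely inside one worker process, so its GIL
    # never contends with the other workers.
    return [op.do(image) for op in get_pipeline()]

if __name__ == "__main__":
    images = [[1, 2, 3], [4, 5, 6]]  # toy stand-ins for 256x256 tiles
    with Pool(processes=4) as pool:
        per_image = pool.map(augment_one, images)
    result = [r for sub in per_image for r in sub]  # flatten
    print(result)  # prints [[3, 2, 1], [6, 5, 4]]
```

Each worker process has its own interpreter and GIL, so all cores can run Python bytecode simultaneously; the only overhead is pickling the images to and from the workers.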

I suggest searching SO for the following keywords to get some insight:

python multithreading gil performance

gelonida