Suppose I have this class:

class Foo:
    def __init__(self):
        self.task1_dict = {}
        self.task2_dict = {}

    def task1(self):
        for i in range(10000000):
            pass  # update self.task1_dict

    def task2(self):
        for i in range(10000000):
            pass  # update self.task2_dict

    def run(self):
        self.task1()
        self.task2()

Task 1 and task 2 are both CPU-intensive and perform no I/O. They are also independent, so you can assume that running them concurrently is thread-safe.

For now, my class runs the tasks sequentially, and I want to change it so that the tasks run in parallel in multiple threads. I'm using ThreadPoolExecutor from the concurrent.futures package.

from concurrent.futures import ThreadPoolExecutor

class Foo:
    ...
    def run(self):
        with ThreadPoolExecutor() as executor:
            executor.submit(self.task1)
            executor.submit(self.task2)

The problem is that when I call the run method, the run time does not decrease at all; it even increases slightly compared to the sequential version. I'm guessing that this is because the GIL allows only one thread to run at a time. Is there any way to parallelise this program? Maybe a way to overcome the GIL and run the 2 methods on 2 threads? I have considered switching to ProcessPoolExecutor, but I cannot call the methods since class methods are not picklable. Also, if I use multiprocessing, Python will create multiple instances of Foo, and self.task1_dict and self.task2_dict would not be updated accordingly.

Mike Pham

1 Answer

You can use multiprocessing shared memory as explained here
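
If shared memory turns out to be awkward for plain dicts (it holds flat buffers rather than arbitrary Python objects), a different and simpler technique may be enough: move the CPU-bound work into module-level functions, run them with ProcessPoolExecutor, and copy the returned dicts into the instance in the parent process. The sketch below is only an illustration under assumptions: build_task1_dict, build_task2_dict, the n parameter, and the dict contents are hypothetical stand-ins for the real work, which the question does not show.

```python
from concurrent.futures import ProcessPoolExecutor


def build_task1_dict(n):
    # Hypothetical stand-in for the CPU-bound work done by task1.
    return {i: i * i for i in range(n)}


def build_task2_dict(n):
    # Hypothetical stand-in for the CPU-bound work done by task2.
    return {i: i + 1 for i in range(n)}


class Foo:
    def __init__(self, n=10_000_000):
        self.n = n  # assumed loop bound, taken from the question's range()
        self.task1_dict = {}
        self.task2_dict = {}

    def run(self):
        # Each submitted function runs in a separate process with its own
        # interpreter and its own GIL, so the two tasks can use two cores.
        with ProcessPoolExecutor(max_workers=2) as executor:
            future1 = executor.submit(build_task1_dict, self.n)
            future2 = executor.submit(build_task2_dict, self.n)
            # result() blocks until the worker is done; the dict is pickled
            # in the worker and rebuilt here, in the parent process.
            self.task1_dict.update(future1.result())
            self.task2_dict.update(future2.result())


if __name__ == "__main__":
    foo = Foo(n=1000)
    foo.run()
    print(len(foo.task1_dict), len(foo.task2_dict))
```

Because only module-level functions and an integer are sent to the workers, nothing about Foo itself has to be pickled, and the instance's dicts are updated in the parent. The trade-off is that each result dict is serialized on the way back, which for very large dicts may eat into the speed-up.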

alex_noname