Suppose I have this class:
```python
class Foo:
    def __init__(self):
        self.task1_dict = {}
        self.task2_dict = {}

    def task1(self):
        for i in range(10000000):
            ...  # update self.task1_dict (details elided)

    def task2(self):
        for i in range(10000000):
            ...  # update self.task2_dict (details elided)

    def run(self):
        self.task1()
        self.task2()
```
Task 1 and task 2 are both CPU-intensive and do no I/O. They are also independent, so you can assume that running them concurrently is thread-safe.
For now, my class runs the tasks sequentially, and I want to change it so that the tasks run in parallel in multiple threads. I'm using ThreadPoolExecutor from the concurrent.futures module:
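```python
from concurrent.futures import ThreadPoolExecutor

class Foo:
    ...  # __init__, task1 and task2 as above

    def run(self):
        with ThreadPoolExecutor() as executor:
            executor.submit(self.task1)
            executor.submit(self.task2)
```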
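One detail in the threaded version: submit() returns a Future, and an exception raised inside task1 or task2 is stored on that Future and only re-raised when .result() is called, so as written a failing task would go unnoticed. A minimal sketch that keeps the futures and waits on them is below. The n parameter and the loop bodies are hypothetical placeholders (the real updates are elided in the question). This only fixes error reporting; it does not make CPU-bound pure-Python code run faster under the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


class Foo:
    def __init__(self, n=10_000_000):
        # n is a hypothetical parameter so the loop size can be reduced
        self.n = n
        self.task1_dict = {}
        self.task2_dict = {}

    def task1(self):
        for i in range(self.n):
            self.task1_dict[i] = i * 2  # placeholder update

    def task2(self):
        for i in range(self.n):
            self.task2_dict[i] = i * 3  # placeholder update

    def run(self):
        with ThreadPoolExecutor(max_workers=2) as executor:
            futures = [executor.submit(self.task1), executor.submit(self.task2)]
            # result() blocks until each task finishes and re-raises any
            # exception that occurred inside it
            for future in futures:
                future.result()


foo = Foo(n=1000)
foo.run()
print(len(foo.task1_dict), len(foo.task2_dict))  # 1000 1000
```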
The problem is that when I call the run method, the run time does not decrease at all and is even slightly longer than with the sequential version. I'm guessing this is because the GIL allows only one thread to execute Python bytecode at a time. Is there any way to parallelise this program? Maybe a way to get around the GIL and run the two methods on two threads?

I have considered switching to ProcessPoolExecutor, but I believed I could not submit the methods because they are not picklable. Also, if I use multiprocessing, each process works on its own copy of Foo, so self.task1_dict and self.task2_dict in the original object would not be updated.
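A common way around both concerns is to have each task build and return its dictionary instead of mutating self, and to assign the returned dictionaries in the parent process. In Python 3 a bound method such as self.task1 can usually be pickled as long as the instance itself is picklable, but the worker process would then update a copy of the instance, which is exactly the second problem. The sketch below assumes the two tasks can be rewritten as module-level functions (build_task1_dict and build_task2_dict are hypothetical names, and the loop bodies are placeholders) and that each dictionary depends only on its own inputs.

```python
from concurrent.futures import ProcessPoolExecutor


def build_task1_dict(n):
    # Hypothetical worker: builds and returns the dict instead of mutating Foo
    result = {}
    for i in range(n):
        result[i] = i * 2  # placeholder update
    return result


def build_task2_dict(n):
    result = {}
    for i in range(n):
        result[i] = i * 3  # placeholder update
    return result


class Foo:
    def __init__(self, n=10_000_000):
        self.n = n  # hypothetical parameter controlling the loop size
        self.task1_dict = {}
        self.task2_dict = {}

    def run(self):
        # Each function runs in its own process with its own interpreter and
        # GIL; only the argument n and the returned dict are pickled.
        with ProcessPoolExecutor(max_workers=2) as executor:
            f1 = executor.submit(build_task1_dict, self.n)
            f2 = executor.submit(build_task2_dict, self.n)
            # Assign the results in the parent, so the original instance is updated
            self.task1_dict = f1.result()
            self.task2_dict = f2.result()


if __name__ == "__main__":
    # The guard is required when child processes are started with the
    # "spawn" or "forkserver" start method, which re-import this module
    foo = Foo(n=1000)
    foo.run()
    print(len(foo.task1_dict), len(foo.task2_dict))
```

Sending each returned dictionary back to the parent costs time proportional to its size, so for small workloads this can still be slower than the sequential version.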