I need to run the same function with different input parameters. Each run takes a very long time, but luckily I know it only needs one CPU core, so I decided to run the jobs in parallel with Python's multiprocessing package. Assuming my CPU has 4 cores and I have 8 different parameters to try, I expected the following code to take about twice as long as a single run:
from multiprocessing import Pool

def a_hard_work(parameter):
    # do some time-consuming work with parameter as input
    pass

class IterableAdapter:
    """https://stackoverflow.com/a/39564774"""
    def __init__(self, iterable_factory, length=None):
        self.iterable_factory = iterable_factory
        self.length = length

    def __iter__(self):
        # build a fresh iterator on every iteration
        return iter(self.iterable_factory())

def loader(param_set):
    # wrap a fresh generator in the adapter so it can be re-iterated
    def loader():
        for param in param_set:
            yield param
    return IterableAdapter(lambda: loader())

def worker(parameter):
    a_hard_work(parameter)

if __name__ == "__main__":
    param_set = [1, 2, 3, 4, 5, 6, 7, 8]
    tasks = loader(param_set)
    with Pool() as p:
        try:
            for i in p.imap_unordered(worker, tasks):
                pass
        except Exception as e:
            print(e)
However, it took much longer than I expected, so I checked my CPU usage and found that only one core was under load while the script was running; the other three cores were idle.
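For anyone who wants to check the same thing, printing multiprocessing.current_process().name inside worker shows which pool process picks up each parameter (a minimal sketch; the CPU-bound loop in a_hard_work here is just a stand-in for my real workload):

from multiprocessing import Pool, current_process

def a_hard_work(parameter):
    # CPU-bound stand-in for the real workload
    total = 0
    for _ in range(10_000_000):
        total += parameter
    return total

def worker(parameter):
    # report which pool process picked up this parameter; if only one
    # core is ever loaded, every line names the same process
    print(f"{current_process().name} -> parameter {parameter}")
    a_hard_work(parameter)

if __name__ == "__main__":
    with Pool() as p:
        for _ in p.imap_unordered(worker, [1, 2, 3, 4, 5, 6, 7, 8]):
            pass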
Then I changed my code to the following (which I thought would behave the same):
from multiprocessing import Pool

def a_hard_work(parameter):
    # do some time-consuming work with parameter as input
    pass

def loader(param_set):
    for param in param_set:
        yield param

def worker(parameter):
    a_hard_work(parameter)

if __name__ == "__main__":
    param_set = [1, 2, 3, 4, 5, 6, 7, 8]
    tasks = loader(param_set)
    with Pool() as p:
        try:
            for i in p.imap_unordered(worker, tasks):
                pass
        except Exception as e:
            print(e)
This time the script behaved as I expected. What is the difference between the two ways I generate the jobs?
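In case it helps to reproduce the comparison, here is a self-contained timing harness (a sketch: a_hard_work is replaced by a CPU-bound loop, and the two loaders are renamed adapter_loader and plain_loader so they can live in one file; those names are mine, not from the original script):

from multiprocessing import Pool
import time

def a_hard_work(parameter):
    # CPU-bound stand-in for the real workload
    total = 0
    for _ in range(20_000_000):
        total += parameter
    return total

class IterableAdapter:
    """https://stackoverflow.com/a/39564774"""
    def __init__(self, iterable_factory, length=None):
        self.iterable_factory = iterable_factory
        self.length = length

    def __iter__(self):
        return iter(self.iterable_factory())

def adapter_loader(param_set):
    # first variant: generator wrapped in IterableAdapter
    def gen():
        for param in param_set:
            yield param
    return IterableAdapter(lambda: gen())

def plain_loader(param_set):
    # second variant: plain generator
    for param in param_set:
        yield param

def worker(parameter):
    a_hard_work(parameter)

if __name__ == "__main__":
    params = [1, 2, 3, 4, 5, 6, 7, 8]
    for name, make_tasks in [("adapter", adapter_loader), ("generator", plain_loader)]:
        start = time.perf_counter()
        with Pool() as p:
            for _ in p.imap_unordered(worker, make_tasks(params)):
                pass
        print(f"{name}: {time.perf_counter() - start:.1f}s")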