I have a pipeline with several steps, and for some of the steps I would like to use the multiprocessing module to parallelise the calculation.
My main code is in one file, let's call it pipeline.py.
The code for each calculation is stored in separate files, let's call them calculation01.py, calculation02.py, calculation03.py.
#pipeline.py
import calculation01
import calculation02
import calculation03
for folder in all_tasks:
    calculation01.do_calculation_1(folder, a, b, c)
    calculation02.do_calculation_2(folder, d, e, f)
    calculation03.do_calculation_3(folder, g, h, i)
For calculation01 it is possible to parallelise the calculation. If I understand correctly, on Windows the code for the parallelisation must be inside
if __name__ == '__main__':
and not inside a function.
Now my question is how I could pass the arguments to the file calculation01.py. Is this possible?
#calculation01.py
import multiprocessing

def do_calculation_1(args):
    # do heavy calculation
    folder, a, b, c = args

if __name__ == '__main__':
    a, b, c = 1, 1, 1  # example values
    ll = [([1, 2, 3, 4], a, b, c),
          ([5, 6, 7, 8], a, b, c),
          ([9, 10, 11, 12], a, b, c)]
    pool = multiprocessing.Pool(4)
    result = pool.map(do_calculation_1, ll)
    pool.terminate()