I have a pipeline with several steps, and for some of them I would like to use the multiprocessing module to parallelise the calculation.

My main code is in one file, let's call it pipeline.py. The code for each calculation is stored in a separate file; let's call them calculation01.py, calculation02.py and calculation03.py.

#pipeline.py
import calculation01
import calculation02
import calculation03

for folder in all_tasks:
    calculation01.do_calculation_1(folder, a, b, c)
    calculation02.do_calculation_2(folder, d, e, f)
    calculation03.do_calculation_3(folder, g, h, i)

For calculation01 it is possible to parallelise the calculation. If I understand correctly, on Windows the code for the parallelisation must be inside

if __name__ == '__main__': 

and not inside a function.

Now my question is: how can I pass the arguments to the file calculation01.py? Is this possible?

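#calculation01.py
import multiprocessing

def do_calculation_1(args):
    # unpack the single tuple that pool.map passes to each worker
    folder, a, b, c = args
    #do heavy calculation

if __name__ == '__main__':
    # example values only; in the pipeline these would come from pipeline.py
    a, b, c = 1, 1, 1
    ll = [([1, 2, 3, 4], a, b, c),
          ([5, 6, 7, 8], a, b, c),
          ([9, 10, 11, 12], a, b, c)]
    pool = multiprocessing.Pool(4)
    result = pool.map(do_calculation_1, ll)
    pool.terminate()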
honeymoon
  • What makes you think you need to put that code inside `__name__ == '__main__'`? If you do that it will never be called at all when you import calculation01. – Daniel Roseman Oct 07 '16 at 12:10
  • I thought because of this thread: http://stackoverflow.com/questions/20222534/python-multiprocessing-on-windows-if-name-main – honeymoon Oct 07 '16 at 12:23
  • If I try to put the multiprocessing code into a function, the program does not work as expected and I am getting the following error: Attempt to start a new process before the current process has finished its bootstrapping phase. This probably means that you are on Windows and you have forgotten to use the proper idiom in the main module: if __name__ == '__main__': freeze_support() – honeymoon Oct 07 '16 at 12:28
  • Well, that says you need to call `freeze_support()` inside that block. Did you do that? – Daniel Roseman Oct 07 '16 at 13:07
  • No, I just do `pool = multiprocessing.Pool(4)`, `result = pool.map(do_calculation_1, ll)`, `pool.terminate()` – honeymoon Oct 07 '16 at 13:31
  • If I understand it correctly from the docs, I do need to run the code inside `if __name__ == '__main__':`. The docs say: "For an explanation of why (on Windows) the if __name__ == '__main__' part is necessary, see Programming guidelines." – honeymoon Oct 07 '16 at 15:39
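
One possible layout that fits the comments above, given as a minimal single-file sketch rather than a definitive fix: the Pool is created inside an ordinary function that takes the arguments, and only the entry point of the script that is actually run is guarded. In the real project the two functions would presumably live in calculation01.py and the guarded block in pipeline.py. The helper name `run_calculation_1`, the `processes` parameter and the arithmetic inside `do_calculation_1` are assumptions made for illustration; the actual heavy calculation is not shown in the question.

```python
import multiprocessing


def do_calculation_1(args):
    # Placeholder for the heavy calculation (assumption: the real work is not shown).
    folder, a, b, c = args
    return sum(folder) * a + b + c


def run_calculation_1(folders, a, b, c, processes=4):
    # Hypothetical helper: the Pool is created inside a normal function.
    # This is fine as long as the function is only called from code that is
    # reached through the __main__ guard of the script being run.
    jobs = [(folder, a, b, c) for folder in folders]
    with multiprocessing.Pool(processes) as pool:
        return pool.map(do_calculation_1, jobs)


if __name__ == '__main__':
    # Only the entry point needs the guard; on Windows each worker process
    # re-imports the main module, and the guard keeps it from starting new pools.
    folders = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    print(run_calculation_1(folders, a=1, b=1, c=1))
```

With this layout the arguments are passed as normal function parameters, so nothing has to be passed "to the file" itself.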

0 Answers