
I am a physicist and new to Python, having recently migrated from MATLAB. I need to use multiprocessing to perform calculations for a genetic optimization algorithm.

I have managed to make multiprocessing work in a single Python file using information I found online. This is example code I created to replicate the problem (it doesn't do anything useful).

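import numpy as np
import multiprocessing as mp

def fun1(args):
    # Pool.map passes one argument per task, so unpack the (index, array) pair here.
    index, array = args

    lst_1 = []
    for ar_i in array:
        lst_1.append(ar_i + index)

    return np.sum(np.squeeze(lst_1))

def main(N):
    arr = np.arange(1, N)

    # Build one task per element: (element, full array).
    lst = []
    for elem in arr:
        lst.append((elem, arr))

    p = mp.Pool(8)
    a = p.map(fun1, lst)
    # Release the worker processes once the results are back.
    p.close()
    p.join()

    return a

if __name__ == "__main__":
    ans = main(30)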

If I run this from the Python console, it gives me the answer. I can also import the function from the Python file in the console, and that works too. For example:

import example1
ans2=example1.main(20)

But if I am in another Python file (for example the main genetic algorithm file) and I import the function again and try to run it as:

import example1
ans3=example1.main(10)

I get an error message:

"RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable."

How can I solve this problem? I searched online but I haven't been able to figure this out. Perhaps I am missing something obvious. Thank you.

  • I don't get the same error as you, but did you try wrapping your main genetic algorithm code in an `if __name__ == '__main__'` condition as the message says? The purpose of this is so that the code that creates the processes isn't run again each time the file is imported. See here for details: http://stackoverflow.com/a/18205006/3731982 and the documentation: https://docs.python.org/2/library/multiprocessing.html#windows – Steve Jan 17 '16 at 05:12
  • Thanks Steve, that worked. I've used another `if __name__ == '__main__':` statement in the second Python file and it worked fine. What Python version are you using? I am using 2.7.11 and I am wondering whether moving to Python 3 would let me avoid the `if __name__ == '__main__':` statement. Also, from what I've read, this error doesn't come up when you are using Linux. – user3669158 Jan 17 '16 at 10:40
  • No problem. I'm using Python 2.7.9, but I just remembered I was running it in a Linux VM, which is why I didn't get the error. So it's correct that the error occurs on Windows and not on Linux. As for Python, I would stick with Python 2 and not Python 3 for now since it's supported much better in the scientific community. I would also keep the `if __name__ == '__main__'` in both files because that's the proper way to write Python code that isn't part of a package. It is very common to include similar code in almost all languages except MATLAB. – Steve Jan 17 '16 at 18:20
  • Also, since you're using similar data between processes, you may want to look into Python `threading.Thread` instead, which will allow you to share memory between your computation threads. This will depend on your requirements of course, but it may be worth looking into if you haven't already. – Steve Jan 17 '16 at 18:23
  • Threading however does not run more than one thread at the same time – Davoud Taghawi-Nejad Apr 06 '16 at 16:55
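
The fix discussed in the comments can be sketched as follows. This is a minimal illustration, not the asker's actual code: `shifted_sum` and `run_pool` are hypothetical stand-ins for `fun1` and `example1.main`, and the pool size of 2 is arbitrary. The idea is that the script started directly keeps its pool-creating call under `if __name__ == "__main__":`, so that child processes which re-import the file on Windows do not try to start pools of their own.

```python
import multiprocessing as mp


def shifted_sum(args):
    # Pool.map hands each task over as a single argument,
    # so the (index, values) pair is unpacked inside the function.
    index, values = args
    return sum(v + index for v in values)


def run_pool(n, processes=2):
    # Hypothetical stand-in for example1.main: one task per element.
    values = list(range(1, n))
    jobs = [(i, values) for i in values]
    pool = mp.Pool(processes)
    try:
        return pool.map(shifted_sum, jobs)
    finally:
        # Always release the worker processes, even if map raises.
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Only the process launched directly reaches this branch.
    # Worker processes that import this file skip it, which avoids
    # the "bootstrapping phase" RuntimeError on Windows.
    mp.freeze_support()  # only matters for frozen Windows executables
    print(run_pool(5))  # prints [14, 18, 22, 26]
```

The same guard belongs in whichever file is run as the entry point (here, the main genetic algorithm file); modules that are only imported, such as `example1`, can define their functions at top level without it.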

0 Answers