What's the difference between these two simple Python scripts? (One works and the other doesn't)

Question

import os
import numpy as np
import time
from multiprocessing import Process, current_process


def doubler(number):
    result = number * 2
    proc_name = current_process().name
    print('{0} doubled to {1} by: {2}'.format( number, result, proc_name))

def solve_inverse(np_ndarray_square_matrix):
    inverse_matrix=np.linalg.inv(np_ndarray_square_matrix)
    proc_name = current_process().name
    print('process name :',proc_name,'       ',inverse_matrix)


if __name__=='__main__':
    start_time=time.time()

    dim=100
    thread_num=10

    matrice = [np.random.normal(loc=1.0 , scale=5.0 , size=(dim,dim)) for _ in range(thread_num)]
    procs = []

    for index, matrix in enumerate(matrice):
        proc = Process(target=solve_inverse , args=(matrix,))
        procs.append(proc)
        proc.start()

    for proc in procs:
        proc.join()

    end_time=time.time()

    print('time length :',end_time-start_time)

The code above is a simple Python script that computes the inverses of randomly sampled matrices using multiprocessing. However, the following code doesn't work:

import os
import numpy as np
import time
from multiprocessing import Process, current_process


def doubler(number):
    result = number * 2
    proc_name = current_process().name
    print('{0} doubled to {1} by: {2}'.format( number, result, proc_name))

def solve_inverse(np_ndarray_square_matrix):
    inverse_matrix=np.linalg.inv(np_ndarray_square_matrix)
    proc_name = current_process().name
    print('process name :',proc_name,'       ',inverse_matrix)


start_time=time.time()

dim=3
thread_num=10

matrice = [np.random.normal(loc=1.0 , scale=5.0 , size=(dim,dim)) for _ in range(thread_num)]
procs = []

for index, matrix in enumerate(matrice):
    proc = Process(target=solve_inverse , args=(matrix,))
    procs.append(proc)
    proc.start()

for proc in procs:
    proc.join()

end_time=time.time()

print('time length :',end_time-start_time)

The only difference is whether if __name__=='__main__': is present or not. As far as I know, if __name__=='__main__': checks whether the module is being run directly or imported by another module. So I thought the two scripts would tell the computer to do exactly the same thing. What's wrong?

And one more question! The first script doesn't seem to actually do multiprocessing: judging by the time it takes from start to end, the processes don't seem to run at the same time. When I increased the number of processes, the run time increased linearly with that number. I don't know what happened. Please help me! — Eric, Aug 08 '17 at 12:10
These answers will help. https://stackoverflow.com/a/29697273/4045933, https://stackoverflow.com/a/18205006/4045933 — SunilThorat, Aug 08 '17 at 12:20

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

if __name__ == '__main__': is required for the multiprocessing module to work. See Programming Guidelines, specifically:

Safe importing of main module

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
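As a rough illustration of that guideline, here is a minimal sketch built on the doubler function from the question. The main() helper and the return value are additions made for this example, not part of the original code. The point is only that nothing at module level starts a process, so importing the file has no side effects.

```python
from multiprocessing import Process, current_process


def doubler(number):
    """Double a number and report which process did the work."""
    result = number * 2
    proc_name = current_process().name
    print('{0} doubled to {1} by: {2}'.format(number, result, proc_name))
    return result


def main():
    # Hypothetical helper: all process creation lives here, so a plain
    # import of this file only defines functions and starts nothing.
    procs = [Process(target=doubler, args=(n,)) for n in (5, 10, 15)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()


if __name__ == '__main__':
    # Only the interpreter that runs this file directly gets here.
    main()
```

Run as a script, this starts three child processes. Imported from another module (or re-imported by a child process), it starts none.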

edited Jun 20 '20 at 09:12 (Community) · answered Aug 08 '17 at 12:21 (Mark Tolonen)

score 1 · Accepted Answer · answered Aug 08 '17 at 12:34

Specifically, every time you create a child process, that process starts by importing your script (much as you might import numpy as np).

If you don't block off the parts of your script that generate new processes under if __name__ == '__main__':, all those sub-processes would spawn their own sub-sub-processes whenever they import the script, which would spawn their own sub-sub-sub-processes and so on until you have . . . well . . .

a Stack Overflow. And nobody likes them.
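One way to watch the re-import happen is the small sketch below. It assumes the 'spawn' start method (the default on Windows), which it requests explicitly so that it should behave the same way on other platforms. The names work and ctx are made up for this example. The module-level print should appear once for the parent and once more when the child imports the file. In the child, __name__ is no longer '__main__', which is why the guarded block is skipped there. In practice, Python 3 usually stops an unguarded script under spawn with a RuntimeError rather than spawning forever, but the script still fails.

```python
import multiprocessing as mp

# Module-level statement: executed by the parent, and executed again
# by every child that has to import this file to find work().
print('importing this file, __name__ =', __name__)


def work():
    """Report which process is running and return its name."""
    name = mp.current_process().name
    print('work() running in', name)
    return name


if __name__ == '__main__':
    # Assumption: force 'spawn' so the child starts a fresh interpreter.
    ctx = mp.get_context('spawn')
    child = ctx.Process(target=work)
    child.start()
    child.join()
```

Moving the last four lines out of the guard reproduces the kind of failure described in the question.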
