
I was trying to make my code parallel and I ran into something strange that I am not able to explain.

Let me define the context. I have a really heavy computation to do: reading multiple files and performing machine learning analysis on them; a lot of math is involved. My code runs fine on both Windows and Linux when it is sequential, but when I try to use multiprocessing everything breaks. Below is an example, which I developed first on Windows:

from multiprocessing.dummy import Pool as ThreadPool 

def ppp(element):
    window,day = element
    print(window,day)
    time.sleep(5)
    return

if __name__ == '__main__':
    #%% Reading datasets
    print('START')
    start_time = current_milli_time()
    tree = pd.read_csv('datan\\days.csv')
    days = list(tree.columns)
    # to be able to run this code uncomment the following line and comment the previous two
    # days = ['0808', '0810', '0812', '0813', '0814', '0817', '0818', '0827', '0828', '0829']
    windows = [1000]
    processes_args = list(itertools.product(windows, days))

    pool = ThreadPool(8) 
    results = pool.map_async(ppp, processes_args)
    pool.close() 
    pool.join() 
    print('END', current_milli_time()-start_time, 'ms')

When I run this code on Windows, the output looks like this:

START
100010001000 1000 1000100010001000      081008120808
08130814
0818
082708171000
1000    
  08290828

END 5036 ms

A messy set of prints, all issued within 125 ms (the 5036 ms total is just the 5-second sleep). The same behavior appears on Linux too. However, I noticed that when I apply this method on Linux and look into 'htop', what I see is a set of threads that are randomly picked for execution, but they never execute in parallel. So, after some Google searches, I came up with this new code:

from multiprocessing import Pool as ProcessPool

def ppp(element):
    window,day = element
    print(window,day)
    time.sleep(5)
    return

if __name__ == '__main__':
    #%% Reading datasets
    print('START')
    start_time = current_milli_time()
    tree = pd.read_csv('datan\\days.csv')
    days = list(tree.columns)
    # to be able to run this code uncomment the following line and comment the previous two
    # days = ['0808', '0810', '0812', '0813', '0814', '0817', '0818', '0827', '0828', '0829']
    windows = [1000]
    processes_args = list(itertools.product(windows, days))

    pool = ProcessPool(8) 
    results = pool.map_async(ppp, processes_args)
    pool.close() 
    pool.join() 
    print('END', current_milli_time()-start_time, 'ms')

As you can see, I only changed the import statement, which now creates a process pool instead of a thread pool. That solves the problem on Linux: in the real scenario I have 8 cores running at 100% with 8 processes in the system, and the output looks like the one above. However, when I run this code on Windows, the whole run takes more than 10 seconds, and I don't get any of the prints from ppp, only the ones from the main.

I really tried to find an explanation, but I do not understand why this happens. For example, here: Python multiprocessing Pool strange behavior in Windows, they talk about code that is safe on Windows, and the answer suggests moving to threading, which, as a side effect, makes the code concurrent instead of parallel. Here is another example: Python multiprocessing linux windows difference. All these questions describe fork() and spawned processes, but I personally think that is not the point of my question. The Python documentation also explains that Windows does not have a fork() method (https://docs.python.org/2/library/multiprocessing.html#programming-guidelines).

In conclusion, right now I am becoming convinced that I cannot do parallel processing on Windows, but I suspect that what I am inferring from all these discussions is wrong. Thus, my question should be: is it possible to run processes or threads in parallel (on different CPUs) on Windows?

EDIT: added if __name__ == '__main__': to both examples.

EDIT2: to be able to run the code, this function and these imports are needed:

import time
import itertools
import pandas as pd
current_milli_time = lambda: int(round(time.time() * 1000))
Guido Muscioni

2 Answers


You can do parallel processing under Windows (I have a script running right now doing heavy computations and using 100% of all 8 cores), but the way it works is by creating parallel processes, not threads (threads won't work, because of the GIL, except for I/O operations). A few important points:

  • you need to use concurrent.futures.ProcessPoolExecutor() (note: the process pool, not the thread pool). See https://docs.python.org/3/library/concurrent.futures.html. In a nutshell, you put the code you want to parallelize in a function and then call executor.map(), which will do the split; see the sketch after this list.
  • note that on Windows each parallel process starts from scratch, so you'll probably need to use if __name__ == '__main__': in a few places to distinguish what you do in the main process from what the others do. The data that you load in the main script will be replicated to the child processes, so it has to be serializable (pickle-able, in Python lingo).
  • in order to use the cores efficiently, avoid writing to objects shared across all processes (e.g. a progress counter or a common data structure); otherwise the synchronization between the processes will kill performance. Monitor the execution from the Task Manager.
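A minimal sketch of that pattern (heavy_computation is a hypothetical stand-in for the real work):

from concurrent.futures import ProcessPoolExecutor

def heavy_computation(x):
    # placeholder for the real file reading / math
    return x * x

if __name__ == '__main__':
    # the guard is mandatory on Windows: child processes re-import this module
    with ProcessPoolExecutor(max_workers=8) as executor:
        # executor.map splits the iterable across the worker processes
        for result in executor.map(heavy_computation, range(16)):
            print(result)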
wishihadabettername

Under Windows, Python uses pickle/unpickle to mimic fork in the multiprocessing module. When unpickling, the module gets re-imported, and any code in the global scope is executed again. The docs state:

Instead one should protect the "entry point" of the program by using if __name__ == '__main__'
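A minimal sketch of that re-import behavior (the worker is a placeholder; get_context('spawn') forces the Windows start method on any OS, so the module-level print appears once per child):

import multiprocessing as mp

print('module-level code: this line runs again in every spawned child')

def work(x):
    return x * x

if __name__ == '__main__':
    # only the main process enters here; children stop after re-importing
    with mp.get_context('spawn').Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))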

Besides, you should consume the AsyncResult returned by pool.map_async, or simply use pool.map.
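For example, reusing the names from the question, a sketch of what consuming the result looks like; .get() blocks until the workers finish and re-raises any exception that occurred in a child, so failures no longer pass silently:

if __name__ == '__main__':
    pool = ProcessPool(8)
    async_result = pool.map_async(ppp, processes_args)
    pool.close()
    pool.join()
    # .get() returns the list of return values, or re-raises a worker error
    print(async_result.get())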

georgexsh