Python multiprocessing with Pool - the main process takes forever

Question

I am trying to understand how multiprocessing works with Python. Here's my test code:

import numpy as np
import multiprocessing
import time

def worker(a):
    for i in range(len(a)):
        for j in arr2:
            a[i] = a[i]*j
    return len(a)

arr2 = np.random.rand(10000).tolist()

if __name__ == '__main__':
    multiprocessing.freeze_support()
    cores = multiprocessing.cpu_count()
    arr1 = np.random.rand(1000000).tolist()
    tmp = time.time()
    pool = multiprocessing.Pool(processes=cores)
    result = pool.map(worker, [arr1], chunksize=1000000//(cores-1))
    print("mp time", time.time()-tmp)

I have 8 cores. Usually 7 of the processes use only ~3% of the CPU for about a second, while the last process uses ~1/8 of the CPU seemingly forever (it has been running for about 15 minutes).

I understand that interprocess communication overhead often limits how much a parallel program can gain, but does it usually take this long? What else could cause the last process to run forever?

This thread, "Python multiprocessing never joins", seems to address a similar issue, but it doesn't solve the problem with Pool.

`[arr1]` - you are only doing one job with the entire dataset. — tdelaney, May 03 '17 at 23:30
@tdelaney shouldn't pool.map cut arr1 into chunks? The parallel code works, and it runs faster than the single-core code when arr1 has 10000 elements. — Fenwick, May 03 '17 at 23:36
You have to do the splitting yourself. Are you on Windows or a unix-like system? There is a faster way for unixy systems. — tdelaney, May 03 '17 at 23:38
@tdelaney Oh, I just realized that arr1 is automatically iterated by Pool in each worker() call. What is the splitting technique you are talking about? Thanks! — Fenwick, May 03 '17 at 23:42
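To make the comment thread concrete: pool.map calls the worker once per element of the iterable it receives, so passing [arr1] gives the pool exactly one task. A minimal sketch comparing a wrapped list with a pre-chunked one (the helper names describe and make_tasks, the pool size of 4, and the chunk size of 250 are hypothetical choices for illustration):

```python
import multiprocessing


def describe(task):
    """Report the size of the single task that pool.map handed to this call."""
    return len(task)


def make_tasks(data, wrap, chunk=250):
    """Build the iterable for pool.map.

    wrap=True mimics the question's [arr1]: a one-element list, i.e. one task.
    wrap=False pre-splits data into slices of `chunk` items, one task each.
    """
    if wrap:
        return [data]
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def main():
    data = list(range(1000))
    with multiprocessing.Pool(processes=4) as pool:
        # pool.map calls describe() once per element of the iterable it receives
        print(pool.map(describe, make_tasks(data, wrap=True)))   # [1000]
        print(pool.map(describe, make_tasks(data, wrap=False)))  # [250, 250, 250, 250]


if __name__ == "__main__":
    main()
```

Run as a script, the first map returns a single result because only one worker ever gets any work; the second spreads four tasks across the pool.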

tdelaney · Accepted Answer · answered May 04 '17 at 03:57

It looks like you want to divide the work into chunks. You can use the range function to partition the data. On Linux, forked processes get a copy-on-write view of the parent's memory, so you can just pass down the indexes you want to work on. On Windows, no such luck: you need to pass in each sublist. This program should do it:

import numpy as np
import multiprocessing
import time
import platform

def worker(a):
    if platform.system() == "Linux":
        # on Linux we passed in a (start, end) pair of indexes; the forked
        # child reads arr1 from its copy-on-write view of the parent's memory
        start, end = a
        a = arr1[start:end]
    for i in range(len(a)):
        for j in arr2:
            a[i] = a[i]*j
    return len(a)

arr2 = np.random.rand(10000).tolist()

if __name__ == '__main__':
    multiprocessing.freeze_support()
    cores = multiprocessing.cpu_count()
    arr1 = np.random.rand(1000000).tolist()
    tmp = time.time()
    # ceiling division so every element lands in exactly one chunk
    chunk = (len(arr1)+cores-1)//cores
    # on Windows, pass the sublist; on Linux, pass just the indexes and let the
    # worker slice arr1 from its view of the parent's memory space
    if platform.system() == "Linux":
        # request fork explicitly; newer Python versions may not default to it
        ctx = multiprocessing.get_context("fork")
        seq = [(i, i+chunk) for i in range(0, len(arr1), chunk)]
    else:
        ctx = multiprocessing.get_context()
        seq = [arr1[i:i+chunk] for i in range(0, len(arr1), chunk)]
    with ctx.Pool(processes=cores) as pool:
        result = pool.map(worker, seq, chunksize=1)
    print("mp time", time.time()-tmp)
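The index arithmetic above uses ceiling division, so the chunks cover the whole list with no gaps or overlaps. A small standalone sketch of that partitioning that can be checked without starting a pool (the function name index_ranges is hypothetical, not part of the answer):

```python
def index_ranges(n, parts):
    """Split range(n) into at most `parts` contiguous (start, end) pairs.

    Mirrors chunk = (len(arr1)+cores-1)//cores, i.e. ceil(n / parts).
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if n == 0:
        return []  # avoid a zero step in range() below
    chunk = (n + parts - 1) // parts
    # min() clamps the last end index; plain list slicing would clamp it anyway
    return [(i, min(i + chunk, n)) for i in range(0, n, chunk)]


if __name__ == "__main__":
    data = list(range(10))
    ranges = index_ranges(len(data), 3)
    print(ranges)  # [(0, 4), (4, 8), (8, 10)]
    # Rejoining the slices reproduces the original list exactly
    rejoined = [x for start, end in ranges for x in data[start:end]]
    print(rejoined == data)  # True
```

Ceiling division can yield fewer than `parts` chunks (index_ranges(9, 4) gives three ranges of three), which is harmless here: the pool simply leaves one worker idle.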

score 0 · Answer 2 · answered May 04 '17 at 02:01

Your problem is here:

pool.map automatically iterates over the object you pass it, which in your program is [arr1]. Note that the object is [arr1], not arr1, so the iterable you pass to pool.map has length one: a single task containing the whole list, which only one worker process can handle.

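I think the simplest solution is to replace [arr1] with arr1. Note that worker then receives a single number per call instead of a list, so it has to be rewritten to process one element (len(a) no longer works on a float).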
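A minimal sketch of this approach, assuming Python 3 and plain lists. The worker name scale_one, the smaller sizes, and the fixed seed are assumptions chosen so the example runs quickly and every process sees the same arr2 values:

```python
import multiprocessing
import random

# Seeded so that every worker builds identical values even under the
# "spawn" start method, where each child re-imports this module.
_rng = random.Random(42)
ARR2 = [_rng.random() for _ in range(1000)]  # smaller than the question's 10000


def scale_one(x):
    """Worker for a single element: multiply x by every value in ARR2."""
    for j in ARR2:
        x *= j
    return x


def main():
    arr1 = [random.random() for _ in range(10000)]
    cores = multiprocessing.cpu_count()
    # A few chunks per process keeps IPC round trips low while still
    # spreading the work across every core.
    chunksize = max(1, len(arr1) // (cores * 4))
    with multiprocessing.Pool(processes=cores) as pool:
        # pool.map iterates arr1 itself and ships elements to the workers
        # in batches of `chunksize`
        result = pool.map(scale_one, arr1, chunksize=chunksize)
    print(len(result))  # one output per input element


if __name__ == "__main__":
    main()
```

If you omit chunksize, Pool.map picks a chunk size on its own, so passing arr1 directly is already enough to spread the work; the explicit value here just makes the batching visible.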
