
I'm trying to parallelize some code in Python. Both methods, `apply` and `map` from the `multiprocessing` library, hang indefinitely when executing the following code.

import multiprocessing as mp
import numpy as np

# The following function will be parallelized.
def howmany_within_range(row, minimum, maximum):
    """Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

# Step 1: Create data 
np.random.RandomState(100)
arr = np.random.randint(0, 10, size=[200000, 5])
data = arr.tolist()
data[:5]

# Step 2: Init multiprocessing.Pool()
pool = mp.Pool(mp.cpu_count())

# Step 3: `pool.apply` the `howmany_within_range()`
results = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]

# Step 4: close
pool.close()    

print(results[:10])

The other method, `pool.map`, also hangs:

# Redefine, with only 1 mandatory argument.
def howmany_within_range_rowonly(row, minimum=4, maximum=8):
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

pool = mp.Pool(mp.cpu_count())

results = pool.map(howmany_within_range_rowonly, [row for row in data])

pool.close()

print(results[:10])

What is wrong?

P.S. Working on Python 3.8.11 (Jupyter Notebook 6.1.4).

  • Both work fine for me. But your first version isn't actually parallel: `pool.apply` blocks until each call returns, so you need `.apply_async` for that (and don't forget to collect the results via `.get()` on the list elements). Anyway, I recommend using the second version, or switching to `.starmap` if you need more function arguments (see the sketch after these comments). – Timus Oct 11 '21 at 09:43
  • Have a look [here](https://stackoverflow.com/questions/47313732/jupyter-notebook-never-finishes-processing-using-multiprocessing-python-3): this seems similar to your problem. The issue appears to be the Jupyter notebook. – Timus Oct 11 '21 at 10:45
  • It runs fine natively in Python with an ```if __name__ == "__main__":``` guard (shown in the sketch below) to stop ```mp.Pool()``` from throwing an error. However, ```size=[200000, 5]``` is huge; when I cut it to 200 it ran fine. – jwal Oct 11 '21 at 17:48
  • Multiprocessing is difficult to get working with IPython (Jupyter) on Windows. This is because the newly spawned child process must `import` the main file to have access to the functions and values not explicitly passed as arguments to the target function. The "interactive" part of **I**Python means there isn't always a "main" file to import (see the module sketch below). – Aaron Oct 12 '21 at 13:57
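
A minimal sketch of what the comments suggest, run as a plain script rather than in the notebook: `apply_async` with results collected via `.get()`, plus `starmap` for passing several arguments. The `if __name__ == "__main__":` guard and the smaller stand-in data are assumptions for the example, not part of the original question.

import multiprocessing as mp
import numpy as np

def howmany_within_range(row, minimum, maximum):
    """Returns how many numbers in `row` lie between `minimum` and `maximum`"""
    return sum(minimum <= n <= maximum for n in row)

if __name__ == "__main__":
    # Smaller stand-in for the question's data so the example finishes quickly.
    data = np.random.randint(0, 10, size=[200, 5]).tolist()

    with mp.Pool(mp.cpu_count()) as pool:
        # apply_async returns AsyncResult objects right away; .get() blocks
        # until each result is ready, so the calls actually run in parallel.
        async_results = [pool.apply_async(howmany_within_range, args=(row, 4, 8))
                         for row in data]
        results = [r.get() for r in async_results]

        # starmap unpacks each argument tuple into the function call.
        results_starmap = pool.starmap(howmany_within_range,
                                       [(row, 4, 8) for row in data])

    print(results[:10])
    print(results_starmap[:10])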
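
And a sketch of the workaround Aaron's comment points to for Jupyter: with start methods that spawn child processes (Windows, and macOS by default), the workers must be able to import the target function, which an interactive notebook cannot always provide. The module name `workers.py` is hypothetical.

# workers.py -- hypothetical helper module saved next to the notebook
def howmany_within_range_rowonly(row, minimum=4, maximum=8):
    """Returns how many numbers in `row` lie between `minimum` and `maximum`"""
    return sum(minimum <= n <= maximum for n in row)

Then, in a notebook cell:

import multiprocessing as mp
import numpy as np
from workers import howmany_within_range_rowonly  # importable by the child processes

data = np.random.randint(0, 10, size=[200, 5]).tolist()

with mp.Pool(mp.cpu_count()) as pool:
    results = pool.map(howmany_within_range_rowonly, data)

print(results[:10])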
