I am trying to parallelize some code in Python. Both pool.apply and pool.map from the multiprocessing library hang indefinitely when I run the following code:
import multiprocessing as mp
import numpy as np
# The following function will be parallelized.
def howmany_within_range(row, minimum, maximum):
    """Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count
# Step 1: Create data
np.random.seed(100)
arr = np.random.randint(0, 10, size=[200000, 5])
data = arr.tolist()
data[:5]
# Step 2: Init multiprocessing.Pool()
pool = mp.Pool(mp.cpu_count())
# Step 3: `pool.apply` the `howmany_within_range()`
results = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]
# Step 4: close
pool.close()
print(results[:10])
The other method, pool.map, also hangs:
# Redefine, with only 1 mandatory argument.
def howmany_within_range_rowonly(row, minimum=4, maximum=8):
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count
pool = mp.Pool(mp.cpu_count())
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
print(results[:10])
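For comparison, here is a sequential sketch of the same computation, with no multiprocessing involved. It is only meant to show the kind of output the parallel versions should produce. The smaller data set, the standard-library `random` generator (instead of NumPy) and the names `data_small` and `results_seq` are assumptions made to keep the check quick and self-contained, so the actual values will differ from the NumPy data above.

```python
import random


def howmany_within_range(row, minimum, maximum):
    """Count the values in `row` that lie in [minimum, maximum]."""
    return sum(1 for n in row if minimum <= n <= maximum)


# Hypothetical small data set: 1000 rows of 5 integers in 0..9.
# (The question uses NumPy and 200000 rows; this is a stand-in.)
rng = random.Random(100)
data_small = [[rng.randint(0, 9) for _ in range(5)] for _ in range(1000)]

# Plain list comprehension: one count per row, computed in the main process.
results_seq = [howmany_within_range(row, 4, 8) for row in data_small]
print(results_seq[:10])
```

Each entry of `results_seq` is a count between 0 and 5, one per row, which is the shape of result I expect from pool.apply and pool.map as well.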
What is wrong?
P.S. I am working with Python 3.8.11 (Jupyter Notebook 6.1.4).