
I have an i5-8600K with 6 cores and am running Windows 10. I am trying to run two NumPy functions in parallel with multiprocessing. I asked about this before (issue) but have not been able to get it working; the code below is from the answer to that question. I am trying to run func1() and func2() at the same time, but when I run the code below it keeps running forever.

import multiprocessing as mp
import numpy as np
num_cores = mp.cpu_count()

Numbers = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
def func1():
    Solution_1 = Numbers + 10
    return Solution_1

def func2():
    Solution_2 = Numbers * 10
    return Solution_2

# Getting ready my cores, I left one aside
pool = mp.Pool(num_cores-1)
# This is to use all functions easily
functions = [func1, func2]
# This is to store the results
solutions = []
for function in functions:
    solutions.append(pool.apply(function, ()))

(screenshot: the notebook cell still running, showing the asterisk)

tony selcuk
  • on Linux Mint with a very old processor it runs in less than 0.03 seconds. But I run it normally, `python script.py`, not in Jupyter Notebook. – furas Feb 11 '21 at 02:58
  • Is there a reason why it might not run in Jupyter Notebook? It uses the Python kernel. – tony selcuk Feb 11 '21 at 06:21
  • Yes, multiprocessing requires importing the `__main__` module, which is not possible with an interactive session: https://stackoverflow.com/a/23641560/3220135 – Aaron Feb 11 '21 at 06:27
  • interactive is great for prototyping and exploratory analysis, but not for actually running code you've built – Aaron Feb 11 '21 at 06:29
  • now I tested it in Jupyter Notebook and it works in 0.05 seconds. BTW: in both versions I had to add `print(solutions)` to see the results. – furas Feb 11 '21 at 11:04
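The pickling constraint mentioned in the comments above can be illustrated without spawning any processes: pickle serializes a function by reference (module name plus attribute name), not by code, so a child process must be able to import that module to rebuild the function. A minimal sketch, using the stdlib `json.dumps` as a stand-in for a worker function:

```python
import pickle
import json  # any importable module with a top-level function works

# pickle records only a reference (module name + qualified name),
# not the function's bytecode, so the receiving process must be
# able to import the same module to reconstruct the function
payload = pickle.dumps(json.dumps)
assert b"json" in payload and b"dumps" in payload

# unpickling re-imports the module and looks the name up again
restored = pickle.loads(payload)
assert restored is json.dumps
```

A function defined in a notebook cell lives in `__main__`, which a spawned Windows/Jupyter child process cannot re-import, so unpickling it fails there; hence the advice to move workers into an importable file.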

1 Answer


There are several issues with the code. First, if you want to run this under Jupyter Notebook on Windows, you need to put your worker functions func1 and func2 in an external module, for example workers.py, and import them. That means you now need to either pass the Numbers array as an argument to the workers or initialize static storage of each process with the array when you initialize the pool. We will use the second method with a function called init_pool, which also has to be imported if we are running under Notebook:

workers.py

def func1():
    Solution_1 = Numbers + 10
    return Solution_1

def func2():
    Solution_2 = Numbers * 10
    return Solution_2

def init_pool(n_array):
    global Numbers
    Numbers = n_array

The second issue is that under Windows, code that creates sub-processes or a multiprocessing pool must be inside a block guarded by if __name__ == '__main__':. Third, it is wasteful to create a pool larger than 2 if you are only trying to run two parallel "jobs." Fourth, and I think finally, you are using the wrong pool method: apply blocks until the submitted "job" (i.e. the one processed by func1) completes, so func2 cannot even start until func1 finishes and you achieve no parallelism at all. You should be using apply_async.

import multiprocessing as mp
import numpy as np
from workers import func1, func2, init_pool


if __name__ == '__main__':
    #num_cores = mp.cpu_count()
    Numbers = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
    pool = mp.Pool(2, initializer=init_pool, initargs=(Numbers,)) # more than 2 is wasteful
    # This is to use all functions easily
    functions = [func1, func2]
    # This is to store the results
    solutions = []
    results = [pool.apply_async(function) for function in functions]
    for result in results:
        solutions.append(result.get()) # wait for completion and get the result
    print(solutions)

Prints:

[array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]), array([ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120])]

(screenshot: the code above running successfully under Jupyter Notebook)

Booboo
  • Thank you, it does work. However, I could not make it work with Jupyter; I am running Python 3.7.8 on the Jupyter kernel. Is there a way I could make it work there? – tony selcuk Feb 12 '21 at 06:35
  • I am not sure what your issue is. I am running Python 3.8.5 under Windows 10 and this worked fine under Jupyter Notebook and Jupyter Lab. I have attached to my answer an image of this. You need to describe in greater detail what "I could not make it work" means. Do you see errors on the Jupyter Notebook console, for example? – Booboo Feb 12 '21 at 11:05
  • It shows that that piece of code keeps on running; I am not sure why. I have updated the question with a snapshot showing that the asterisk is still running. – tony selcuk Feb 12 '21 at 21:29
  • My answer clearly states that **functions `func1`, `func2` and `init_pool` *must* be placed in a module and imported.** Re-read my answer's description and re-read my code. I placed these in a file `workers.py` in the same directory as the `.ipynb` file that contains my cells. If you look at the "console" where you started up Notebook, you will see lots of error messages if you do not do this. – Booboo Feb 12 '21 at 21:38
  • I didn't think that was necessary, so I ran all the functions in one Python file. It works as `script.py` but it doesn't work with Jupyter Notebook, which I don't really understand. – tony selcuk Feb 13 '21 at 03:42
  • There are a lot of things I don't understand, but it doesn't mean they aren't so. See https://stackoverflow.com/questions/47313732/jupyter-notebook-never-finishes-processing-using-multiprocessing-python-3. – Booboo Feb 13 '21 at 04:11