Multiprocessing on Python 3 Jupyter

Question

I have an issue with my Jupyter Python 3 notebook. I need to create a function that uses the multiprocessing library. Before implementing it, I ran some tests. I found a lot of different examples, but the issue is always the same: my code is executed but nothing happens in the notebook's interface.

The code I am trying to run in Jupyter is this:

import os

from multiprocessing import Process, current_process


def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc_name = current_process().name
    print('{0} doubled to {1} by: {2}'.format(
        number, result, proc_name))
    return result


if __name__ == '__main__':
    numbers = [5, 10, 15, 20, 25]
    procs = []
    proc = Process(target=doubler, args=(5,))

    for index, number in enumerate(numbers):
        proc = Process(target=doubler, args=(number,))
        proc2 = Process(target=doubler, args=(number,))
        procs.append(proc)
        procs.append(proc2)
        proc.start()
        proc2.start()

    proc = Process(target=doubler, name='Test', args=(2,))
    proc.start()
    procs.append(proc)

    for proc in procs:
        proc.join()

It works when I run my code outside Jupyter with the command "python my_progrem.py", and I can see the logs.

In Jupyter, is there a way to capture the results of my two tasks (proc and proc2, which both call the function "doubler") in a variable or object that I could use afterwards? If so, how can I do it?

Here is what helped: https://medium.com/@grvsinghal/speed-up-your-python-code-using-multiprocessing-on-windows-and-jupyter-or-ipython-2714b49d6fac Define your main function in a separate Python module. – user1700890, Jan 31 '23 at 20:33
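
A minimal sketch of the separate-module idea from the comment above. It assumes the worker function only needs to live in an importable file so that child processes can find it. The file name doubler_worker.py and the helpers load_worker_module and double_all are made up for this illustration and do not come from the linked article.

```python
import importlib
import multiprocessing
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents, written to disk so that worker
# processes can import the function by name.
WORKER_SOURCE = '''\
def doubler(number):
    """Return twice the input."""
    return number * 2
'''


def load_worker_module(directory):
    """Write doubler_worker.py into `directory` and import it."""
    path = Path(directory) / "doubler_worker.py"
    path.write_text(WORKER_SOURCE)
    if str(directory) not in sys.path:
        sys.path.insert(0, str(directory))
    importlib.invalidate_caches()
    return importlib.import_module("doubler_worker")


def double_all(numbers, processes=2):
    """Double every number in a process pool and return the list of results."""
    worker = load_worker_module(tempfile.mkdtemp())
    with multiprocessing.Pool(processes) as pool:
        # pool.map returns the results in the same order as the inputs
        return pool.map(worker.doubler, numbers)


if __name__ == "__main__":
    results = double_all([5, 10, 15, 20, 25])
    print(results)
```

The list returned by double_all is an ordinary Python list, so it can be stored in a variable and used in later cells. Whether this avoids the hang on a given platform is not something the sketch guarantees.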

score 20 · Answer 1 · edited May 19 '22 at 20:00

@Konate's answer really helped me. Here is a simplified version using multiprocessing.Pool:

import multiprocessing

def double(a):
    return a * 2

def driver_func():
    PROCESSES = 4
    with multiprocessing.Pool(PROCESSES) as pool:
        params = [(1, ), (2, ), (3, ), (4, )]
        results = [pool.apply_async(double, p) for p in params]

        for r in results:
            print('\t', r.get())

driver_func()

edited May 19 '22 at 20:00

Anton Frolov

answered Jul 16 '20 at 13:20

Kamen Tsvetkov

@Kamen Tsvetkov, thanks for sharing your approach. I tried it on my Windows machine, and it seems that driver_func() just hangs without outputting anything. – user785099 Jun 04 '21 at 19:15
Thanks for sharing your solution. How did the runtime compare to the non-parallelized version? – kushy Aug 06 '21 at 12:37
Didn't work on Mac + JupyterLab. – Anton Frolov May 19 '22 at 08:03
You realize that this runs "asynchronously in a single process", not multiple, right? Check the [AsyncResult object docs](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.AsyncResult) where they say this explicitly in the example. – CpILL Sep 27 '22 at 01:35
As previously said by user785099, this solution does not work on Windows: the code cell just hangs when executing. The solution for Windows is here, simple and effective: https://jupyter-tutorial.readthedocs.io/en/stable/performance/multiprocessing.html – Armando Contestabile Dec 21 '22 at 07:26
@ArmandoContestabile I tried your link on Windows and it did not work. It hangs. – user1700890 Jan 31 '23 at 20:24
@user1700890 I just copied the content of cell #1 from the page linked two comments above, and it works. My environment is: Windows 10, Python 3.11.1 in a virtual environment, VSCode + Jupyter extension. – Armando Contestabile Feb 02 '23 at 14:34
@ArmandoContestabile The reason the code in the link works is that it uses a ThreadPool rather than a Pool. – stuart10 Mar 22 '23 at 11:11
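
To illustrate the ThreadPool point from the last comment, here is a small sketch, not taken from the linked tutorial. It assumes only the standard library; the names double_with_pid and run_in_threads are hypothetical. Each task reports the process ID it ran in, which shows that ThreadPool workers are threads inside the notebook's own process.

```python
import os
from multiprocessing.pool import ThreadPool


def double_with_pid(number):
    """Return the doubled value together with the ID of the executing process."""
    return number * 2, os.getpid()


def run_in_threads(numbers, workers=4):
    """Map double_with_pid over `numbers` using a pool of threads."""
    with ThreadPool(workers) as pool:
        return pool.map(double_with_pid, numbers)


results = run_in_threads([1, 2, 3, 4])
doubled = [value for value, _ in results]
pids = {pid for _, pid in results}

print(doubled)
# Every task ran in the current process, because the workers are threads.
print(pids == {os.getpid()})
```

Because nothing has to be pickled or re-imported in a child process, this form tends to work in notebooks. The trade-off is that CPU-bound pure-Python work is still limited by the global interpreter lock.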

score 7 · Answer 2 · edited Apr 17 '20 at 09:52

I succeeded by using multiprocessing.Pool. I was inspired by this approach:

import multiprocessing

# mul, plus, calculate and calculatestar are helper functions
# defined in the documentation example linked below.

def test():
    PROCESSES = 4
    print('Creating pool with %d processes\n' % PROCESSES)

    with multiprocessing.Pool(PROCESSES) as pool:
        TASKS = [(mul, (i, 7)) for i in range(10)] + \
                [(plus, (i, 8)) for i in range(10)]

        results = [pool.apply_async(calculate, t) for t in TASKS]
        imap_it = pool.imap(calculatestar, TASKS)
        imap_unordered_it = pool.imap_unordered(calculatestar, TASKS)

        print('Ordered results using pool.apply_async():')
        for r in results:
            print('\t', r.get())
        print()

        print('Ordered results using pool.imap():')
        for x in imap_it:
            print('\t', x)

...and so on. The full code is at: https://docs.python.org/3.4/library/multiprocessing.html?
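
The snippet above relies on helpers that the answer does not show. Below is a minimal sketch with simplified stand-ins for them: these versions return plain numbers, whereas the documentation's versions also sleep briefly and format a string. The function run_tasks and its pool_factory parameter are hypothetical additions, so that the same logic can be tried with either a process pool or a thread pool.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool


def mul(a, b):
    return a * b


def plus(a, b):
    return a + b


def calculate(func, args):
    """Apply func to the unpacked argument tuple."""
    return func(*args)


def calculatestar(args):
    """Unpack a (func, args) task so it can be used with imap."""
    return calculate(*args)


def run_tasks(pool_factory=multiprocessing.Pool, processes=4):
    """Run a few tasks three ways and return the collected results."""
    tasks = [(mul, (i, 7)) for i in range(3)] + \
            [(plus, (i, 8)) for i in range(3)]
    with pool_factory(processes) as pool:
        async_results = [pool.apply_async(calculate, t) for t in tasks]
        ordered = [r.get() for r in async_results]
        via_imap = list(pool.imap(calculatestar, tasks))
        # imap_unordered yields in completion order, so sort for a stable result
        unordered = sorted(pool.imap_unordered(calculatestar, tasks))
    return ordered, via_imap, unordered
```

Calling run_tasks(ThreadPool) exercises the logic with threads. Calling run_tasks() with the default process pool is best done from a script, under an if __name__ == '__main__': guard.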

score 6 · Accepted Answer · answered Nov 05 '18 at 21:33

Another way of running multiprocessing jobs in a Jupyter notebook is to use one of the approaches supported by the nbmultitask package.

answered Nov 05 '18 at 21:33

psychemedia

Armando Contestabile · Answer 4 · 2023-03-24T10:37:34.893

It would be good to clarify a few things before giving the answer:

Officially, according to the documentation, multiprocessing.Pool does not work in an interactive interpreter (such as a Jupyter notebook). See also this answer.
Unlike multiprocessing.Pool, multiprocessing.pool.ThreadPool does work in Jupyter notebooks.

To get a generic Pool class that works in both standard and interactive Python interpreters, I wrote this:

def is_notebook() -> bool:
    try:
        if "get_ipython" in globals().keys():
            get_ipython = globals()["get_ipython"]
            shell = get_ipython().__class__.__name__
            if shell == "ZMQInteractiveShell":
                return True  # Jupyter notebook or qtconsole
        # elif shell == "TerminalInteractiveShell":
        #   return False  # Terminal running IPython
        #   else:
        return False  # Other type (?)
    except NameError:
        return False  # Probably standard Python interpreter


if is_notebook():
    from multiprocessing.pool import ThreadPool as Pool
    from threading import Lock
else:
    from multiprocessing.pool import Pool
    from multiprocessing import Lock

The following example works in both standard .py files and Jupyter .ipynb files.

#########################################
# Diversified import based on execution environment (notebook/standard interpreter)
#########################################
def is_notebook() -> bool:
    try:
        if "get_ipython" in globals().keys():
            get_ipython = globals()["get_ipython"]
            shell = get_ipython().__class__.__name__
            if shell == "ZMQInteractiveShell":
                return True  # Jupyter notebook or qtconsole
        # elif shell == "TerminalInteractiveShell":
        #   return False  # Terminal running IPython
        #   else:
        return False  # Other type (?)
    except NameError:
        return False  # Probably standard Python interpreter


if is_notebook():
    from multiprocessing.pool import ThreadPool as Pool
    from threading import Lock
else:
    from multiprocessing.pool import Pool
    from multiprocessing import Lock


#########################################
# Minimal program example
#########################################
import os
import random

from typing import Any, Iterator

def generate_values_for_parallel(count: int) -> Iterator[float]:
    # Yield `count` random floats in [0.0, 1.0)
    for _ in range(count):
        yield random.random()


def parallel_unit(arg: Any) -> str:
    return "Received --> " + str(arg)


if __name__ == '__main__':
    result = []
    pool = Pool(processes=4)
    for loop_result in pool.imap_unordered(parallel_unit, generate_values_for_parallel(10), 2*os.cpu_count()):
        result.append(loop_result)
    pool.close()
    pool.join()
    print("\n".join(result))

`Pool` and `ThreadPool` are completely different things and perform differently. See https://stackoverflow.com/questions/70700809/multiprocessing-pool-vs-multiprocessing-pool-threadpool – J. Choi, Apr 05 '23 at 04:43
J. Choi, thanks for the clarification. The purposes of Pool and ThreadPool are different, obviously, but my solution is just meant to make the code portable, so that the same code runs regardless of the execution environment (standard interpreter or Jupyter notebook). – Armando Contestabile, Apr 06 '23 at 06:34

sebtac · Answer 5 · 2021-02-09T00:25:11.330

This works for me on Mac (I could not make it work on Windows):

    import multiprocessing as mp
    mp_start_count = 0

    if __name__ == '__main__':
        if mp_start_count == 0:
            mp.set_start_method('fork')
            mp_start_count += 1
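
The counter above is presumably there because multiprocessing.set_start_method raises a RuntimeError when it is called a second time, for example when the cell is re-run. A hedged alternative sketch: ask multiprocessing whether a start method has already been chosen. The helper name ensure_start_method is hypothetical.

```python
import multiprocessing as mp


def ensure_start_method(preferred="fork"):
    """Select `preferred` once, if nothing is selected yet and the platform offers it."""
    already_set = mp.get_start_method(allow_none=True) is not None
    if not already_set and preferred in mp.get_all_start_methods():
        mp.set_start_method(preferred)
    # Report whichever start method is in effect now
    return mp.get_start_method()


if __name__ == "__main__":
    print(ensure_start_method("fork"))
```

Calling the helper repeatedly is harmless, since it only sets the method when none is in effect. On Windows, where "fork" is not available, it leaves the platform default in place, so it does not by itself make the approach work there.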

edited Feb 09 '21 at 00:25

answered Feb 08 '21 at 21:38

sebtac
