28

I have an issue with my Jupyter Python 3 notebook. I need to create a function that uses the multiprocessing library. Before implementing it, I ran some tests. I found a lot of different examples, but the issue is always the same: my code is executed but nothing happens in the notebook's interface:

The code I am trying to run in Jupyter is this one:

import os

from multiprocessing import Process, current_process


def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc_name = current_process().name
    print('{0} doubled to {1} by: {2}'.format(
        number, result, proc_name))
    return result


if __name__ == '__main__':
    numbers = [5, 10, 15, 20, 25]
    procs = []
    proc = Process(target=doubler, args=(5,))

    for index, number in enumerate(numbers):
        proc = Process(target=doubler, args=(number,))
        proc2 = Process(target=doubler, args=(number,))
        procs.append(proc)
        procs.append(proc2)
        proc.start()
        proc2.start()

    proc = Process(target=doubler, name='Test', args=(2,))
    proc.start()
    procs.append(proc)

    for proc in procs:
        proc.join()

It works when I run my code outside Jupyter, with the command "python my_progrem.py", and I can see the logs.

For my example, is there a way in Jupyter to capture the results of my two tasks (proc and proc2, which both call the function "doubler") in a variable or object that I could use afterwards? If so, how can I do it?

alex
Konate Malick
  • Here is what helped: https://medium.com/@grvsinghal/speed-up-your-python-code-using-multiprocessing-on-windows-and-jupyter-or-ipython-2714b49d6fac Define your main function in separate python module. – user1700890 Jan 31 '23 at 20:33
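A minimal sketch of the separate-module idea from the comment above, assuming the hang comes from worker processes being unable to import a function that is defined only in the notebook. The module name doubler_worker and the helpers load_worker_module and double_all are hypothetical names made up for this illustration; in a notebook the file would more commonly be created with the %%writefile cell magic.

```python
import importlib
import multiprocessing
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, chosen only for this sketch.
MODULE_NAME = "doubler_worker"

# Source of the worker function. Keeping it in its own module lets the
# child processes import it by name instead of looking it up in __main__.
MODULE_SOURCE = '''
def doubler(number):
    """Return twice the given number."""
    return number * 2
'''


def load_worker_module(directory):
    """Write the worker module into `directory` and import it."""
    path = Path(directory) / f"{MODULE_NAME}.py"
    path.write_text(MODULE_SOURCE)
    if str(directory) not in sys.path:
        sys.path.insert(0, str(directory))
    importlib.invalidate_caches()
    return importlib.import_module(MODULE_NAME)


def double_all(numbers, processes=2):
    """Double every number in a process pool; results keep the input order."""
    # The temporary directory is left in place for the life of the session.
    worker = load_worker_module(tempfile.mkdtemp())
    with multiprocessing.Pool(processes) as pool:
        return pool.map(worker.doubler, numbers)


if __name__ == "__main__":
    print(double_all([5, 10, 15, 20, 25]))  # [10, 20, 30, 40, 50]
```

The list returned by double_all is an ordinary Python list, so it can be stored in a variable and used in later cells.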

5 Answers

20

@Konate's answer really helped me. Here is a simplified version using multiprocessing.Pool:

import multiprocessing

def double(a):
    return a * 2

def driver_func():
    PROCESSES = 4
    with multiprocessing.Pool(PROCESSES) as pool:
        params = [(1, ), (2, ), (3, ), (4, )]
        results = [pool.apply_async(double, p) for p in params]

        for r in results:
            print('\t', r.get())

driver_func()

Anton Frolov
Kamen Tsvetkov
  • 15
    @Kamen Tsvetkov, thanks for sharing your approach. I tried it on my Windows machine, and it seems that driver_func() just hangs without outputting anything. – user785099 Jun 04 '21 at 19:15
  • Thanks for sharing your solution. How did the runtime compare to the non-parallelized version? – kushy Aug 06 '21 at 12:37
  • 3
    Didn't work on Mac + JupyterLab. – Anton Frolov May 19 '22 at 08:03
  • You realize that this runs "asynchronously in a single process", not in multiple processes, right? Check the [AsyncResult object docs](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.AsyncResult) where they say this explicitly in the example – CpILL Sep 27 '22 at 01:35
  • As previously said by user785099, this solution does not work on Windows: the code cell just hangs when executing. The solution for Windows is here, simple and effective: https://jupyter-tutorial.readthedocs.io/en/stable/performance/multiprocessing.html – Armando Contestabile Dec 21 '22 at 07:26
  • @ArmandoContestabile I tried your link on Windows and it did not work. It hangs – user1700890 Jan 31 '23 at 20:24
  • @user1700890 I have just copied the content of cell #1 of the page linked two comments above, and it works. My environment is: Windows 10, Python 3.11.1 in a virtual environment, VSCode + Jupyter extension. – Armando Contestabile Feb 02 '23 at 14:34
  • @ArmandoContestabile The reason the code in the link works is that it uses a ThreadPool, rather than a Pool – stuart10 Mar 22 '23 at 11:11
7

I succeeded by using multiprocessing.Pool. I was inspired by this approach:

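import multiprocessing

# Helper functions used by the tasks below; the linked documentation
# example defines fuller versions of them.
def calculate(func, args):
    return '%s%s = %s' % (func.__name__, args, func(*args))

def calculatestar(args):
    return calculate(*args)

def mul(a, b):
    return a * b

def plus(a, b):
    return a + b

def test():
    PROCESSES = 4
    print('Creating pool with %d processes\n' % PROCESSES)

    with multiprocessing.Pool(PROCESSES) as pool:
        TASKS = [(mul, (i, 7)) for i in range(10)] + \
                [(plus, (i, 8)) for i in range(10)]

        results = [pool.apply_async(calculate, t) for t in TASKS]
        imap_it = pool.imap(calculatestar, TASKS)
        imap_unordered_it = pool.imap_unordered(calculatestar, TASKS)

        print('Ordered results using pool.apply_async():')
        for r in results:
            print('\t', r.get())
        print()

        print('Ordered results using pool.imap():')
        for x in imap_it:
            print('\t', x)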

...and so on. The full code is at: https://docs.python.org/3.4/library/multiprocessing.html?

macrocosme
Konate Malick
6

Another way of running multiprocessing jobs in a Jupyter notebook is to use one of the approaches supported by the nbmultitask package.

psychemedia
1

It is worth clarifying a few things before giving the answer:

  • according to the documentation, some multiprocessing.Pool examples do not work in an interactive interpreter (such as a Jupyter notebook), because the child processes must be able to import the __main__ module. See also this answer.
  • unlike multiprocessing.Pool, multiprocessing.pool.ThreadPool does work in Jupyter notebooks
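As an illustration of the second point, here is a minimal sketch that collects the question's doubler results with a ThreadPool. The helper name double_all is made up for this example. Because the workers are threads in the same process, nothing has to be pickled or re-imported, but CPU-bound work will not run faster than in a plain loop.

```python
from multiprocessing.pool import ThreadPool


def doubler(number):
    """Return twice the given number."""
    return number * 2


def double_all(numbers, workers=4):
    """Run doubler over numbers in a thread pool, preserving input order."""
    with ThreadPool(workers) as pool:
        return pool.map(doubler, numbers)


# The results end up in an ordinary list that later cells can reuse.
results = double_all([5, 10, 15, 20, 25])
print(results)  # [10, 20, 30, 40, 50]
```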

To get a generic Pool class that works in both standard and interactive Python interpreters, I wrote this:

def is_notebook() -> bool:
    """Return True when running under a Jupyter kernel (notebook or qtconsole)."""
    # IPython injects get_ipython into the user namespace; a standard
    # Python interpreter does not define it.
    get_ipython = globals().get("get_ipython")
    if get_ipython is None:
        return False  # Probably standard Python interpreter
    shell = get_ipython().__class__.__name__
    # "ZMQInteractiveShell" is Jupyter notebook or qtconsole;
    # "TerminalInteractiveShell" is IPython running in a terminal.
    return shell == "ZMQInteractiveShell"

if is_notebook():
    from multiprocessing.pool import ThreadPool as Pool
    from threading import Lock
else:
    from multiprocessing.pool import Pool
    from multiprocessing import Lock

The following example works in both standard .py files and Jupyter .ipynb notebooks.

#########################################
# Diversified import based on execution environment (notebook/standard interpreter)
#########################################
def is_notebook() -> bool:
    """Return True when running under a Jupyter kernel (notebook or qtconsole)."""
    # IPython injects get_ipython into the user namespace; a standard
    # Python interpreter does not define it.
    get_ipython = globals().get("get_ipython")
    if get_ipython is None:
        return False  # Probably standard Python interpreter
    shell = get_ipython().__class__.__name__
    # "ZMQInteractiveShell" is Jupyter notebook or qtconsole;
    # "TerminalInteractiveShell" is IPython running in a terminal.
    return shell == "ZMQInteractiveShell"

if is_notebook():
    from multiprocessing.pool import ThreadPool as Pool
    from threading import Lock
else:
    from multiprocessing.pool import Pool
    from multiprocessing import Lock


#########################################
# Minimal program example
#########################################
import os
import random

from typing import Any, Iterator

def generate_values_for_parallel(count: int) -> Iterator[float]:
    for _ in range(count):
        yield random.random()


def parallel_unit(arg: Any) -> str:
    return "Received --> " + str(arg)


if __name__ == '__main__':
    result = []
    pool = Pool(processes=4)
    # The third argument of imap_unordered is the chunk size.
    for loop_result in pool.imap_unordered(parallel_unit, generate_values_for_parallel(10), 2*os.cpu_count()):
        result.append(loop_result)
    pool.close()
    pool.join()
    print("\n".join(result))
  • `Pool` and `ThreadPool` are completely different things and have different performance characteristics. See https://stackoverflow.com/questions/70700809/multiprocessing-pool-vs-multiprocessing-pool-threadpool – J. Choi Apr 05 '23 at 04:43
  • J. Choi, thanks for your clarification. The purposes of Pool and ThreadPool are different, obviously, but my solution is just meant to make the code portable, so that the same code runs regardless of the execution environment (standard interpreter or Jupyter notebook). – Armando Contestabile Apr 06 '23 at 06:34
0

This works for me on a Mac (I cannot make it work on Windows, where the 'fork' start method is not available):

    import multiprocessing as mp

    if __name__ == '__main__':
        # force=True lets the cell be re-run without the RuntimeError that
        # set_start_method raises once a start method has already been set.
        mp.set_start_method('fork', force=True)
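A hedged sketch of the same idea without changing the global start method: multiprocessing.get_context('fork') returns a context object whose Pool uses fork only for that pool. The function name double_all_forked is made up for this example, and it assumes a platform where fork exists (Linux or macOS, not Windows).

```python
import multiprocessing as mp


def doubler(number):
    """Return twice the given number."""
    return number * 2


def double_all_forked(numbers, processes=2):
    """Double the numbers in a pool of forked worker processes."""
    if "fork" not in mp.get_all_start_methods():
        raise RuntimeError("the 'fork' start method is not available on this platform")
    # The context only affects pools created from it, so the call can be
    # repeated (for example when a cell is re-run) without any error.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        return pool.map(doubler, numbers)
```

Forked workers start as copies of the notebook process, so they can already see functions defined in earlier cells, which is why this start method tends to behave better in notebooks than spawn.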
sebtac