18

I am new to the multiprocessing module in Python and work with Jupyter notebooks. I have tried the following code snippet from PMOTW:

import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

When I run this as is, there is no output.

I have also tried creating a module called worker.py and then importing that to run the code:

import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

There is still no output in that case. In the console, I see the following error (repeated multiple times):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>

However, I get the expected output when the code is saved as a Python script and executed.

What can I do to run this code directly from the notebook without creating a separate script?

curiouscientist
  • 181
  • 1
  • 1
  • 5
  • I can run your code using `Python 3.6.3`; there seems to be nothing wrong with it. When you put all your code in a script and run it, you should be able to see the output. – relay Feb 17 '18 at 23:37
  • Yes, I was able to get an output as well. However, it only worked when I saved the entire code as a script and then ran the script. How can an output be obtained within the notebook? – curiouscientist Feb 17 '18 at 23:40
  • I can also get the output using the notebook. – relay Feb 17 '18 at 23:41
  • Can you please share how you did that? I am using Jupyter and got nothing at the output. I ran the snippet of code at the very top of my question. I get: AttributeError: Can't get attribute 'worker' on – curiouscientist Feb 17 '18 at 23:44
  • I copied and pasted your code into my notebook cell and pushed `ctrl+enter`. Then I saw the output. – relay Feb 17 '18 at 23:46
  • Do you really want to start multiple processes on your Jupyter server? They may be hard to kill. – dstromberg Feb 18 '18 at 00:59
  • Looks like I don't understand this properly at all. I can create a `script.py` file with all of the code and run in Jupyter using `%run script.py`. This gives the output on the console (not the notebook) the first time it is run. If I re-run the cell `%run script.py` a second time, it throws an error inside the notebook!!? – curiouscientist Feb 18 '18 at 01:41
  • I see a related question [here](https://stackoverflow.com/questions/29629103/simple-python-multiprocessing-function-doesnt-output-results) – curiouscientist Feb 18 '18 at 05:33
  • If you would like to do parallel computing using Jupyter notebook, you might want to [take a look at this](https://ipyparallel.readthedocs.io/en/latest/). – relay Feb 18 '18 at 10:29

5 Answers

27

I'm relatively new to parallel computing so I may be wrong with some technicalities. My understanding is this:

Jupyter notebooks don't work well with multiprocessing because the module pickles (serialises) data to send to child processes, and functions defined interactively in the notebook can't be pickled that way. multiprocess is a fork of multiprocessing that uses dill instead of pickle to serialise data, which allows it to work from within Jupyter notebooks. The API is identical, so the only thing you need to do is change

import multiprocessing

to...

import multiprocess

You can install multiprocess very easily with a simple

pip install multiprocess
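As a rough sketch of what the swap looks like (assuming multiprocess is installed; its Process API mirrors the standard library's):

import multiprocess as mp

def worker():
    """worker function"""
    print('Worker')

if __name__ == '__main__':
    jobs = [mp.Process(target=worker) for _ in range(5)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()  # wait for the workers to finish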

You will, however, find that your processes still do not print to the notebook output (although in JupyterLab they print to the terminal the server is running in). I stumbled upon this post trying to work around this and will edit it when I find out how.
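In the meantime, one possible workaround (a rough sketch, not a guaranteed fix) is to have the workers return values instead of printing, and print the collected results from the parent, which does run in the notebook:

import multiprocess as mp

def worker(i):
    # return a string instead of printing inside the child process
    return 'Worker %d' % i

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        for line in pool.map(worker, range(5)):
            print(line)  # printed by the notebook kernel, so it shows in the cell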

Eden Trainor
  • 571
  • 5
  • 17
  • 1
    Doesn't work for me (Python 3.10), I get `+[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.` – gotofritz Jun 23 '23 at 23:51
5

I'm not an expert in either multiprocessing or ipykernel (which is used by the Jupyter notebook), but since nobody seems to have given an answer, I will tell you my best guess. I hope somebody complements this later on.

I guess your Jupyter notebook server is running on a Windows host. multiprocessing has three different start methods. Let's focus on spawn, which is the default on Windows, and fork, which is the default on Unix.

Here is a quick overview.

  • spawn

    • (CPython) interactive shell - always raises an error
    • run as a script - okay only if the multiprocessing code is nested under if __name__ == '__main__'
  • fork

    • always okay

For example,

import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

This code works when it's saved and run as a script, but raises an error when entered in a Python interactive shell. Here is the implementation of the IPython kernel, and my guess is that it uses some kind of interactive shell, so it doesn't go well with spawn (but please don't trust me).


As a side note, I will give you a general idea of how spawn and fork are different. In multiprocessing, each subprocess runs a separate Python interpreter. With spawn in particular, a child process starts a fresh interpreter and imports the necessary modules from scratch. Code defined in an interactive shell is hard to import, so it may raise an error.

fork is different. With fork, a child process copies the main process, including most of the running state of the Python interpreter, and then continues execution. This code will help you understand the concept.

import os


main_pid = os.getpid()

os.fork()
print("Hello world(%d)" % os.getpid())  # print twice. Hello world(id1) Hello world(id2)

if os.getpid() == main_pid:
    print("Hello world(main process)")  # print once. Hello world(main process)
  • I tried this. First, `os.fork` is only available on Cygwin/Unix => https://stackoverflow.com/a/19547482/1225413; second, this is no longer valid: https://stackoverflow.com/a/52476456/1225413. When I tried this code, I got an attribute error: os has no attribute fork – Akhil Jain Jan 27 '21 at 02:08
3

Much like you, I encountered the attribute error. The problem seems to be related to how Jupyter handles multiprocessing. The fastest result I got was to follow the Multi-processing example.

So the ThreadPool took care of my issue.

from multiprocessing.pool import ThreadPool as Pool

def worker(i):
    """worker function; i is the value passed in by pool.map"""
    print('Worker\n')
    return


pool = Pool(4)
for result in pool.map(worker, range(5)):
    pass    # or print diagnostics
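Because the pool here uses threads inside the notebook's own process, nothing has to be pickled, and the workers can also return values directly; a small sketch:

from multiprocessing.pool import ThreadPool as Pool

def job(i):
    return i * i

with Pool(4) as pool:
    print(pool.map(job, range(5)))  # [0, 1, 4, 9, 16]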
Sam
  • 91
  • 5
0

This works for me on macOS (I cannot make it work on Windows):

import multiprocessing as mp
mp_start_count = 0

if __name__ == '__main__':
    if mp_start_count == 0:
        # set_start_method() raises a RuntimeError if it is called more than
        # once in the same process, so only call it the first time through
        mp.set_start_method('fork')
        mp_start_count += 1
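A sketch of how this can be combined with the worker from the question in a notebook cell on macOS or Linux (passing force=True is an alternative guard against setting the start method twice):

import multiprocessing as mp

def worker():
    print('Worker')

if __name__ == '__main__':
    # 'fork' copies the running interpreter, so functions defined in the
    # notebook are available to the child processes
    mp.set_start_method('fork', force=True)
    jobs = [mp.Process(target=worker) for _ in range(5)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()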
sebtac
  • 538
  • 5
  • 8
0

Save the function to a separate Python file, then import it back into the notebook. It should work fine that way.
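A minimal sketch of that pattern, assuming a file named worker.py saved next to the notebook (joining the processes makes the cell wait for the workers to finish):

# worker.py
def worker():
    """worker function"""
    print('Worker')

Then, in a notebook cell:

import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = [multiprocessing.Process(target=worker) for _ in range(5)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()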

Berg
  • 36
  • 3