
I created a dedicated environment for my new project using Anaconda on Windows 10. I write and run my code from Jupyter Notebook, where I want to use multiprocessing, but even the most straightforward code from the module's documentation gets stuck when I run it. Here's the code:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

The code below doesn't work either:

p = Pool(5)
results = p.map(f, [1, 2, 3])
print(results)

Switching to the base environment doesn't help. However, the code runs perfectly fine from PyCharm, and it also runs fine in Jupyter on Linux. I assume it must therefore be something Windows-Jupyter-related.

Versions of libraries I use:
python = 3.10.4
jupyter = 1.0.0
CPU: Intel i5 (but it shouldn't matter I think).

I've found a question on a related topic: Python multiprocessing on Windows 10
It mentions an issue with using multiprocessing through venv, but that was solved in Python 3.7.3.

Any ideas on how to solve the issue? Any workarounds?

Roberto
    Is this code in a file or are you running it in the read-eval-print loop? The latter won't work. – Frank Yellin Jul 17 '22 at 16:59
  • @FrankYellin This code is as shown here, placed directly in a cell in Jupyter Notebook. – Roberto Jul 17 '22 at 17:37
  • @Roberto in jupyter (or any other interactive session) the target function must be in a separate library so it can be imported. There is no "main" file to import in an interactive session. There are a bunch of 3rd party libraries that seek to sidestep this issue (`multiprocess` for example), but the fact of the matter is that using "spawn" to create a new process (rather than fork: look it up) is not broadly compatible with interactive interpreters. It is intended that you save a .py file and execute it from a terminal – Aaron Jul 17 '22 at 20:04
  • Listen to @aaron and me. multiprocessing only works in specific environments. The newly created processes need a file to read containing the code they are to execute. – Frank Yellin Jul 18 '22 at 00:09
  • @Aaron I've just tested the same code on Linux (edits in the question) and it works fine there - on Jupyter. I also unwrapped the code (the second example). If this is an issue only on Windows, maybe it's worth raising an issue on GitHub? – Roberto Jul 18 '22 at 05:53
  • This is just how multiprocessing works. It's not an issue to be raised on GitHub. Jupyter is an interactive environment, and multiprocessing is just not compatible with interactive sessions when "spawn" is the start method. It's even in the official [documentation](https://docs.python.org/3/library/multiprocessing.html#using-a-pool-of-workers): "Note Functionality within this package requires that the `__main__` module be importable by the children." It works on Linux because the default start method is "fork". – Aaron Jul 18 '22 at 17:04
  • Your second attempt at getting rid of `if __name__ == "__main__":` is going in the wrong direction. Keep it, and define `f` in an external .py file, import it, then pass it to `pool.map`. The target function must be imported from a .py file; .ipynb (notebook) is not suitable. – Aaron Jul 18 '22 at 17:07
  • @Aaron Thank you for the explanation. I've never paid attention to what startmethod is used. Now, I see a difference. – Roberto Jul 18 '22 at 19:47
  • @Roberto Fork mostly exists for legacy reasons... It's technically faster, and technically has the ability to save memory usage (both have their caveats in reality). The downside is that fork does not play nice with threads, and more things than you might assume are multi-threaded under the hood. Spawn offers a clean environment for the child process to start up, but that means everything must be imported because nothing was copied from the main process. – Aaron Jul 18 '22 at 20:02
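Aaron's suggested workaround above can be sketched as follows. The file name `worker.py` is a hypothetical choice; here the module is written from the cell itself, purely so the snippet is self-contained, but in practice you would just save the file next to the notebook:

```python
import pathlib
import sys
from multiprocessing import Pool

# Put the target function in its own module (hypothetical name worker.py)
# so that child processes started with "spawn" can import it. In a real
# setup, save this file manually instead of writing it from the cell.
here = pathlib.Path.cwd()
(here / "worker.py").write_text("def f(x):\n    return x * x\n")
sys.path.insert(0, str(here))  # make sure the new module is importable

from worker import f  # import AFTER the file exists

if __name__ == "__main__":
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))  # prints [1, 4, 9]
```

Because `f` now lives in an importable `.py` module rather than only in the notebook's `__main__`, the spawned children can reconstruct it, which is exactly the requirement quoted from the documentation above.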
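The platform difference Aaron describes can be checked directly; `get_start_method` and `get_context` are standard `multiprocessing` APIs, and the first printed value depends on the platform:

```python
import multiprocessing as mp

# Report the default start method for this platform:
# "fork" on Linux, "spawn" on Windows (and on macOS since Python 3.8).
print(mp.get_start_method())

# A context object lets code request a specific start method (and its
# own Pool) without changing the global default.
ctx = mp.get_context("spawn")
print(ctx.get_start_method())  # prints spawn
```

Checking this first explains why identical notebook code behaves differently on Windows and Linux.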

0 Answers