151

I am trying to implement multiprocessing in my code, and so, I thought that I would start my learning with some examples. I used the first example found in this documentation.

from multiprocessing import Pool
def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

When I run the above code I get an AttributeError: can't get attribute 'f' on <module '__main__' (built-in)>. I do not know why I am getting this error. I am also using Python 3.5 if that helps.

PiccolMan

5 Answers

149

This behavior seems to be a design feature of multiprocessing.Pool. See https://bugs.python.org/issue25053. Pool does not always work with objects that are not defined in an imported module, so you have to put your function in a separate file and import that module.

File: defs.py

def f(x):
    return x*x

File: run.py

from multiprocessing import Pool
import defs

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(defs.f, [1, 2, 3]))

If you pass print or another built-in function to the pool instead of f, the original example should work. If this is not a bug (according to the link), the documentation example is badly chosen.
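
One way to see why the location of f matters: multiprocessing sends work to its workers with pickle, and pickle stores a function as a reference (module name plus function name), not as code. The sketch below only uses the standard pickle module and starts no worker processes. The module name defs_demo is made up for illustration, and deleting f from it merely imitates a worker whose copy of the module lacks the function.

```python
import pickle
import sys
import types

# Build a throwaway in-memory module; "defs_demo" is a made-up name.
defs_demo = types.ModuleType("defs_demo")
exec("def f(x):\n    return x * x\n", defs_demo.__dict__)
sys.modules["defs_demo"] = defs_demo

# Pickling a function stores a reference (module + name), not its code.
payload = pickle.dumps(defs_demo.f)
stores_reference = b"defs_demo" in payload

# Unpickling looks the name up in that module again.
restored = pickle.loads(payload)
same_object = restored is defs_demo.f

# Imitate a receiver whose module does not contain f: the lookup fails.
del defs_demo.f
try:
    pickle.loads(payload)
    lookup_error = None
except AttributeError as exc:
    lookup_error = exc

# Clean up the demo module.
sys.modules.pop("defs_demo", None)
print(stores_reference, same_object, type(lookup_error).__name__)
```

The failed lookup at the end is the same kind of AttributeError as in the question: the worker knows which module and name to look for, but cannot find the function there.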

hr87
  • 2
    When I ran the scripts, I got: AttributeError: __exit__. It turned out the problem was with the "with" statement, which requires an object with "__enter__" and "__exit__" methods. So I had to change it to: p = Pool(5) and it worked. Thank you very much! – Dang Manh Truong Nov 04 '17 at 11:16
  • 43
    You have to define the f() function before you create the instance of Pool, otherwise the workers cannot see your function. However, as far as I understand, you do NOT necessarily have to use imports. – Charly Empereur-mot Aug 19 '19 at 01:03
  • 1
    Very funny, when running code under pycharm it gives me this error, but not in vscode or jupyter. So – Jay Sep 26 '19 at 13:45
  • If you do not want the hassle of writing another program, here's an easier workaround. Make sure you always define the worker function before – asiffarhankhan Mar 06 '20 at 12:41
  • 28
    For me, just defining the function f() above Pool creation (or import) did not solve the issue (Win 10, Python 3.6.8) – FlorianH Mar 26 '20 at 09:49
  • 3
    Me neither - Pool() still does not work with these suggestions. ThreadPool() works as intended, but it is a totally different function. – Vaidøtas I. Apr 10 '20 at 19:56
  • What worked for me was: 1) putting my function into a separate .py script as above, together with the imports my function depends on; 2) importing my function before `from multiprocessing import Pool`; 3) also following the other suggestion above and removing the `with` statement for the pool. I am using Windows 10, Python 3.9.4. – DataNoob7 Jul 14 '21 at 17:56
  • 1
    What worked for me was putting the function f outside main (but it was still inside the same python script) which is funny because it didn't work for the OP. Inside main gave me the same error. Python 3.7: I also ran it in Pycharm using the run command – brian_ds Aug 24 '21 at 20:30
  • This is absolutely the wrong answer for the general case. The script itself is already an importable module, so it should just work. The problem would only occur in cases where you're either using an interactive interpreter or the real main script is wrapping your own script (e.g. running under `cProfile` to profile your code, under `unittest` to invoke unittests, wrapped in an IDE's interpreter, etc.). [The linked issue](https://bugs.python.org/issue25053) is specifically caused by that sort of case; it's fixable this way, but legal scripts run with plain `python3` don't need the fix. – ShadowRanger Apr 06 '22 at 19:52
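
The ThreadPool mentioned in the comments above can be sketched like this. It lives in multiprocessing.pool and offers the same map interface, but it runs the tasks in threads of the current process, so nothing has to be pickled and f can be defined anywhere, including in an interactive session. This is a swapped-in alternative rather than a fix for Pool: threads will not speed up CPU-bound pure-Python work the way separate processes can.

```python
from multiprocessing.pool import ThreadPool


def f(x):
    return x * x


# Same interface as Pool, but backed by threads instead of processes.
with ThreadPool(5) as pool:
    result = pool.map(f, [1, 2, 3])

print(result)
```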
122

The multiprocessing module has a major limitation when it comes to IPython use:

Functionality within this package requires that the __main__ module be importable by the children. [...] This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter. [from the documentation]
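
As a rough way to check whether this limitation is likely to apply to the current session, one can look at whether `__main__` is backed by a file or a module spec. This is only a heuristic sketch, not something the multiprocessing documentation prescribes, and the helper name main_looks_importable is made up for illustration.

```python
import sys


def main_looks_importable():
    """Heuristic guess: is __main__ backed by a file or a module spec?"""
    main = sys.modules["__main__"]
    has_file = getattr(main, "__file__", None) is not None
    has_spec = getattr(main, "__spec__", None) is not None
    return has_file or has_spec


importable = main_looks_importable()
if not importable:
    # Assumption: this usually indicates an interactive interpreter.
    print("__main__ has no backing file; Pool workers may not find your functions.")
```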

Fortunately, there is a fork of the multiprocessing module called multiprocess, which uses dill instead of pickle for serialization and conveniently overcomes this issue.

Just install multiprocess and replace multiprocessing with multiprocess in your imports:

import multiprocess as mp

def f(x):
    return x*x

with mp.Pool(5) as pool:
    print(pool.map(f, [1, 2, 3, 4, 5]))

Of course, externalizing the code as suggested in this answer works as well, but I find it very inconvenient: That is not why (and how) I use IPython environments.

<tl;dr> multiprocessing does not work in IPython environments right away, use its fork multiprocess instead.

Michael Dorner
12

This answer is for those who get this error on Windows 10 in 2021.

I've researched this error a bit since I got it myself. I get this error when running any examples from the official Python 3 documentation on multiprocessing.

Test environment:

  • x86 Windows 10.0.19043.1165 + Python 3.9.2 - there is an error
  • x86 Windows 10.0.19043.1165 + Python 3.9.6 - there is an error
  • x86 Windows 10.0.19043.1110 + Python 3.9.6 - there is an error
  • ARM Windows 10.0.21354.1 + Python 3.9.6 - no error (version from DEV branch)
  • ARM macOS 11.5.2 + Python 3.9.6 - no errors

I have no way to test other configurations. My guess is that the problem lies in Windows, since the bug does not appear in the developer build "10.0.21354.1", although that ARM build probably runs x86 code under emulation.

Also note that the bug did not occur when Python 3.9.2 was released (February). I worked on the same computer the whole time, so I was surprised when previously working code stopped working and only the Windows version had changed.

I was unable to find a bug report describing a similar situation in the Python bug tracker (I probably did a poor search), and the answer marked as correct refers to a different situation. The problem is easy to reproduce: try any example from the multiprocessing documentation on a freshly installed Windows 10 + Python 3.

Later, I will have the opportunity to check out Python 3.10 and the latest version of Windows 10. I am also interested in this situation in the context of Windows 11.

If you have information about this error (link to the bug tracker or something similar), be sure to share it.

At the moment I switched to Linux to continue working.

petezurich
AtachiShadow
  • 1
    I can reproduce the issue on Windows 1902 x64. No issues with WSL. `p = Process(target = f('bob'))` doesn't "work" since `f('bob')` is computed in the current process and `Process`'s `target` gets `f`'s return value instead, i.e. `None`. If you change `f` to `def f(name): return name`, it will throw an error since the string `name` is not callable. – tejasvi88 Aug 26 '21 at 08:03
  • Thanks a lot for the clarification. Edited the answer, removed all the code - it still doesn't help) – AtachiShadow Oct 14 '21 at 22:09
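
The point in the first comment about `target=f('bob')` can be illustrated without starting any process. This is a minimal sketch; the calls list is introduced here only to record when f actually runs.

```python
from multiprocessing import Process

calls = []  # hypothetical helper: records each time f actually runs


def f(name):
    calls.append(name)  # no return statement, so f returns None


# Wrong: f('bob') runs right here in the current process,
# and Process receives its return value (None) as the target.
p_wrong = Process(target=f('bob'))
calls_after_wrong = list(calls)

# Right: pass the function itself and its arguments separately.
# Nothing runs until p_right.start() is called.
p_right = Process(target=f, args=('bob',))
calls_after_right = list(calls)

print(calls_after_wrong, calls_after_right)
```

In the first form the work has already happened before the Process object exists; in the second form f has not been called a second time, because the process was never started.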
3

Why not use joblib? Your code is equivalent to:

# pip install joblib

from joblib import Parallel, delayed


def f(x):
    return x*x

res = Parallel(
    n_jobs=5
)(
    delayed(f)(x) for x in [1, 2, 3]
)
print(res)
Wenmin Wu
  • 2
    This has nothing to do with the question. The OP probably wants to use `multiprocessing` since it's a native solution. – Neoares Aug 16 '22 at 08:53
  • works like a charm in IPython! – ClementWalter Sep 15 '22 at 09:53
  • 2
    @Neoares Why stick to the native solution when there's a better workable choice? – Wenmin Wu Dec 16 '22 at 03:25
  • @Wenmin Wu Is joblib a fully functional replacement for torch.multiprocessing, e.g. does it support handling multiple GPUs and DataSampler the same way? – MosQuan Mar 05 '23 at 23:26
  • Thanks, very useful, I didn't know about `joblib` and had a usecase where `multiprocessing` was difficult to use. `joblib` did the job. – ingo-m Jun 19 '23 at 15:36
-2

If you're using Jupyter notebook (like the OP), then defining the function in a separate cell and executing that cell first fixes the problem. The accepted answer works too, but it's more work. Defining the function before, i.e. above the pool, isn't adequate. It has to be in a completely different notebook cell which is executed first.

ASDFQWERTY