20

I am trying to use multiprocessing for the first time. So I thought I would make a very simple test example which factors 100 different numbers.

from multiprocessing import Pool
from primefac import factorint
N = 10**30
L = range(N,N + 100)
pool = Pool()
pool.map(factorint, L)

This gives me the error:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    pool.map(factorint, L)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
AssertionError: daemonic processes are not allowed to have children

I see that the question _Python Process Pool non-daemonic?_ discusses this problem, but I don't understand why it is relevant to my simple toy example. What am I doing wrong?

Simd
  • 19,447
  • 42
  • 136
  • 271
  • Are you on Windows? – abarnert Jul 23 '18 at 18:36
  • @abarnert Ubuntu 16.04.4 and 2.7.12 – Simd Jul 23 '18 at 18:40
  • OK, so that's not the issue. You _should_ do the `__main__` guard anyway, but failing to do so won't break anything on linux. – abarnert Jul 23 '18 at 18:44
  • Next question: what's that `primefac` module? Is it trying to use a `multiprocessing.Pool` internally? (Sorry; I think it's a common library on PyPI, so normally I'd just go check it myself, but [PyPI is down for maintenance at the moment…](https://status.python.org/)) – abarnert Jul 23 '18 at 18:45
  • @abarnert I just did `pip install primefac` . It is from https://pypi.org/project/primefac/ . I don't know how it works internally sadly. – Simd Jul 23 '18 at 18:46
  • Well, look at the source code. Or just `grep -ir multiprocessing `. – abarnert Jul 23 '18 at 18:46
  • Also, can you paste the whole traceback, instead of just the error description? (That would allow me or others to rule out most of the same issues without needing to look at the code or ask you to do it.) – abarnert Jul 23 '18 at 18:47
  • @abarnert I think primefac does use multiple cores itself. – Simd Jul 23 '18 at 18:53
  • OK, see my updated answer. I'll come back later and verify it when PyPI is working and I can find `primefac`'s official source repo, but I'm pretty sure that it does use multiple cores, and that's exactly your problem. – abarnert Jul 23 '18 at 19:01

2 Answers

18

The problem appears to be that primefac uses its own multiprocessing.Pool. Unfortunately, while PyPI is down, I can't find the source to the module—but I did find various forks on GitHub, like this one, and they all have multiprocessing code.

So, your apparently simple example isn't all that simple—because it's importing and running non-simple code.

By default, all Pool processes are daemonic, so you can't create more child processes from inside another Pool. Usually, attempting to do so is a mistake.
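
You can see the same assertion without primefac at all. Here's a minimal, self-contained sketch (nothing to do with factoring) where each worker of an outer Pool tries to start its own inner Pool:

from multiprocessing import Pool

def nested(x):
    # each worker process tries to start its own pool of children
    inner = Pool(2)
    try:
        return sum(inner.map(abs, range(x)))
    finally:
        inner.close()
        inner.join()

if __name__ == '__main__':
    outer = Pool(2)
    # raises AssertionError: daemonic processes are not allowed to have children
    print(outer.map(nested, [3, 4]))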

If you really do want to multiprocess the factors even though some of them are going to multiprocess their own work (quite possibly adding more contention overhead without adding any parallelism), then you just have to subclass Pool and override that—as explained in the related question that you linked.
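
For Python 2.7, that recipe looks roughly like this (a sketch adapted from the linked question; Python 3's Pool internals are different, so the exact override may need adjusting there):

import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # always report daemon=False and ignore attempts to change it,
    # so these workers are allowed to start children of their own
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

class NoDaemonPool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

You would then create `NoDaemonPool()` instead of `Pool()` in your script.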

But the simplest thing is to just not use multiprocessing here, if primefac is already using your cores efficiently. (If you need quasi-concurrency, getting answers as they come in instead of getting them in sequence, I suppose you could do that with a thread pool, but I don't think there's any advantage to that here—you're not using imap_unordered or explicit AsyncResult anywhere.)
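
If you do want that quasi-concurrency, a thread pool is nearly a drop-in swap. Here's a sketch using `multiprocessing.dummy` (the workers are threads, not daemonic processes, so primefac is still free to start its own worker processes inside each task):

from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool
from primefac import factorint

if __name__ == '__main__':
    N = 10**30
    numbers = range(N, N + 100)
    pool = ThreadPool()
    try:
        # imap_unordered yields each factorization as soon as it's ready
        for factors in pool.imap_unordered(factorint, numbers):
            print(factors)
    finally:
        pool.close()
        pool.join()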

Alternatively, if primefac isn't using all of your cores most of the time, but only does so for the "tricky remainders" at the end of factoring some numbers, while you've got 7 cores sitting idle 60% of the time… then you probably want to prevent primefac from using multiprocessing at all. I don't know whether the module has a public API for doing that. If it does, of course, just use it. If not… well, you may have to subclass or monkeypatch some of its code or, at worst, monkeypatch its import of multiprocessing, and that may not be worth doing.
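
For completeness, the crudest version of that last option would be something like the sketch below. I haven't verified it against primefac, and it only helps if primefac sticks to names that `multiprocessing.dummy` also provides:

import sys
import multiprocessing.dummy

# Unverified, fragile sketch: make any subsequent `import multiprocessing`
# (including the one inside primefac) resolve to the thread-backed
# multiprocessing.dummy module. This has to run before primefac is
# imported, and it breaks anything that needs a name dummy doesn't have.
sys.modules['multiprocessing'] = multiprocessing.dummy

from primefac import factorint  # primefac now sees the dummy module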

The ideal solution would probably be to refactor primefac to push the "tricky remainder" jobs onto the same pool you're already using. But that's probably by far the most work, and not that much more benefit.


As a side note, this isn't your problem, but you should have a __main__ guard around your top-level code, like this:

from multiprocessing import Pool
from primefac import factorint

if __name__ == '__main__':
    N = 10**30
    L = range(N,N + 100)
    pool = Pool()
    pool.map(factorint, L)

Otherwise, when run with the spawn or forkserver start methods—and notice that spawn is the only one available on Windows—each pool process is going to try to create another pool of children. So, if you run your code on Windows, you would get this same assertion—as a way for multiprocessing to protect you from accidentally fork-bombing your system.

This is explained under safe importing of main module in the "programming guidelines" section of the multiprocessing docs.

abarnert
  • 354,177
  • 51
  • 601
  • 671
  • Your code gives the same error for me in python 2.7.12. `Traceback (most recent call last): File "test.py", line 8, in pool.map(factorint, L) File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map return self.map_async(func, iterable, chunksize).get() File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get raise self._value AssertionError: daemonic processes are not allowed to have children` – Simd Jul 23 '18 at 18:41
  • The answer seems to be just to use ThreadPool as a drop in replacement! – Simd Jul 23 '18 at 19:03
  • @Anush As I said in the answer, you can do that, but I don't think you'll get any benefit, unless you want to get the results as they come in rather than in-order. I suppose it's possible that the threads spend just enough time in multiprocessing but not too much, so it would significantly improve concurrency, but it doesn't seem all that likely you'd get that lucky. – abarnert Jul 23 '18 at 19:56
2

I came here because my unittest raises

AssertionError: daemonic processes are not allowed to have children

This was because I used multiprocessing and did not close and join the pool properly; after adding close and join, everything works fine now.
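
For reference, the cleanup pattern looks roughly like this (`work` here is just a stand-in for the real task):

from multiprocessing import Pool

def work(x):          # stand-in for the real task
    return x * x

if __name__ == '__main__':
    pool = Pool()
    try:
        results = pool.map(work, range(10))
    finally:
        pool.close()  # stop accepting new tasks
        pool.join()   # wait for the worker processes to exit
    print(results)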

shellbye
  • 4,620
  • 4
  • 32
  • 44