
I run the following solution from How can I recover the return value of a function passed to multiprocessing.Process?:

import multiprocessing
from os import getpid

def worker(procnum):
    print('I am number %d in process %d' % (procnum, getpid()))
    return getpid()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    print(pool.map(worker, range(5)))

which is supposed to output something like:

I am number 0 in process 19139
I am number 1 in process 19138
I am number 2 in process 19140
I am number 3 in process 19139
I am number 4 in process 19140
[19139, 19138, 19140, 19139, 19140]

but instead I only get

[4212, 4212, 4212, 4212, 4212]

If I feed pool.map a range of 1,000,000 values using more than 10 processes, I see at most two different PIDs.

Why is my copy of multiprocessing seemingly running everything in the same process?

zelusp
  • What example did you pull this from? It may genuinely be the case that the code to execute is *so* fast that there's no benefit to forcing it to split the work across multiple processes. – Makoto Nov 04 '16 at 23:14
  • On my machine I fairly consistently get all PIDs in your example, and also in the case of 100 tasks on 10 processes. So this seems to be somewhat context-dependent. – Mark Nov 05 '16 at 12:01
  • Note that [the `map` method](https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.map) also has a `chunksize` parameter that can be used to fix the size of the chunks sent to the child processes, so you may be able to change the behaviour a bit by setting `chunksize` to a smaller value (see the sketch after these comments). In any case: `Pool` is meant to be an easy way to distribute tasks when you don't care exactly which worker executes them. Its default settings should work *well enough* for almost all situations, so don't worry about this until you find a real performance issue... – Bakuriu Nov 05 '16 at 12:30
  • @Makoto, the example is linked in the question. It's actually Mark's example. – zelusp Nov 05 '16 at 20:46
  • On Windows, using multiple processes is rather inefficient. It is possible that Python is taking that into account. – Harry Johnston Nov 08 '16 at 03:21
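
To illustrate the `chunksize` suggestion above, here is a minimal sketch (reusing the worker from the question; the sketch itself is not from the original thread) that hands the tasks to workers one at a time instead of in larger batches. If the tasks are extremely fast this alone may still not be enough to spread them out:

import multiprocessing
from os import getpid

def worker(procnum):
    print('I am number %d in process %d' % (procnum, getpid()))
    return getpid()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    # chunksize=1 hands each task to a worker individually rather than
    # letting map() batch several tasks into one chunk per worker
    print(pool.map(worker, range(5), chunksize=1))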

1 Answer


TL;DR: tasks are not deliberately distributed in any particular way; most likely your tasks are so short that one process completes them all before the others even get started.

From looking at the source of multiprocessing, it appears that tasks are simply put on a Queue, which the worker processes read from (the worker function reads from Pool._inqueue). There's no calculated distribution going on; the workers just race to grab work as fast as they can.

The most likely explanation, then, is that the tasks are simply very short, so one process finishes all of them before the others have a chance to look or even get started. You can easily check whether this is the case by adding a two-second sleep to the task.
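
For example, a minimal variation of the question's code (the only change is the two-second sleep, which keeps each worker busy long enough for the others to start):

import multiprocessing
import time
from os import getpid

def worker(procnum):
    time.sleep(2)  # keep this task busy so one process cannot grab them all
    print('I am number %d in process %d' % (procnum, getpid()))
    return getpid()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    # with the sleep in place the returned pids should no longer all be equal
    print(pool.map(worker, range(5)))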

I'll note that on my machine, the tasks all get spread over the processes fairly evenly (also for #processes > #cores). So there seems to be some system dependence, even though all processes should have been .start()ed before work is queued.


Here's some trimmed source from worker, which shows that each process just reads tasks from the queue, so which worker ends up with which task is effectively arbitrary:

def worker(inqueue, outqueue, ...):
    ...
    get = inqueue.get
    ...
    while maxtasks is None or (maxtasks and completed < maxtasks):
        try:
            task = get()
        ...

SimpleQueue communicates between processes using a Pipe, as the SimpleQueue constructor shows:

self._reader, self._writer = Pipe(duplex=False)
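
The same "whoever gets there first" behaviour is easy to reproduce outside of Pool. Here is a small sketch (not taken from Pool's source) in which several plain Processes read from one shared SimpleQueue; each item simply goes to whichever process happens to call get() first:

import multiprocessing
from os import getpid

def drain(queue):
    # keep pulling items until the None sentinel arrives
    while True:
        item = queue.get()
        if item is None:
            break
        print('process %d took item %r' % (getpid(), item))

if __name__ == '__main__':
    queue = multiprocessing.SimpleQueue()
    procs = [multiprocessing.Process(target=drain, args=(queue,)) for _ in range(3)]
    for p in procs:
        p.start()
    for i in range(10):
        queue.put(i)
    for _ in procs:
        queue.put(None)  # one sentinel per process so every reader exits
    for p in procs:
        p.join()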

EDIT: the part about processes starting too slowly was possibly wrong, so I removed it. All processes are .start()ed before any work is queued (which may be platform-dependent). I can't find whether a process is already ready to accept work at the moment .start() returns.
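
A quick way to check the first claim is to look at multiprocessing.active_children() right after the Pool is created, before any work is submitted; it shows that the worker processes already exist, although it says nothing about whether they are ready to pull work yet (a small sketch, not from the original answer):

import multiprocessing

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    # the three worker processes have already been start()ed at this point
    print(multiprocessing.active_children())
    print(pool.map(abs, range(-2, 3)))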

Mark
  • Turns out my machine requires a delay of about `time.sleep(0.0005)` to get different processes. – zelusp Nov 05 '16 at 20:54
  • My guess would be that there's a 0.0005s delay between `Process` returning and the process being available on your system, but I can't find it documented online and I can't test it on my system since there's no noticeable delay... – Mark Nov 06 '16 at 09:29