I'm trying to run a function with multiprocessing. This is the code:

import multiprocessing as mu

output = []
def f(x):
    output.append(x*x)


jobs = []
np = mu.cpu_count()

for n in range(np*500):
    p = mu.Process(target=f, args=(n,))
    jobs.append(p)


running = []

for i in range(np):
    p = jobs.pop()
    running.append(p)
    p.start()

while jobs != []:
    for r in running:
        if r.exitcode == 0:
            try:
                running.remove(r)
                p = jobs.pop()
                p.start()
                running.append(p)
            except IndexError:
                break

print "Done:"
print output

The output is [], while it should be [1, 4, 9, ...]. Does anyone see where I'm making a mistake?

Jaime Perez

2 Answers

You are using multiprocessing, not threading. So your output list is not shared between the processes.

There are several possible solutions:

  1. Retain most of your program but use a multiprocessing.Queue instead of a list. Let the workers put their results in the queue, and read them from the main program. It will copy data from process to process, so for big chunks of data this will have significant overhead (see the first sketch after this list).
  2. You could use shared memory in the form of multiprocessing.Array. This might be the best solution if the processed data is large (see the second sketch after this list).
  3. Use a Pool. This takes care of all the process management for you. Just like with a queue, it copies data from process to process. It is probably the easiest to use, and IMO it is the best option if the data sent to/from each worker is small; the second answer below shows an example.
  4. Use threading so that the output list is shared between threads. Threading in CPython has the restriction that only one thread at a time can execute Python bytecode, so you might not get as much performance benefit as you'd expect, and unlike the multiprocessing solutions it will not take advantage of multiple cores (see the third sketch after this list).
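
For instance, a minimal sketch of the queue approach (the worker name and the small job count are just for illustration):

import multiprocessing as mu

def worker(q, x):
    # each worker puts its result in the shared queue
    q.put(x*x)

if __name__ == '__main__':
    q = mu.Queue()
    jobs = [mu.Process(target=worker, args=(q, n)) for n in range(10)]
    for p in jobs:
        p.start()
    # one q.get() per worker; get() blocks until a result is available;
    # note that results arrive in completion order, not submission order
    output = [q.get() for _ in jobs]
    for p in jobs:
        p.join()
    print(output)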
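And a sketch of the shared-memory approach with multiprocessing.Array (again, the worker name and the fixed size are just for illustration):

import multiprocessing as mu

def worker(arr, i):
    # write the result directly into the shared-memory slot for this index
    arr[i] = i*i

if __name__ == '__main__':
    n = 10
    arr = mu.Array('i', n)  # shared array of n C ints, initialized to zero
    jobs = [mu.Process(target=worker, args=(arr, i)) for i in range(n)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    print(list(arr))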
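Finally, a sketch of the threading variant, where the output list really is shared between the threads; the lock is mine, to make the intent explicit, although list.append is itself thread-safe in CPython:

import threading

output = []
lock = threading.Lock()

def f(x):
    with lock:  # guard the shared list
        output.append(x*x)

threads = [threading.Thread(target=f, args=(n,)) for n in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# the order of the results depends on thread scheduling
print(output)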
Roland Smith

Edit: Thanks to @Roland Smith for pointing this out. The main problem is the function f(x): when the child processes call it, they are unable to find the output variable (since it is not shared).

Edit: Just as @cdarke said, with multiprocessing you have to carefully control any shared object that the child processes can access (maybe with a lock), and it's pretty complicated and hard to debug.

Personally, I suggest using the Pool.map method for this.

For instance, assuming that you run this code directly rather than importing it as a module, your code would be:

import multiprocessing as mu

def f(x):
    return x*x

if __name__ == '__main__':
    np = mu.cpu_count()
    args = [n for n in range(np*500)]

    # the pool spreads the calls over np worker processes and
    # returns the results as a list, in the original order
    pool = mu.Pool(processes=np)
    result = pool.map(f, args)
    pool.close()
    pool.join()
    print result

But there are a few things you must know:

  1. If you run this file directly rather than importing it as a module, the if __name__ == '__main__': guard is important: Python will load this file as a module in the other processes, and if you don't place the function f outside if __name__ == '__main__':, the child processes will not be able to find your function f.
  2. Edit: Thanks to @Roland Smith for pointing out that we can use a tuple. If you have more than one argument for the function f, you need a tuple to pass them, for instance:

    def f((x, y)):
        return x*y

    args = [(n, 1) for n in range(np*500)]
    result = pool.map(f, args)
    

    or check here for more detailed discussion
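
Note that the def f((x, y)) tuple-parameter syntax only works in Python 2; it was removed in Python 3, where you would unpack the tuple inside the function instead. A sketch of the Python 3 equivalent (the name xy is just for illustration):

def f(xy):
    x, y = xy  # unpack the argument tuple explicitly
    return x*y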

Jkm
  • Regarding 2, you can put all the arguments in a `tuple`. – Roland Smith Jun 28 '15 at 09:42
  • Yes, you are right, I was confused with another method that requires a lambda function; I will edit this. – Jkm Jun 28 '15 at 09:47
  • And using multiprocessing the `output` list in the code in the question is *not* shared. – Roland Smith Jun 28 '15 at 09:48
  • True, the main problem is the `def f(x): output.append(x*x)`: when the child processes call it, they are unable to find the `output` variable (since it's not shared). I will also add this, thanks for pointing it out. – Jkm Jun 28 '15 at 10:01