I'm trying to run a function with multiprocessing. This is the code:

import multiprocessing as mu

output = []
def f(x):
    output.append(x*x)


jobs = []
np = mu.cpu_count()

for n in range(np*500):
    p = mu.Process(target=f, args=(n,))
    jobs.append(p)


running = []

for i in range(np):
    p = jobs.pop()
    running.append(p)
    p.start()

while jobs != []:
    for r in running:
        if r.exitcode == 0:
            try:
                running.remove(r)
                p = jobs.pop()
                p.start()
                running.append(p)
            except IndexError:
                break

print "Done:"
print output

The output is [], while it should be [1, 4, 9, ...]. Does anyone see where I'm making a mistake?

Jaime Perez

2 Answers

You are using multiprocessing, not threading. So your output list is not shared between the processes.

There are several possible solutions:

  1. Retain most of your program but use a multiprocessing.Queue instead of a list. Let the workers put their results in the queue, and read them from the main program. It will copy data from process to process, so for big chunks of data this will have significant overhead (see the first sketch after this list).
  2. You could use shared memory in the form of multiprocessing.Array. This might be the best solution if the processed data is large (see the second sketch after this list).
  3. Use a Pool. This takes care of all the process management for you. Just like with a queue, it copies data from process to process. It is probably the easiest to use, and IMO it is the best option if the data sent to/from each worker is small; the second answer below shows an example.
  4. Use threading so that the output list is shared between threads. Threading in CPython has the restriction that only one thread at a time can execute Python bytecode, so you might not get as much performance benefit as you'd expect, and unlike the multiprocessing solutions it will not take advantage of multiple cores (see the third sketch after this list).
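
For instance, a minimal sketch of the queue approach (the worker name and the small job count are just for illustration):

import multiprocessing as mu

def worker(q, x):
    # each worker puts its result in the shared queue
    q.put(x*x)

if __name__ == '__main__':
    q = mu.Queue()
    jobs = [mu.Process(target=worker, args=(q, n)) for n in range(10)]
    for p in jobs:
        p.start()
    # one q.get() per worker; get() blocks until a result is available;
    # note that results arrive in completion order, not submission order
    output = [q.get() for _ in jobs]
    for p in jobs:
        p.join()
    print(output)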
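And a sketch of the shared-memory approach with multiprocessing.Array (again, the worker name and the fixed size are just for illustration):

import multiprocessing as mu

def worker(arr, i):
    # write the result directly into the shared-memory slot for this index
    arr[i] = i*i

if __name__ == '__main__':
    n = 10
    arr = mu.Array('i', n)  # shared array of n C ints, initialized to zero
    jobs = [mu.Process(target=worker, args=(arr, i)) for i in range(n)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    print(list(arr))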
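Finally, a sketch of the threading variant, where the output list really is shared between the threads; the lock is mine, to make the intent explicit, although list.append is itself thread-safe in CPython:

import threading

output = []
lock = threading.Lock()

def f(x):
    with lock:  # guard the shared list
        output.append(x*x)

threads = [threading.Thread(target=f, args=(n,)) for n in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# the order of the results depends on thread scheduling
print(output)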
Roland Smith

Edit: Thanks to @Roland Smith for pointing this out. The main problem is the function f(x): when the child processes call it, they are unable to find the output variable (since it is not shared).

Edit: Just as @cdarke said, with multiprocessing you have to carefully control any shared object that the child processes can access (maybe with a lock), and it's pretty complicated and hard to debug.

Personally, I suggest using the Pool.map method for this.

For instance, assuming that you run this code directly rather than importing it as a module, your code would be:

import multiprocessing as mu

def f(x):
    return x*x

if __name__ == '__main__':
    np = mu.cpu_count()
    args = [n for n in range(np*500)]

    # the pool spreads the calls over np worker processes and
    # returns the results as a list, in the original order
    pool = mu.Pool(processes=np)
    result = pool.map(f, args)
    pool.close()
    pool.join()
    print result

But there are a few things you must know:

  1. If you run this file directly rather than importing it as a module, the if __name__ == '__main__': guard is important: Python will load this file as a module in the other processes, and if you don't place the function f outside if __name__ == '__main__':, the child processes will not be able to find your function f.
  2. Edit: Thanks to @Roland Smith for pointing out that we can use a tuple. If you have more than one argument for the function f, you need a tuple to pass them, for instance:

    def f((x, y)):
        return x*y

    args = [(n, 1) for n in range(np*500)]
    result = pool.map(f, args)
    

    or check here for more detailed discussion
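
Note that the def f((x, y)) tuple-parameter syntax only works in Python 2; it was removed in Python 3, where you would unpack the tuple inside the function instead. A sketch of the Python 3 equivalent (the name xy is just for illustration):

def f(xy):
    x, y = xy  # unpack the argument tuple explicitly
    return x*y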

Jkm
  • Regarding 2, you can put all the arguments in a `tuple`. – Roland Smith Jun 28 '15 at 09:42
  • Yes, you are right, I was confused with another method that requires a lambda function; I will edit this. – Jkm Jun 28 '15 at 09:47
  • And using multiprocessing the `output` list in the code in the question is *not* shared. – Roland Smith Jun 28 '15 at 09:48
  • True, the main problem is the `def f(x): output.append(x*x)`: when the child processes call it, they are unable to find the `output` variable (since it's not shared). I will also add this, thanks for pointing it out. – Jkm Jun 28 '15 at 10:01