13

I'm trying to call a function on multiple processes. The obvious solution is Python's multiprocessing module. The problem is that the function has side effects: it creates a temporary file and registers that file for deletion on exit using atexit.register and a global list. The following demonstrates the problem (in a different context).

import multiprocessing as multi

glob_data=[]
def func(a):
    glob_data.append(a)

map(func,range(10))
print glob_data  #[0,1,2,3,4 ... , 9]  Good.

p=multi.Pool(processes=8)
p.map(func,range(80))

print glob_data  #[0,1,2,3,4, ... , 9] Bad, glob_data wasn't updated.

Is there any way to have the global data updated?

Note that if you try out the above script, you probably shouldn't try it from the interactive interpreter since multiprocessing requires the module __main__ to be importable by child processes.
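For anyone reproducing this on Python 3, here is a minimal sketch of the same demonstration with the `if __name__ == "__main__":` guard in place. The helper name `run` and the pool size are my own choices for illustration, not part of the original script.

```python
import multiprocessing as multi

glob_data = []


def func(a):
    # Appends to whichever copy of glob_data lives in the calling process.
    glob_data.append(a)


def run(n_items=8, processes=2):
    # Hypothetical helper: workers append to their own copies of glob_data,
    # so the parent's list is not touched.
    with multi.Pool(processes=processes) as pool:
        pool.map(func, range(n_items))
    return list(glob_data)


if __name__ == "__main__":
    for a in range(3):
        func(a)
    print(glob_data)   # [0, 1, 2]
    print(run())       # still [0, 1, 2] in the parent
```

The guard keeps the pool from being created again when a child process imports the script.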

UPDATE

Adding the global keyword to func doesn't help, e.g.:

def func(a):  #Still doesn't work.
    global glob_data
    glob_data.append(a)
mgilson

2 Answers

21

You need the list glob_data to be shared between processes. multiprocessing's Manager gives you just that: it keeps the list in a server process and hands each worker a proxy to it:

import multiprocessing as multi
from multiprocessing import Manager

manager = Manager()

glob_data = manager.list([])

def func(a):
    glob_data.append(a)

map(func,range(10))
print glob_data  # [0,1,2,3,4 ... , 9] Good.

p = multi.Pool(processes=8)
p.map(func,range(80))

print glob_data # Super Good: the values appended by the workers are there too.

For some background:

https://docs.python.org/3/library/multiprocessing.html#managers
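The snippet above relies on the workers inheriting the module-level proxy, which is what happens when processes are forked. As a hedged sketch for Python 3, assuming you also want it to work where workers are spawned (for example on Windows), the proxy can be passed to the workers explicitly. The two-argument `func`, the `run` helper, and the use of `functools.partial` are my own illustration, not part of the original answer.

```python
import multiprocessing as multi
from functools import partial


def func(shared, a):
    # `shared` is a manager.list() proxy; the append is forwarded
    # to the manager's server process.
    shared.append(a)


def run(n_items=80, processes=4):
    # Hypothetical helper that owns the manager and the pool.
    with multi.Manager() as manager:
        glob_data = manager.list()
        with multi.Pool(processes=processes) as pool:
            pool.map(partial(func, glob_data), range(n_items))
        # Copy the contents out before the manager shuts down.
        return list(glob_data)


if __name__ == "__main__":
    data = run()
    # The order of the appends is not deterministic, so compare sorted values.
    print(sorted(data) == list(range(80)))
```

Copying the proxy into a plain list before leaving the `with` block matters, because the proxy stops working once the manager process has exited.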

synthomat
Rafael Ferreira
  • Cheers, this works perfectly for me. I should mention here that it works because the objects that I am appending to glob_data are immutable (ints in the example, strings in my actual application). If the objects being packed into the list are mutable, then care must be taken to re-add them to the list if they are changed. – mgilson Mar 28 '12 at 19:07
  • @RafaelFerreira Works well, but in my case the results aren't consistent: I'm using manager.dict(), and the values change each time I run my code. I suspect a lock should be applied, but I'm not sure. – Alekhya Vemavarapu May 16 '16 at 06:05
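To illustrate the caveat about mutable items raised in the first comment, here is a small Python 3 sketch. Reading an element through the proxy hands back a copy, so an in-place change to that copy never reaches the manager. The `demo` function and the `{"count": 0}` item are hypothetical names chosen for the example.

```python
import multiprocessing as multi


def demo():
    with multi.Manager() as manager:
        shared = manager.list([{"count": 0}])

        # shared[0] returns a copy of the dict, so this change is lost.
        shared[0]["count"] = 1
        lost = shared[0]["count"]

        # Modify a local copy, then assign it back through the proxy.
        item = shared[0]
        item["count"] = 1
        shared[0] = item
        kept = shared[0]["count"]

        return lost, kept


if __name__ == "__main__":
    print(demo())   # (0, 1)
```

Only operations on the proxy itself (append, item assignment, and so on) are forwarded to the manager, which is why the item has to be re-assigned after it changes.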
1

Have func return a tuple with the results you want from the processing and the thing you want to append to glob_data. Then, when the p.map has completed, you can extract the results from the first elements in the returned tuples and you can build glob_data from the second elements.
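A minimal Python 3 sketch of that idea, assuming each call produces one result and one item for glob_data. The squared result, the `tmp_%d.dat` file name, and the `split` helper are placeholders I made up for illustration.

```python
import multiprocessing as multi


def func(a):
    # Hypothetical work: `result` is what the caller wants,
    # `side_effect` is what would have gone into glob_data.
    result = a * a
    side_effect = "tmp_%d.dat" % a
    return result, side_effect


def split(pairs):
    # Separate the (result, side_effect) tuples into two lists.
    results = [r for r, _ in pairs]
    glob_data = [s for _, s in pairs]
    return results, glob_data


if __name__ == "__main__":
    with multi.Pool(processes=4) as pool:
        pairs = pool.map(func, range(10))
    results, glob_data = split(pairs)
    print(results)
    print(glob_data)
```

Because pool.map returns results in input order, glob_data built this way has a deterministic order, and no state is shared between processes.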

Glenn
  • Yeah, I thought about that ... My use-case is a little more complicated than that however. The temporary files that I want to delete are buried deep inside classes and since they are only temporary files, I prefer to keep them and their names as a private part of the class API (Implementation detail) ... – mgilson Mar 28 '12 at 16:43