9

I want to use the Value class from the multiprocessing library to keep track of data. As far as I know, with multiprocessing in Python each process gets its own copy of the data, so I can't just update a global variable. I want to use Value to solve this issue. Does anyone know how I can pass Value data into a pooled function?

from multiprocessing import Pool, Value
from functools import partial
import itertools

arr = [2,6,8,7,4,2,5,6,2,4,7,8,5,2,7,4,2,5,6,2,4,7,8,5,2,9,3,2,0,1,5,7,2,8,9,3,2]

def hello(g, data):
    data.value += 1

if __name__ == '__main__':
    data = Value('i', 0)
    func = partial(hello, data)
    p = Pool(processes=1)
    p.map(hello, itertools.izip(arr, itertools.repeat(data)))

    print data.value

Here is the runtime error I'm getting:

RuntimeError: Synchronized objects should only be shared between processes through inheritance

Does anyone know what I'm doing wrong?

user2313602

2 Answers

10

I don't know why, but there seems to be some issue when using a Pool that you don't get when creating subprocesses manually. For example, the following works:

from multiprocessing import Process, Value

arr = [1,2,3,4,5,6,7,8,9]


def hello(data, g):
    # take the lock before updating the shared counter
    with data.get_lock():
        data.value += 1
    print id(data), g, data.value

if __name__ == '__main__':
    data = Value('i')
    print id(data)

    processes = []
    for n in arr:
        # the Value is handed over at Process creation, so the child inherits it
        p = Process(target=hello, args=(data, n))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print "sub process tasks completed"
    print data.value

However, if you do basically the same thing using a Pool, you get the error "RuntimeError: Synchronized objects should only be shared between processes through inheritance". I have seen that error when using a pool before, and never fully got to the bottom of it.

An alternative to using Value that seems to work with Pool is to use a Manager to give you a 'shared' list:

from multiprocessing import Pool, Manager
from functools import partial


arr = [1,2,3,4,5,6,7,8,9]


def hello(data, g):
    # note: this read-modify-write on the manager list is not atomic, so
    # concurrent workers can lose updates; a shared lock is needed for an exact count
    data[0] += 1


if __name__ == '__main__':
    m = Manager()
    data = m.list([0])
    hello_data = partial(hello, data)
    p = Pool(processes=5)
    p.map(hello_data, arr)

    print data[0]
Tom Dalton
  • Perfect! This is exactly what I was looking for! Looks like I'll have to end up using Manager instead! Thank you so much Tom! – user2313602 Aug 15 '15 at 22:43
  • Ref sharing the Value, this answer (http://stackoverflow.com/a/9931389/2372812) seems to do some workaround with the pool initialiser to make it work, but it seems like a fairly poor solution. Beware that using a manager results in potentially slow IPC as compared to 'real' shared memory. – Tom Dalton Aug 15 '15 at 22:46
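
For completeness, the pool-initializer workaround mentioned in the comment above looks roughly like this. It is a minimal sketch, not the code from the linked answer: the names init_worker and counter are mine. The idea is that the Value is handed to each worker at start-up via initargs, which is the "inheritance" path the error message asks for, rather than being pickled with every task:

from multiprocessing import Pool, Value

counter = None  # filled in inside each worker by the initializer


def init_worker(shared_counter):
    # runs once in every worker process; stash the inherited Value in a global
    global counter
    counter = shared_counter


def hello(g):
    with counter.get_lock():
        counter.value += 1


if __name__ == '__main__':
    data = Value('i', 0)
    # the Value is passed at worker start-up, not through map()
    p = Pool(processes=5, initializer=init_worker, initargs=(data,))
    p.map(hello, [1, 2, 3, 4, 5, 6, 7, 8, 9])
    p.close()
    p.join()

    print data.value  # 9
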
-1

There is little need to use Value with Pool.map().

The central idea of a map is to apply a function to every item in a list or other iterator, gathering the return values in a list.

The idea behind Pool.map is basically the same, but spread out over multiple processes. In each worker process, the mapped function is called with items from the iterable. The return values from the functions called in the worker processes are sent back to the parent process and gathered into a list, which is eventually returned.
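
In other words, for the question's example you would usually just return a result from the worker and let map collect it, rather than mutating shared state. A minimal sketch (the +1 body is only illustrative):

from multiprocessing import Pool

arr = [2, 6, 8, 7, 4, 2, 5, 6, 2, 4]


def hello(g):
    # return a result instead of updating a shared counter
    return g + 1


if __name__ == '__main__':
    p = Pool(processes=4)
    results = p.map(hello, arr)  # one result per input, in input order
    p.close()
    p.join()

    print len(results)  # how many items were processed
    print results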


Alternatively, you could use Pool.imap_unordered, which starts returning results as soon as they are available instead of waiting until everything is finished. You could then count the results as they come in and use that count to update a progress bar.
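A rough sketch of that progress-bar idea, assuming a hypothetical work() function and plain terminal output:

import sys
from multiprocessing import Pool

arr = range(100)


def work(g):
    return g * g  # placeholder computation


if __name__ == '__main__':
    p = Pool(processes=4)
    done = 0
    # imap_unordered yields results as they finish, so we can count them
    for _ in p.imap_unordered(work, arr):
        done += 1
        sys.stdout.write('\rprogress: %d/%d' % (done, len(arr)))
        sys.stdout.flush()
    sys.stdout.write('\n')
    p.close()
    p.join()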

Roland Smith
  • What if I want to have, for example, a progress bar? I'd need a counter that all worker processes increment together. –  Aug 23 '18 at 08:44
  • @BarafuAlbino You can use a [`multiprocessing.Value`](https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Value) for that. Note (from the linked documentation) that you'll have to acquire the lock before increasing the value! – Roland Smith Aug 23 '18 at 18:56
  • Sure. Except that Value does not work with multiprocessing.Pool.map, as the question states. –  Aug 24 '18 at 21:53
  • @BarafuAlbino You should probably create the `Value` *before* `if __name__ == "__main__"`, so that child processes inherit it. And it might depend on the OS. MS-windows is kind of weird in that it doesn't have `fork`, which makes the implementation of `multiprocessing` easier on UNIX-like systems. – Roland Smith Aug 25 '18 at 11:35
  • @BarafuAlbino But `imap_unordered` is probably simpler. – Roland Smith Aug 25 '18 at 11:40
  • This is a ridiculous answer. Of course there are reasons to do this. For one, I need to keep track of state between items because the different workers are going back and forth modifying things. So yes, this is helpful. I hate answers that say "You don't need to know that" lol. Please delete. – user3413723 Aug 13 '23 at 20:26
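
A sketch of the inheritance approach described in Roland Smith's comment above: create the Value at module level so that fork-based platforms share it with the pool workers. The hello name and pool size are arbitrary, and this does not work on MS-Windows, where workers re-import the module and get their own copy of the counter:

from multiprocessing import Pool, Value

# created at import time, before the Pool exists, so forked workers inherit it
counter = Value('i', 0)


def hello(g):
    with counter.get_lock():
        counter.value += 1
    return g


if __name__ == '__main__':
    p = Pool(processes=4)
    p.map(hello, [1, 2, 3, 4, 5, 6, 7, 8, 9])
    p.close()
    p.join()

    print counter.value  # 9 on fork-based (UNIX-like) systems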