
First, what is the difference between Value and Manager().Value?

Second, is it possible to share an integer variable without using Value? Below is my sample code. What I want is a dict whose values are plain integers, not Value objects. What I do now is convert them all after the processes finish. Is there an easier way?

from multiprocessing import Process, Manager

def f(n):
    n.value += 1

if __name__ == '__main__':
    d = {}
    p = []

    for i in range(5):
        d[i] = Manager().Value('i',0)
        p.append(Process(target=f, args=(d[i],)))
        p[i].start()

    for q in p:
        q.join()

    for i in d:
        d[i] = d[i].value

    print(d)
Benyamin Jafari
user2435611

1 Answer


When you use Value, you get a ctypes object in shared memory that is synchronized with an RLock by default. When you use Manager, you get a SyncManager object that controls a server process, which lets other processes manipulate the objects it holds through proxies. You can create multiple proxies using the same manager; there is no need to create a new manager in your loop:

manager = Manager()
for i in range(5):
    new_value = manager.Value('i', 0)

A Manager can be shared across computers, while Value is limited to one computer. Value will be faster (run the code below to see), so I think you should use it unless you need to support arbitrary objects or access them over a network.

import time
from multiprocessing import Process, Manager, Value


def foo(data, name=''):
    print(type(data), data.value, name)
    data.value += 1


if __name__ == "__main__":
    manager = Manager()
    x = manager.Value('i', 0)
    y = Value('i', 0)

    for i in range(5):
        Process(target=foo, args=(x, 'x')).start()
        Process(target=foo, args=(y, 'y')).start()

    print('Before waiting: ')
    print('x = {0}'.format(x.value))
    print('y = {0}'.format(y.value))

    time.sleep(5.0)
    print('After waiting: ')
    print('x = {0}'.format(x.value))
    print('y = {0}'.format(y.value))

To summarize:

  1. Use Manager to create multiple shared objects, including dicts and lists. Use Manager to share data across computers on a network.
  2. Use Value or Array when it is not necessary to share information across a network and the types in ctypes are sufficient for your needs.
  3. Value is faster than Manager.
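As for the second part of the question (getting a dict of plain integers), point 1 can be sketched roughly as follows. This is only a sketch, assuming Python 3 and that each worker touches its own key; the names `bump` and `collect` are made up for illustration. A single `manager.dict()` holds ordinary ints, so nothing needs to be converted afterwards:

```python
from multiprocessing import Manager, Process


def bump(shared, key):
    # Hypothetical worker: reads and writes a plain int through the proxy.
    # Each worker owns one key, so no two processes update the same entry.
    shared[key] += 1


def collect(n_workers=5):
    # One manager serves every worker. Its server process stops when the
    # with-block exits, so the result is copied into a normal dict first.
    with Manager() as manager:
        shared = manager.dict({i: 0 for i in range(n_workers)})
        workers = [Process(target=bump, args=(shared, i))
                   for i in range(n_workers)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return dict(shared)


if __name__ == "__main__":
    print(collect())
```

If several workers updated the same key, `shared[key] += 1` would race in the same way as `data.value += 1` in the warning below, so a lock would be needed there too.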

Warning

By the way, sharing data across processes or threads should be avoided if possible. The code above will probably run as expected, but if foo takes longer to execute, the processes start to overwrite each other's updates and the results get weird. Compare the above with:

def foo(data, name=''):
    print(type(data), data.value, name)
    for j in range(1000):
        data.value += 1

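You'll need a Lock to make this work correctly, because `data.value += 1` is a read followed by a separate write, and another process can change the value in between.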
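A minimal sketch of the locked version, assuming a plain `multiprocessing.Value` (which carries its own lock, reachable through `get_lock()`); the function names `add_many` and `run_counter` are made up for illustration:

```python
from multiprocessing import Process, Value


def add_many(counter, n):
    for _ in range(n):
        # Hold the Value's built-in lock across the read and the write,
        # so the increment cannot interleave with another process.
        with counter.get_lock():
            counter.value += 1


def run_counter(n_procs=4, n=1000):
    counter = Value('i', 0)
    procs = [Process(target=add_many, args=(counter, n))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run_counter())
```

As far as I know, the proxy returned by `manager.Value` does not offer `get_lock()`, so with a manager you would pass a separate `manager.Lock()` to the workers and acquire that instead.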

I am not especially knowledgeable about all of this, so maybe someone else will come along and offer more insight. I figured I would contribute an answer since the question was not getting attention.

starball
ChrisP
  • can we add any value to Array? I can't append any value to Array. – user2435611 Jul 02 '13 at 09:08
  • @user2435611, [`Array`](http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Array) will give you a shared ctypes array. You need to decide what type of data you are storing beforehand, and supply a [type code](http://docs.python.org/2/library/array.html#module-array). For example, `a = Array('c', 10)` creates an array of one-character strings of length 10. New entries can be added to the array like so: `a[0] = 'b'`. You cannot add *any* value to an array, see [the list of type codes](http://docs.python.org/2/library/array.html#module-array). – ChrisP Jul 02 '13 at 13:20
  • So we should decide the size of array beforehand and can't expand it? if so, it's better for me to use manager.list(). Thanks for help :) – user2435611 Jul 02 '13 at 18:38
  • @user2435611: Yes, I think that's right. The `multiprocessing.Array` is allocated memory at the time of creation and unlike `array.array` cannot be expanded. Use `manager.list` if you really have no idea how much space you need, but you might want to experiment with allocating an `Array` with some extra space if you can find an upper-bound on the size. I hope that helps. – ChrisP Jul 02 '13 at 23:52
  • @ChrisP i'm late to the party, but how would you recommend sharing a simple int variable across processes on one machine? I use multiprocessing for IO bound workers (until I learn async), and would like to have a counter they share so I know how many iterations they've gone through. Recs on how best to implement? – Travis Leleu Oct 15 '14 at 18:44
  • @TravisLeleu did you find a solution for this? – Connor Jul 02 '18 at 17:35
  • How about sharing data by adding it to other module and accessing that module from all processes something explained in [this](https://stackoverflow.com/questions/52701273/sharing-common-and-worker-specific-information-across-workers-and-modules-in-pyt)? This does not at all use any features of multiprocessing module, but plain python. Is this ok, if I dont want to have synchronized access? – MsA Nov 01 '18 at 10:43
  • Can you just comment if we can share any datatype such as pandas dataframe using this approach? – MsA Nov 05 '18 at 09:19
  • You should use a `Lock` to protect `Value`, see https://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing – soulmachine Dec 10 '18 at 08:50
  • it seems that it does not run for py3 – Jirka Feb 14 '19 at 15:38
  • It doesn't work for python3.7: `AttributeError: 'ForkAwareLocal' object has no attribute 'connection'` – pdaawr Jun 02 '20 at 13:33