
I have a problem where I need to call an instance method of a class in parallel and count the number of times it has been called, so that each call has a unique identifier (to be used to store results in a unique location).

Here is a question with solutions for what I want, but in Java.

Here is a minimal example:

para2.py, which sets up all the instance-method pickling stuff (less relevant):

from copy_reg import pickle
from types import MethodType
from para import func

def _pickle_method(method):
    # reduce a bound method to (name, instance, class) so it can be pickled
    return _unpickle_method, (method.im_func.__name__, method.im_self, method.im_class)

def _unpickle_method(func_name, obj, cls):
    # rebuild the method by looking it up on the class and binding it to obj
    return cls.__dict__[func_name].__get__(obj, cls)

# register the reduce/rebuild pair for method objects
pickle(MethodType, _pickle_method, _unpickle_method)

func()

And now para.py contains:

from sklearn.externals.joblib import Parallel, delayed
from math import sqrt
from multiprocessing import Lock

class Thing(object):

    COUNT = 0
    lock = Lock()

    def objFn(self, x):
        with Thing.lock:
            mecount = Thing.COUNT
            Thing.COUNT += 1

        print mecount

        n = 0
        while n < 10000000:  # busy-wait to add a little delay for consistency
            n += 1
        return sqrt(x)

def func():
    thing = Thing()

    y = Parallel(n_jobs=4)(delayed(thing.objFn)(i**2) for i in range(10))
    print y

Now running python para2.py in a terminal prints

0
0
0
0
1
1
1
1
2
2
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

I need those numbers in the column to count 0 to 9, but it appears that the four processes are each reading and updating their own version of COUNT rather than one shared counter. How can I make this do what I want?

Pavel Komarov
  • I found a possible solution based on [this post](https://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing). Making `Thing.COUNT` a `multiprocessing.RawValue`, reading it with `mecount = Thing.COUNT.value`, and incrementing it with `Thing.COUNT.value += 1` gives `mecount` the correct value (sketch below). In the unmodified code above, even though `Thing.lock` and `Thing.COUNT` appear to be the same objects in every process (checked via `id()`), COUNT behaves as if it is not in shared memory. Can someone explain fully? – Pavel Komarov Jan 17 '18 at 23:10
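A minimal sketch of that change (an edit to para.py from the question, assuming Python 2 and a fork-based process start so the shared value created at class-definition time is inherited by the workers):

from multiprocessing import Lock, RawValue

class Thing(object):

    COUNT = RawValue('i', 0)  # a C int in shared memory; 'i' is the type code
    lock = Lock()             # still needed: RawValue carries no lock of its own

    def objFn(self, x):
        with Thing.lock:  # serialize the read-then-increment across processes
            mecount = Thing.COUNT.value
            Thing.COUNT.value += 1
        print mecount
        # ... delay loop and return sqrt(x) as in the question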

1 Answer


With multiprocessing, Python forks your process and runs your code in child processes. In doing so it creates a copy of the class (including COUNT) in each child; the code/data is not shared between them. You can debug this a bit by placing print statements such as...

print multiprocessing.current_process().name

in your constructor and in your objFn to see what's running where and what its value is.
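For instance, a sketch (hypothetical placement, not from the original post) of such a print inside the question's objFn:

import multiprocessing

class Thing(object):

    COUNT = 0

    def objFn(self, x):
        # each worker reports its own name and its own view of COUNT
        print multiprocessing.current_process().name, 'sees COUNT =', Thing.COUNT

With the question's code this would show several distinct worker names, each incrementing its own COUNT.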

In order to share data between processes you have to use something designed for this from the multiprocessing library, namely the Value and Array objects. These live in shared memory, and because of that they are generally limited to simple ctypes values (a C int, double, and so on), not arbitrary Python objects.
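As a minimal sketch of the idea, here is a shared counter built on multiprocessing.Value (which, unlike RawValue, comes with an internal lock):

from multiprocessing import Value

counter = Value('i', 0)  # a C int in shared memory, guarded by an internal lock

def next_id():
    with counter.get_lock():  # make the read-then-increment atomic
        me = counter.value
        counter.value += 1
    return me

Every process that inherits counter (e.g. via fork) sees and updates the same memory, so next_id() hands out each integer exactly once.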

bivouac0
  • The problem here is that I am spawning processes rather than threads: the Java example I provided, like many of the other examples I was able to find, is about a threading context. Because threads share a memory space, there is no extra complication of needing a special sort of value in shared memory (see the sketch below for the contrast). Because processes do not share a memory space, updating COUNT in one process has no effect on the COUNT in the others, which are all individually pickled copies. So even though the locking in the code as given works, it provides no benefit. – Pavel Komarov Jan 17 '18 at 23:31
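To illustrate that contrast, a sketch (not from either post) of the same counter done with threads, where a plain class attribute works because all workers share one memory space:

import threading

class Thing(object):

    COUNT = 0
    lock = threading.Lock()

    def objFn(self, x):
        with Thing.lock:  # a plain int plus a lock suffices when memory is shared
            mecount = Thing.COUNT
            Thing.COUNT += 1
        return mecount

thing = Thing()
threads = [threading.Thread(target=thing.objFn, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print Thing.COUNT  # 10: every call observed a unique mecount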