4

To start with, here is some code that works

from multiprocessing import Pool, Manager
import random

manager = Manager()
dct = manager.dict()

def do_thing(n):
    for i in range(10_000_000):
        i += 1
    dct[n] = random.randint(0, 9)

with Pool(2) as pool:
    pool.map(do_thing, range(10))

Now if I try to make a class out of this:

from multiprocessing import Pool, Manager
import random


class SomeClass:
    def __init__(self):
        self.manager = Manager()
        self.dct = self.manager.dict()

    def __call__(self):
        with Pool(2) as pool:
            pool.map(self.do_thing, range(10))

    def do_thing(self, n):
        for i in range(10_000_000):
            i += 1
        self.dct[n] = random.randint(0, 9)


if __name__ == '__main__':
    inst = SomeClass()
    inst()

I run into: TypeError: Pickling an AuthenticationString object is disallowed for security reasons. Now from here, I get the hint that Python is trying to pickle the Manager which as I understand has its own dedicated process, and processes can't be pickled because they contain an AuthenticationString.

I don't know enough about how forking works (I'm on Linux, so I understand this is the default method for starting new processes) to understand exactly why the Manager instance needs to be pickled.

So here are my questions:

  1. Why is this happening?
  2. How can I use a Manager when doing multiprocessing within a class? PS: I want to be able to import SomeClass from this module.
  3. Is what I'm asking for unreasonable or unconventional?

PS: I know I can do this exact snippet without the Manager by exploiting the fact that pool.map will return things in order, so something like this: res = pool.map(self.do_thing, range(10)) then dct = {k: v for k, v in zip(range(10), res)}. But that's besides the point of the question.

Alexander Soare
  • 2,825
  • 3
  • 25
  • 53

1 Answers1

5

To answer your questions:

Q1 - Why is this happening?

Each worker process created by the Pool.map() needs to execute the instance method self.do_thing(). In order to do that Python pickles the instance and passes it to the subprocess (which unpickles it). If each instance has a Manager it will be a problem because they're not pickleable. Part of the unpickling process involves importing the module that defines the class and restoring the instance's attributes (which were also pickled).

Q2 - How to fix it

You can avoid the problem by having the class create its own class-level Manager (shared by all instances of the class). Here the __init__() method creates the manager class attribute the first time an instance is created and from that point on, further instances will reuse this — it's sometimes called "lazy initialization"

from multiprocessing import Pool, Manager
import random


class SomeClass:
    def __init__(self):
        # Lazy creation of class attribute.
        try:
            manager = getattr(type(self), 'manager')
        except AttributeError:
            manager = type(self).manager = Manager()
        self.dct = manager.dict()

    def __call__(self):
        with Pool(2) as pool:
            pool.map(self.do_thing, range(10))
        print('done')

    def do_thing(self, n):
        for i in range(10_000_000):
            i += 1
        self.dct[n] = random.randint(0, 9)


if __name__ == '__main__':
    inst = SomeClass()
    inst()

Q3 - Is this a reasonable thing to do?

In my opinion, yes.

martineau
  • 119,623
  • 25
  • 170
  • 301
  • Thanks! I guess I'm a little disappointed at the answer though? (It's not you, it's me :P) I mean, now this requires that the user of the class needs to get involved in the inner workings by passing in a manager. Now I'm wondering whether it makes sense to have some wrapper around the class which sets up the manager, and the user calls the wrapper and gets to be oblivious to the manager issue. – Alexander Soare Mar 25 '21 at 06:49
  • 1
    I think I can rectify that… – martineau Mar 25 '21 at 06:56
  • Okay that is some serious wizardry. Maybe my `if __name__ ` bit threw everyone off though. I put it there to let you easily copy paste the code and try it. What I would really do is `from my_module import Some class` in which case `__name__` would not be `"__main__"`. If you're going to now just remove the `if __name__` line while leaving the `Manager` in place, I'll need to re-evaluate my understanding of Python haha. By the way, does it make sense to share it with ALL instances of the class? What if someone starts 100 instances. Maybe it would be silly to have one process manage all that – Alexander Soare Mar 25 '21 at 07:08
  • 1
    This is the second time you've rejected my post which technically answers your question by adding more constraints not mentioned in it when you asked it. I suspect there are other ways for the class to detect whether it should create the attribute, and yes it makes sense to share a class-level manager in this case anyway (at least as far as I can tell from what you've revealed thus far). – martineau Mar 25 '21 at 07:57
  • I think your assessment is not fair. I never rejected your post and afaik the comments section is designed to help bridge the gap between the Q and A, because neither will be perfect on the first go. Also, I padded my question 2 with questions 1 and 3 for a reason, indicating I was looking for some level of understanding rather than just the code changes required to make it work. In any case, I've upvoted your answer as it provides value, and I guess I forgot to do that right away. – Alexander Soare Mar 25 '21 at 08:17
  • Your `if __name__` didn't throw me "off", it's the proper way to do `multiprocessing` (see "**Safe importing of main module**" subsection in the [documentation](https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods). I used it in a somewhat unusual place to prevent the `Manager` from being part of the class except for when it was being used by the main process (as opposed to a subprocess). I've modified my answer to do that another way so the class could be put into a separate module if desired. – martineau Mar 25 '21 at 15:50
  • 1
    Well, does it now do what you want? – martineau Mar 26 '21 at 01:05
  • I see that it's only creating the manager if an instance of the class is created. Then with any subsequent instances created it uses the existing manager. Nice. Thanks! – Alexander Soare Mar 26 '21 at 08:56
  • Yes, that's the essence of what's going on. The idea—all along—has been to avoid having a `Manager` be an _instance_ attribute which makes Python feel the need to try to pickle it when pickling instances. Earlier versions used `__name__` as a simple way to detect when the main process was using the class (as opposed to it having been imported by a subprocess that wouldn't ever create instances of it). The latest revision effective achieves the goal differently as you've described. Consider accepting my answer. – martineau Mar 26 '21 at 10:05
  • Nice! Okay, and I suppose the attempt to pickle all instances of the class comes from when you put `pool.map(self.func,` so because there is a `self` the fork method of starting a process tries to pickle all of `self`. Is that right? If so, that answers my Q1. Finally, for my Q3, is what I'm doing a reasonable thing? Now that you've shown me this, is it just an engineered solution to an unreasonable request? – Alexander Soare Mar 26 '21 at 10:14
  • If you want, please incorporate the extra color that you've provided into the answer. Then it will be answering my question in full, and I can accept it. – Alexander Soare Mar 26 '21 at 10:15
  • 1
    Each worker process created by the Pool.map() needs to execute the instance method `self.do_thing()`. In order to do that Python pickles the instance and passes it to the subprocess (which unpickles it). If each instance has a `Manager` it will be a problem because they're not pickleable. Part of the unpickling process involves importing the module that defines the class and restoring the instance's attributes (which were also pickled). Initially I moved the `Manager` instance out of the class, but you didn't like having to pass it as an argument when calling the class to create instances. … – martineau Mar 26 '21 at 11:26
  • 1
    … To get around that I modified the class to construct a `Manager` whenever the class was defined in the `__main__` process by checking `__name__`. But you didn't like that because it meant you couldn't put the class in another module. So I again modified the class to what it now is to allow that. NO I will not incorporate all this "color" into my answer. It's not necessary because anyone sufficiently interested can read all about the "why" down here in the comments. What you want to do seems reasonable, IMO. – martineau Mar 26 '21 at 11:35