
I'm using the multiprocessing module from the pathos library to parallelise a heavy process defined within a class. My class needs an Enum instance attribute, and unfortunately this breaks the multiprocessing functionality. Here's a minimal example that reproduces the error (I'm running Python 3.10.8 and can't use Python 3.11.x at work):

```python
from enum import Enum

from pathos.multiprocessing import ProcessingPool


class MyClass:
    def __init__(self, group_dict):
        self.group_dict = group_dict
        # Enum class created at runtime with the functional API
        self.tags_enum = Enum(
            value="MyEnum",
            names={v.upper(): v for v in self.group_dict.keys()},
            type=str,
        )

    def fnc1(self, names_list):
        pool = ProcessingPool(nodes=2)
        # The bound method self.fnc2 (and therefore self) is sent to the workers
        result = pool.map(self.fnc2, names_list)

        return result

    def fnc2(self, name):
        return len(name)


if __name__ == "__main__":
    inst = MyClass(group_dict={"key1": "val1", "key2": "val2"})
    print(inst.fnc1(names_list=["StackOverflow", "Python", "Question"]))
```

Running this code will raise the following PicklingError:

```
_pickle.PicklingError: Can't pickle <enum 'MyEnum'>: it's not found as __main__.MyEnum
```

Removing the part where self.tags_enum is defined makes the code run just fine and produces the expected result: [13, 6, 8].
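
The same failure can be reproduced with the standard library's `pickle` alone, without pathos. The sketch below is a trimmed-down version of the question's class (only `fnc2` is kept, and the `error` variable is just for illustration). Pickling the bound method `inst.fnc2` also pickles `inst` and its `__dict__`, and the dynamically created enum class has no importable module-level name that pickle can refer to. pathos itself serializes with `dill`, but the traceback in the question appears to come from the same by-reference lookup.

```python
import pickle
from enum import Enum


class MyClass:
    """Trimmed-down version of the question's class (pool code omitted)."""

    def __init__(self, group_dict):
        self.group_dict = group_dict
        # Functional Enum API: the class is created at runtime and never bound
        # to a module-level name, so pickle cannot locate it by reference.
        self.tags_enum = Enum(
            "MyEnum",
            names={v.upper(): v for v in group_dict.keys()},
            type=str,
        )

    def fnc2(self, name):
        return len(name)


inst = MyClass(group_dict={"key1": "val1", "key2": "val2"})

# Pickling the bound method also pickles `inst` (and its __dict__), which is
# roughly what a process pool must do before sending self.fnc2 to a worker.
error = None
try:
    pickle.dumps(inst.fnc2)
except pickle.PicklingError as exc:
    error = exc

print(type(error).__name__, error)
```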

Given the above, I have the following two-part question:

  • First, as I'm fairly new to multiprocessing, I would like to understand why this fails.
  • Second, I'm looking for ways to fix this error. I should note that I need the tags_enum instance attribute to be set this way: though it may not look important in this toy example, it matters in the real use case I'm working on.
glpsx
  • It's failing because `multiprocessing` requires pickling python objects and sending them between processes. The way you defined your enum, it cannot be pickled. Hence the error. I don't really understand the value of the dynamically defined enum here, so my inclination is simply to suggest not using an enum. Can you *actually explain its purpose*? – juanpa.arrivillaga May 03 '23 at 22:12
  • In general, [pickling a dynamically created class is hard](https://stackoverflow.com/questions/11658511/pickling-dynamically-generated-classes). I think you could do this by subclassing Enum and defining `__reduce__` on that subclass, but it's not the easiest way to do it. The easiest way is probably to change `self.tags_enum` to a dict, which will pickle without any problems. – Nick ODell May 03 '23 at 22:19
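
The comment above suggests a `__reduce__`-based fix on an Enum subclass. A related technique, swapped in here and not taken from the thread, is to customize pickling of `MyClass` itself with `__getstate__`/`__setstate__`: leave the enum out of the pickled state and rebuild it from `group_dict` when the object is unpickled in the worker. This keeps `tags_enum` as an instance attribute, which the question says matters. The helper `_build_tags_enum` is a hypothetical name. The sketch only exercises the standard `pickle` module; that `dill` (used by pathos) honors the same hooks for ordinary instances is an assumption worth verifying.

```python
import pickle
from enum import Enum


class MyClass:
    def __init__(self, group_dict):
        self.group_dict = group_dict
        self.tags_enum = self._build_tags_enum(group_dict)

    @staticmethod
    def _build_tags_enum(group_dict):
        # Hypothetical helper: builds the same runtime enum as the question.
        return Enum(
            "MyEnum",
            names={v.upper(): v for v in group_dict.keys()},
            type=str,
        )

    def __getstate__(self):
        # Pickle everything except the enum class, which pickle cannot
        # serialize by reference.
        state = self.__dict__.copy()
        state.pop("tags_enum", None)
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes, then rebuild the enum in the
        # receiving process from the same group_dict.
        self.__dict__.update(state)
        self.tags_enum = self._build_tags_enum(self.group_dict)

    def fnc2(self, name):
        return len(name)


inst = MyClass(group_dict={"key1": "val1", "key2": "val2"})
# Roughly what a process pool does with self.fnc2: pickle, then unpickle.
worker_fn = pickle.loads(pickle.dumps(inst.fnc2))
print(worker_fn("Python"))                              # 6
print([m.value for m in worker_fn.__self__.tags_enum])  # ['key1', 'key2']
```

Each unpickled copy gets its own `MyEnum` class, so members created in a worker are not identical (`is`) to the parent's members; comparing `.value` avoids surprises if enum members travel back in the results.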

1 Answer

I solved my issue by making tags_enum a property instead of an instance attribute.

This particular solution may not work for every use case, but I'm posting it anyway in case it helps others.

```python
# Inside MyClass, replacing the tags_enum assignment in __init__
@property
def tags_enum(self):
    return Enum(
        value="MyEnum",
        names={v.upper(): v for v in self.group_dict.keys()},
        type=str,
    )
```
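
A sketch of the answer's approach as a complete class, assuming the rest of the question's class is unchanged; the pool call is replaced by a direct pickle round trip so the example runs on its own. Because the enum is built on access rather than stored, `__dict__` only holds `group_dict` and pickling succeeds. The last lines illustrate the drawback raised in the comment under this answer: every access creates a new class.

```python
import pickle
from enum import Enum


class MyClass:
    def __init__(self, group_dict):
        # Only plain, picklable data is stored on the instance.
        self.group_dict = group_dict

    @property
    def tags_enum(self):
        # Built on demand from group_dict; nothing enum-related lives in
        # __dict__, so pickling the instance succeeds.
        return Enum(
            "MyEnum",
            names={v.upper(): v for v in self.group_dict.keys()},
            type=str,
        )

    def fnc2(self, name):
        return len(name)


inst = MyClass(group_dict={"key1": "val1", "key2": "val2"})
worker_fn = pickle.loads(pickle.dumps(inst.fnc2))
print(worker_fn("StackOverflow"))  # 13

# Caveat: each access builds a brand-new enum class.
first, second = inst.tags_enum, inst.tags_enum
print(first is second)                 # False
print(isinstance(first.KEY1, second))  # False
```
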
glpsx
  • By using `property` you are creating a new enum every time `.tags_enum` is accessed (with the same names/values) -- this will be confusing and, at some point, will lead to bugs. Use `functools.cached_property` -- it calls the function once and then saves the result, returning that result on all future `.tags_enum` accesses. – Ethan Furman May 04 '23 at 17:49
  • Thank you for this very useful and informative suggestion, @EthanFurman! – glpsx May 04 '23 at 18:27
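
The `functools.cached_property` suggestion can be sketched as below. One caveat follows from how `cached_property` works: it stores the computed value in the instance's `__dict__`, so if `tags_enum` is accessed in the parent before the pool pickles `self`, the original PicklingError comes back. The `__getstate__` here, which drops the cached enum from the pickled state, is an addition not present in the thread; the unpickled copy simply rebuilds the enum on first access.

```python
import pickle
from enum import Enum
from functools import cached_property


class MyClass:
    def __init__(self, group_dict):
        self.group_dict = group_dict

    @cached_property
    def tags_enum(self):
        # Built once per instance, then stored in self.__dict__["tags_enum"].
        return Enum(
            "MyEnum",
            names={v.upper(): v for v in self.group_dict.keys()},
            type=str,
        )

    def __getstate__(self):
        # cached_property caches into __dict__, so once tags_enum has been
        # accessed the instance would be unpicklable again; drop the cache.
        state = self.__dict__.copy()
        state.pop("tags_enum", None)
        return state

    def fnc2(self, name):
        return len(name)


inst = MyClass(group_dict={"key1": "val1", "key2": "val2"})
print(inst.tags_enum is inst.tags_enum)       # True: same class on every access
restored = pickle.loads(pickle.dumps(inst))   # works even after the access above
print(restored.fnc2("Question"))              # 8
print([m.value for m in restored.tags_enum])  # ['key1', 'key2']
```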