I'm using the multiprocessing module from the pathos library to parallelise a heavy process defined within a class. My class needs to have an Enum instance attribute defined and, unfortunately, this is breaking the multiprocessing functionality. Here's a minimal example that reproduces the error (I'm running Python 3.10.8 and I can't use Python 3.11.x at work):
    from enum import Enum
    from pathos.multiprocessing import ProcessingPool


    class MyClass:
        def __init__(self, group_dict):
            self.group_dict = group_dict
            self.tags_enum = Enum(
                value="MyEnum",
                names={v.upper(): v for v in self.group_dict.keys()},
                type=str,
            )

        def fnc1(self, names_list):
            pool = ProcessingPool(nodes=2)
            result = pool.map(self.fnc2, names_list)
            return result

        def fnc2(self, name):
            return len(name)


    if __name__ == "__main__":
        inst = MyClass(group_dict={"key1": "val1", "key2": "val2"})
        print(inst.fnc1(names_list=["StackOverflow", "Python", "Question"]))
Running this code raises the following `PicklingError`:

    _pickle.PicklingError: Can't pickle <enum 'MyEnum'>: it's not found as __main__.MyEnum
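The failure can likely be reproduced without any multiprocessing. `pool.map(self.fnc2, ...)` has to serialise the bound method `self.fnc2`, which drags `self`, and therefore `self.tags_enum`, along with it. `pickle` stores a class as a reference (`<module>.<qualname>`) rather than by value, and that reference only works if the module actually has an attribute with that name. The sketch below uses the standard-library `pickle` only; pathos serialises with `dill`, which builds on `pickle`, so this approximates rather than replicates its code path. `make_enum` is a hypothetical helper name.

```python
import pickle
from enum import Enum


def make_enum(group_dict):
    # Same functional-API call as in the question: the class is *named* "MyEnum",
    # but no module-level variable called MyEnum refers to it.
    return Enum(
        value="MyEnum",
        names={v.upper(): v for v in group_dict.keys()},
        type=str,
    )


tags_enum = make_enum({"key1": "val1", "key2": "val2"})

# pickle stores a class as a reference "<module>.<qualname>", not by value.
print(tags_enum.__module__, tags_enum.__qualname__)

try:
    pickle.dumps(tags_enum)
    first_attempt_failed = False
except pickle.PicklingError as exc:
    # The module has no attribute called MyEnum, so the reference cannot be resolved.
    first_attempt_failed = True
    print("PicklingError:", exc)

# Once the class is reachable under its own name at module level, the lookup works.
MyEnum = tags_enum
restored = pickle.loads(pickle.dumps(MyEnum))
print(restored is MyEnum)  # True: unpickling returns the same class object
```

The variable name `tags_enum` plays no role in the lookup; only the class's `__qualname__` ("MyEnum") and `__module__` matter, which matches the "not found as __main__.MyEnum" wording of the error.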
Removing the part where `self.tags_enum` is defined makes the code run just fine and produce the expected result: `[13, 6, 8]`.
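In a larger real class it can be useful to confirm which attribute breaks serialisation before changing anything. A minimal sketch, assuming a hypothetical helper `find_unpicklable_attributes` that tries to pickle each entry of the instance's `__dict__` separately with the standard-library `pickle` (it only checks top-level attributes, not nested objects):

```python
import pickle
from enum import Enum


def find_unpicklable_attributes(obj, dumps=pickle.dumps):
    """Return {attribute name: exception} for every instance attribute of obj
    that `dumps` fails to serialise."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            dumps(value)
        except Exception as exc:  # pickling may raise PicklingError, TypeError, ...
            failures[name] = exc
    return failures


class MyClass:
    def __init__(self, group_dict):
        self.group_dict = group_dict
        self.tags_enum = Enum(
            value="MyEnum",
            names={v.upper(): v for v in self.group_dict.keys()},
            type=str,
        )


inst = MyClass(group_dict={"key1": "val1", "key2": "val2"})
for name, exc in find_unpicklable_attributes(inst).items():
    print(f"{name}: {type(exc).__name__}: {exc}")
```

Because `dill` can serialise some objects that plain `pickle` cannot, passing `dill.dumps` as the `dumps` argument (assuming dill is installed, as it is with pathos) would mirror pathos more closely.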
Given the above, I have the following two-part question:
- First, as I'm fairly new to multiprocessing, I would like to understand why this is failing.
- Then, I'm also looking for ways to fix this error. I should note that it is important that the `tags_enum` instance attribute is set this way. Though it may not look important in this toy example, it matters in the real use case I'm working on.
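One way to keep `tags_enum` as an instance attribute is sketched below, assuming (as in the toy example) that the Enum is fully determined by `group_dict`. The idea is to leave it out of the pickled state with `__getstate__` and rebuild it on the receiving side in `__setstate__`. `dill` goes through the same `__getstate__`/`__setstate__` protocol for ordinary instances, so each worker should receive a copy of the instance with a freshly built Enum. `_build_tags_enum` is a hypothetical helper name.

```python
import pickle
from enum import Enum


class MyClass:
    def __init__(self, group_dict):
        self.group_dict = group_dict
        self.tags_enum = self._build_tags_enum(group_dict)

    @staticmethod
    def _build_tags_enum(group_dict):
        # Hypothetical helper: same functional-API call as in the question.
        return Enum(
            value="MyEnum",
            names={v.upper(): v for v in group_dict.keys()},
            type=str,
        )

    def __getstate__(self):
        # Copy the instance dict and drop the dynamically created Enum class,
        # which cannot be pickled by reference.
        state = self.__dict__.copy()
        del state["tags_enum"]
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes, then rebuild the Enum from group_dict.
        self.__dict__.update(state)
        self.tags_enum = self._build_tags_enum(self.group_dict)

    def fnc2(self, name):
        return len(name)


inst = MyClass(group_dict={"key1": "val1", "key2": "val2"})
# Round-trip through pickle, roughly what sending self.fnc2 to a worker involves.
restored = pickle.loads(pickle.dumps(inst))
print(restored.fnc2("StackOverflow"))         # 13
print([m.value for m in restored.tags_enum])  # ['key1', 'key2']
```

The trade-off is that every process ends up with its own, distinct `MyEnum` class, so identity or `isinstance` checks against a class from another process will not hold, and returning Enum *members* from `fnc2` would likely hit the same pickling error on the way back. If `fnc2` does not actually need `self`, making it a `@staticmethod` or a module-level function avoids shipping the instance to the workers at all.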