I’m working with a Python class (let’s call it class1) that has another library (which we will call library2) installed as a dependency. library2 has a specific class that gets instantiated, which we will call class2. Within class2’s constructor, other classes are instantiated.
class2 has a method within it (let’s call it method2) that contains a loop that triggers another method in the same class, which in turn triggers methods of the class objects instantiated in the constructor. I'm not sure if these particulars even matter but I thought I would note them. Here is an attempt to visualize the situation:
    class1 ->
        class2 -> (library dependency of class1)
            method2 -> (multiprocessing needed here)
                external class methods
I would like to implement multiprocessing within a loop in method2, and I am successful in implementing this in various ways if I execute the module directly.
However, if the class/module is triggered from an upstream process, I run into the dreaded TypeError: can't pickle _hashlib.HASH objects. I’ve tried various workarounds in an attempt to resolve this, some of which include:
- Implementing a top-level wrapper method within class2, within which I instantiate class2, similar to the solution given here.
- Implementing a top-level method in class2 that converts the class2 method to base64 and attempts to map the converted function, similar to the answer here.
As mentioned, these don't show the error if I run the class directly, but rather when the class is triggered by an upstream module as a dependency.
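To make the failure concrete, here is a minimal sketch of what I believe is happening, assuming some object reachable from the class2 instance holds a hashlib object as state. The names `Holder`, `PicklableHolder`, and `can_pickle` are hypothetical and not part of library2. The second class shows the standard `__getstate__`/`__setstate__` hooks dropping the hash object before pickling and rebuilding it afterwards, which only helps if I can modify the class that owns the hash object.

```python
import hashlib
import pickle


class Holder:
    """Hypothetical stand-in for an object that keeps a hashlib object as state."""

    def __init__(self):
        self.digest = hashlib.sha256()  # hash objects cannot be pickled
        self.values = [1, 2, 3]


class PicklableHolder(Holder):
    """Same state, but the hash object is dropped on pickling and rebuilt after."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["digest"]  # leave the unpicklable attribute out
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Fresh hash object; anything fed to the old one is lost
        self.digest = hashlib.sha256()


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False
```

Mapping a bound method such as `class2.method2` requires the whole instance to be pickled along with it, so a single attribute like `digest` anywhere in the instance's state would be enough to trigger the error.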
I was able to resolve the error by using pathos multiprocessing in a top-level method in class2 and calling that from within class2's method2 as in the code below (inspired by the answer here), but this resulted in worse performance than the original code without multiprocessing.
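    import os

    import pathos.multiprocessing as mp


    class Class2:
        ...

        def method2(self, list1, list2):
            # n: number of iterations, defined elsewhere
            for i in range(n):
                results = run_func_pathos(list1, list2)


    def run_func_pathos(param_list1, param_list2):
        # A new Class2 instance and a new pool are created on every call
        class2 = Class2()
        pool = mp.ProcessingPool(os.cpu_count())
        # Map the method over the paired elements of both lists
        results = pool.map(class2.method2, param_list1, param_list2)
        return results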
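One possible contributor to the slowdown, offered as a guess rather than a diagnosis, is that the code above builds a new pool on every loop iteration. Below is a rough sketch of the structure I am considering instead, using the standard library's `multiprocessing.Pool`. It assumes the per-item work can be moved into a module-level function that receives only plain, picklable arguments. `combine`, `run_iterations`, and `run_with_processes` are hypothetical names, and the addition in `combine` is a placeholder for the real work.

```python
import multiprocessing


def combine(item1, item2):
    """Hypothetical per-item work; a module-level function is pickled by name."""
    return item1 + item2


def run_iterations(pool, list1, list2, n):
    """Reuse one pool for all n iterations instead of creating one per iteration."""
    all_results = []
    for _ in range(n):
        # starmap unpacks each (item1, item2) pair into combine's arguments
        all_results.append(pool.starmap(combine, zip(list1, list2)))
    return all_results


def run_with_processes(list1, list2, n):
    """Create the process pool once, then hand it to the loop."""
    with multiprocessing.Pool() as pool:
        return run_iterations(pool, list1, list2, n)
```

On platforms that start workers with the spawn method, `run_with_processes` would need to be called from under an `if __name__ == "__main__":` guard in the entry-point script, which may not be straightforward when the call originates in an upstream module.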
How can one leverage multiprocessing in a downstream class? Is this even possible?