I’m working with a Python class (let’s call it class1) that depends on another library (which we will call library2). library2 provides a class, which we will call class2, that gets instantiated. class2’s constructor in turn instantiates other classes.

class2 has a method within it (let’s call it method2) that contains a loop that triggers another method in the same class, which in turn triggers methods of the class objects instantiated in the constructor. I'm not sure if these particulars even matter but I thought I would note them. Here is an attempt to visualize the situation:

class1 -> 
  class2 -> (library dependency of class1)
     method2 -> (multiprocessing needed here)
       external class methods

I would like to implement multiprocessing within a loop in method2, and I am successful in implementing this in various ways if I execute the module directly.

However, if the class/module is triggered from an upstream process, I run into the dreaded `TypeError: can't pickle _hashlib.HASH objects`. I’ve tried various workarounds in an attempt to resolve this, some of which include:

  1. Implementing a top-level wrapper method within class2, within which I instantiate class2, similar to the solution given here.
  2. Implementing a top-level method in class2 that converts the class2 method to base64 and attempts to map the converted function, similar to the answer here.

As mentioned, these approaches work when I run the class directly; the error appears only when the class is triggered by an upstream module as a dependency.
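For reference, here is a minimal sketch of how this kind of error can arise. It assumes (this is a guess, not something confirmed for the actual code) that some object reachable from the class2 instance holds a live `hashlib` object. `Helper`, `Class2`, and `can_pickle` are hypothetical stand-ins. A process pool has to pickle a bound method together with its instance, so a single unpicklable attribute anywhere in the object graph is enough to make the call fail.

```python
import hashlib
import pickle


class Helper:
    """Hypothetical helper that keeps a live hash object as state."""

    def __init__(self):
        self.digest = hashlib.sha256()


class Class2:
    """Hypothetical stand-in for the library class in the question."""

    def __init__(self):
        # Other classes are instantiated in the constructor.
        self.helper = Helper()

    def method2(self, x):
        return x * 2


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    # Pickling the bound method also pickles the instance it is bound to,
    # including the hash object held by Helper.
    print(can_pickle(Class2().method2))
```

If this matches the real situation, the upstream caller may simply be attaching such an object to the instance before method2 runs, which would explain why running the module directly works.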

I was able to resolve the error by using pathos multiprocessing in a top-level method in class2 and calling that from within class2's method2 as in the code below (inspired by the answer here), but this resulted in worse performance than the original code without multiprocessing.

import os

import pathos.multiprocessing as mp

class Class2:
    ...
    def method2(self, list1, list2):
        # n is the number of loop iterations, defined elsewhere
        for i in range(n):
            results = run_func_pathos(list1, list2)

def run_func_pathos(param_list1, param_list2):
    # Top-level helper: builds a fresh Class2 and a new pool on every call
    class2 = Class2()
    pool = mp.ProcessingPool(os.cpu_count())
    results = pool.map(class2.method2, param_list1, param_list2)

    return results

How can one leverage multiprocessing in a downstream class? Is this even possible?

aaronbriel
  • I'm the `pathos` author. It doesn't look like you are cleaning up your pool (i.e. close/join/delete). Spawning multiple pools but not joining the processes can lead to memory issues. Also, it may just be that your target function is faster than the overhead in opening a pool and spawning the processes. You can try passing in the map instance in the init method (i.e. generating the pool outside the class), and/or using one of the other maps (say from a ThreadPool). – Mike McKerns Mar 02 '21 at 01:27
  • Thank you for the quick reply! It's not the case that the target function is faster than the overhead in opening a pool and spawning the processes. What would be the appropriate approach to cleaning up the pool? If I call `pool.close()` after extracting the results, it throws `Pool not running`. If I call `pool.join()`, I see `...multiprocess/pool.py", line 545, in join`, `assert self._state in (CLOSE, TERMINATE)`, `AssertionError`. When I use `pool.clear()`, performance remains dreadful. – aaronbriel Mar 02 '21 at 16:08
  • Regarding your advice on the map instance, I'm going to see if passing in the map instance is possible. If not, do you have any guidance on using another map from a ThreadPool? – aaronbriel Mar 02 '21 at 16:11
  • The cleanup on a pool is generally `close()`, `join()`, `clear()` in sequence, right after the call to `map`. The `map` from `ThreadPool` has an identical interface to `ProcessPool`. Regarding passing a `map` instance, that is generally what I do in my own code. – Mike McKerns Mar 03 '21 at 09:58
  • Unfortunately I cannot pass a map instance to the class as it generates the sets of variables to the function I'm attempting to parallelize. I implemented the suggested cleanup approach of calling `close()`, `join()`, and `clear()` and I'm still observing sub-par performance. Any other ideas on how I can approach this? – aaronbriel Mar 03 '21 at 20:01
  • I'm not sure I understand why you can't pass the map instance, or the pool instance. Either enables you to initialize and cleanup in `__main__`, and call the `map` within the method. It'll be hard to be more helpful without seeing code that demonstrates the issue you are seeing. – Mike McKerns Mar 04 '21 at 10:31
  • I can't pass the map or pool instance due to constraints in the underlying architecture I'm working with. Unfortunately it is proprietary code so I cannot post it, and replicating the issue with dummy code is more than I have time for at this point. Thank you for the insights though, it is very much appreciated! – aaronbriel Apr 28 '21 at 14:16
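
A minimal sketch of the pattern suggested in the comments: create the pool once outside the class, pass it in, and clean it up in the caller. It uses the standard library's `multiprocessing.pool.ThreadPool` as a stand-in rather than a `pathos` pool, so nothing has to be pickled; with a `pathos` pool the cleanup would additionally call `clear()`, per the comment above. `Class2`, `_work`, and `main` are hypothetical names, and whether threads help at all depends on the workload (CPU-bound pure-Python code is limited by the GIL).

```python
from multiprocessing.pool import ThreadPool


class Class2:
    """Hypothetical stand-in for the library class in the question."""

    def __init__(self, pool):
        # The pool is created by the caller and only borrowed here.
        self._pool = pool

    def _work(self, a, b):
        # Placeholder for the real per-item computation.
        return a + b

    def method2(self, list1, list2):
        # starmap unpacks each (a, b) pair into a call to _work(a, b).
        return self._pool.starmap(self._work, zip(list1, list2))


def main():
    # One pool for the whole run, instead of one per loop iteration.
    pool = ThreadPool(4)
    try:
        return Class2(pool).method2([1, 2, 3], [10, 20, 30])
    finally:
        # Cleanup stays with whoever created the pool.
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(main())
```

Keeping creation and cleanup in one place avoids paying the pool start-up cost on every iteration, which is one plausible source of the slowdown described in the question.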
