I am trying to parallelize some code that creates an instance of a Predictor class. This class contains many methods and attributes and looks something like this:

class Predictor:
    def __init__(...):

    def load_models(...):

    def load_data(...):
    
    def predict(...):

Here a DataManager class is instantiated in the load_data method. When I add a __reduce__ function, it only serializes the methods of the Predictor class but not those of the DataManager class. How can I serialize both, so that the Predictor instance keeps all of its methods and attributes?

Edit:

from generators import DataManager #class used to load data
class Predictor:


    def __init__(self,saved_model_dir,QS_params=None):
        '''
        

        Parameters
        ----------
        saved_model_dir : str
            Predictor directory.
        QS_params : dict
            A dictionary of all predictor
            algorithms
        '''
        self.saved_model_dir = saved_model_dir

        # attributes to save or serialize
        self.save_attr = ['onehot_cutoffdate', 'name', 'version_number',
                          'batch_size', 'aggregate_data', 'test_size']
        if QS_params is not None:
            self.QS_params = QS_params

    def loadDataManager(self, name=None, batch_size=None, test_size=None,..):
        data_fp = self.saved_model_dir + '/path_to_data'
        self.dataManager = DataManager(self,data_fp,self.name,
                           self.batch_size,self.test_size,...)

How can I serialize all methods of the classes and their attributes (the list in the self.save_attr)?

I'm trying to use this with ray and have added a __reduce__ function such as below:

class Predictor:


    def __init__(self,saved_model_dir,QS_params=None):
        '''
        

        Parameters
        ----------
        saved_model_dir : str
            Predictor directory.
        QS_params : dict
            A dictionary of all predictor
            algorithms
        '''
        self.saved_model_dir = saved_model_dir

        # attributes to save or serialize
        self.save_attr = ['onehot_cutoffdate', 'name', 'version_number',
                          'batch_size', 'aggregate_data', 'test_size']
        if QS_params is not None:
            self.QS_params = QS_params


    def __reduce__(self):
        '''This is necessary for serializing the class. Every attribute needed needs to be in the serializer line''' 
        deserializer = Predictor
        serializer = (self.saved_model_dir,self.save_attr,self.QS_params,)
        return deserializer, serializer

However, when I try to use this class with Ray I get an AttributeError, so it seems everything must be serialized.

>>> predictor.dataManager
<generators.DataManager object at 0x7fac0c00c290>
>>> ray.get(ray.put(predictor)).dataManager
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Predictor' object has no attribute 'dataManager'
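
The behaviour in the traceback can be reproduced with plain `pickle`, used here as a stand-in for Ray on the assumption that Ray's serializer honours the same `__reduce__` protocol. The sketch below is not the original code: `DataManager`, `LossyPredictor` and `StatefulPredictor` are hypothetical, stripped-down classes. It shows that a `__reduce__` returning only constructor arguments rebuilds the object by calling `__init__` again, so anything set afterwards (such as `dataManager`) is lost, while returning the instance `__dict__` as a third tuple element carries that state along.

```python
import pickle


class DataManager:
    """Hypothetical stand-in for generators.DataManager."""

    def __init__(self, data_fp):
        self.data_fp = data_fp


class LossyPredictor:
    """Mirrors the question: __reduce__ returns only constructor arguments."""

    def __init__(self, saved_model_dir):
        self.saved_model_dir = saved_model_dir

    def loadDataManager(self):
        # Attribute created after __init__, so __init__ alone cannot restore it.
        self.dataManager = DataManager(self.saved_model_dir + "/path_to_data")

    def __reduce__(self):
        # (callable, args): unpickling calls LossyPredictor(saved_model_dir).
        return LossyPredictor, (self.saved_model_dir,)


class StatefulPredictor(LossyPredictor):
    """Same class, but __reduce__ also returns the instance state."""

    def __reduce__(self):
        # (callable, args, state): with no __setstate__ defined, pickle
        # updates the rebuilt object's __dict__ with this state dict.
        return StatefulPredictor, (self.saved_model_dir,), self.__dict__


lossy = LossyPredictor("models")
lossy.loadDataManager()
lossy_copy = pickle.loads(pickle.dumps(lossy))

stateful = StatefulPredictor("models")
stateful.loadDataManager()
stateful_copy = pickle.loads(pickle.dumps(stateful))

print(hasattr(lossy_copy, "dataManager"))
print(stateful_copy.dataManager.data_fp)
```

Whether the real `DataManager` survives this round trip depends on what it holds: open file handles or similar resources would need their own handling.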
  • Please clarify your issue, and provide a [mcve] if possible. Serialisation as per ``__reduce__`` is meant to store instance data; classes, functions and methods are by default serialised by fully qualified name without requiring any manual intervention. – MisterMiyagi Mar 24 '21 at 15:57
  • @MisterMiyagi sorry about that I'm new to stack, python, etc.. I've added more detail. Basically I'm trying to use this with ray or multiprocessing and I've tried using the `__reduce__` but it only contains the methods of the `Predictor` class and not the `DataManager`. Also is there a way to serialize all attributes? – michaelarman Mar 24 '21 at 16:31
  • Predictor *doesn't* have an attribute `dataManager`. It is only added when calling `loadDataManager`, and your code doesn't do that. – MisterMiyagi Mar 24 '21 at 17:18
  • I create an instance of the class called `predictor` and then call `loadDataManager` to store everything needed, so I guess the instance has the attribute `dataManager`. Would you happen to know a workaround such that I can use the class with a multiprocessing library? For example, I think I would need it to have `dir(predictor) == dir(ray.put(predictor))`. – michaelarman Mar 24 '21 at 17:26
  • @MisterMiyagi I call methods on the `predictor` instance to load models, data, etc. which are needed for functions that I would like to call in parallel e.g. `predictor.predict(x,y)` – michaelarman Mar 24 '21 at 17:40
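
The point made in the comments, that methods are never stored in the serialized data and only instance attributes are, can be checked with a small sketch. It uses plain `pickle` rather than Ray, and `ToyPredictor`, `ToyDataManager` and `run_inference` are hypothetical names chosen for illustration. With no custom `__reduce__` at all, the default behaviour already keeps every instance attribute, including the nested manager object, and refers to the classes by name only.

```python
import pickle


class ToyDataManager:
    """Hypothetical nested helper object."""

    def __init__(self, data_fp):
        self.data_fp = data_fp


class ToyPredictor:
    """Hypothetical predictor with no custom __reduce__."""

    def __init__(self, saved_model_dir):
        self.saved_model_dir = saved_model_dir

    def load_data(self):
        # Attribute added after construction, as in the question.
        self.data_manager = ToyDataManager(self.saved_model_dir + "/path_to_data")

    def run_inference(self, x):
        return (self.data_manager.data_fp, x)


predictor = ToyPredictor("models")
predictor.load_data()

payload = pickle.dumps(predictor)
restored = pickle.loads(payload)

# The payload names the class and the attribute keys, but no method names:
# methods are looked up on the class again after unpickling.
print(b"ToyPredictor" in payload)
print(b"data_manager" in payload)
print(b"run_inference" in payload)
print(restored.run_inference(1))
```

This suggests that dropping the custom `__reduce__` may be enough, provided every attribute on the instance is itself picklable and the class definitions are importable in the worker processes.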