I am trying to parallelize some code that creates an instance of a Predictor class.
This class contains many methods and attributes and looks something like this:
    class Predictor:
        def __init__(...):
        def load_models(...):
        def load_data(...):
        def predict(...):
Here a DataManager class is instantiated in the load_data method.
When I add a __reduce__ method, only the Predictor object's own state is serialized, not the DataManager instance it holds.
How can I serialize both, so that the restored Predictor still has all of its methods and attributes?
Edit:
    from generators import DataManager  # class used to load data

    class Predictor:
        def __init__(self, saved_model_dir, QS_params=None):
            '''
            Parameters
            ----------
            saved_model_dir : str
                Predictor directory.
            QS_params : dict
                A dictionary of all predictor
                algorithms
            '''
            self.saved_model_dir = saved_model_dir
            # attributes to save or serialize
            self.save_attr = ['onehot_cutoffdate', 'name', 'version_number',
                              'batch_size', 'aggregate_data', 'test_size']
            if QS_params is not None:
                self.QS_params = QS_params

        def loadDataManager(self, name=None, batch_size=None, test_size=None, ...):
            data_fp = self.saved_model_dir + '/path_to_data'
            self.dataManager = DataManager(self, data_fp, self.name,
                                           self.batch_size, self.test_size, ...)
How can I serialize all methods of the classes and their attributes (the ones listed in self.save_attr)?
I'm trying to use this with Ray and have added a __reduce__ method like the one below:
    class Predictor:
        def __init__(self, saved_model_dir, QS_params=None):
            '''
            Parameters
            ----------
            saved_model_dir : str
                Predictor directory.
            QS_params : dict
                A dictionary of all predictor
                algorithms
            '''
            self.saved_model_dir = saved_model_dir
            # attributes to save or serialize
            self.save_attr = ['onehot_cutoffdate', 'name', 'version_number',
                              'batch_size', 'aggregate_data', 'test_size']
            # stored unconditionally so that __reduce__ can always read it
            self.QS_params = QS_params

        def __reduce__(self):
            '''This is necessary for serializing the class. Every attribute needed needs to be in the serializer line'''
            deserializer = Predictor
            # these are passed to Predictor(...) on deserialization,
            # so they must match the __init__ signature
            serializer = (self.saved_model_dir, self.QS_params)
            return deserializer, serializer
However, when I try to use this class with Ray I get an AttributeError,
so it seems everything must be serialized:
    >>> predictor.dataManager
    <generators.DataManager object at 0x7fac0c00c290>
    >>> ray.get(ray.put(predictor)).dataManager
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'Predictor' object has no attribute 'dataManager'
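
For reference, here is a minimal sketch of why the attribute goes missing and one possible way to keep it. It uses plain pickle instead of Ray, on the assumption that Ray's serializer honors __reduce__ the same way. The DataManager below is a hypothetical stand-in for generators.DataManager, and loadDataManager is simplified. When __reduce__ returns only (callable, args), the object is rebuilt by calling Predictor(*args), so anything set after __init__ (such as dataManager) is never restored. Returning a third element, the instance state, lets the unpickler copy it back onto the new object.

```python
import pickle


class DataManager:
    """Hypothetical stand-in for generators.DataManager."""

    def __init__(self, data_fp):
        self.data_fp = data_fp


class Predictor:
    def __init__(self, saved_model_dir, QS_params=None):
        self.saved_model_dir = saved_model_dir
        self.QS_params = QS_params

    def loadDataManager(self):
        # Attribute created after __init__, like dataManager in the question.
        self.dataManager = DataManager(self.saved_model_dir + '/path_to_data')

    def __reduce__(self):
        # (callable, args, state): unpickling calls Predictor(*args) and then,
        # because no __setstate__ is defined, updates the new instance's
        # __dict__ with the state dictionary.
        args = (self.saved_model_dir, self.QS_params)
        return Predictor, args, self.__dict__.copy()


predictor = Predictor('models')
predictor.loadDataManager()

restored = pickle.loads(pickle.dumps(predictor))
print(restored.dataManager.data_fp)  # models/path_to_data
```

Everything reachable from the state dictionary has to be picklable itself, so this only works if the real DataManager can be pickled. To persist only selected attributes, the state could instead be built from the names in self.save_attr.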