
I collect ML models in dictionaries, which I build from a dataclass as follows:

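    from dataclasses import asdict, dataclass

    import pandas as pd


    @dataclass(frozen=True, order=True)
    class Model:
        data_sample: str
        predictive_model: object
        predictions: pd.DataFrame
        binary: object
        type: str
        inputs: list
        output: str
        explain: bool

        def to_dict(self):
            # Convert the dataclass instance into a plain dict
            return asdict(self)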

I produce multiple models and use the dataclass to validate the inputs for a single, trained model. I then convert each instance to a dictionary and append it to a list called ML:

    ML.append(model.to_dict())
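For reference, a minimal sketch of this pattern that runs without any ML libraries. `DummyEstimator` is a hypothetical stand-in for a trained model, and the dataclass is cut down to four fields. It illustrates two details of the standard library that matter here: `asdict` deep-copies values that are not dataclasses, lists, tuples or dicts, and dataclasses do not enforce their type annotations at runtime.

```python
from dataclasses import asdict, dataclass


class DummyEstimator:
    """Hypothetical stand-in for a trained scikit-learn or TPOT model."""

    def __init__(self, coef):
        self.coef = coef


@dataclass(frozen=True)
class Model:
    # Reduced version of the dataclass in the question (assumption: the
    # remaining fields behave the same way).
    data_sample: str
    predictive_model: object
    inputs: list
    output: str

    def to_dict(self):
        return asdict(self)


ML = []
model = Model("sample_a", DummyEstimator([0.5, 1.5]), ["x1", "x2"], "y")
ML.append(model.to_dict())

entry = ML[0]
# asdict() deep-copies the estimator, so the dict holds a separate object.
is_same_object = entry["predictive_model"] is model.predictive_model

# Annotations are not checked at runtime: this does not raise.
unchecked = Model(data_sample=123, predictive_model=None, inputs="x1", output="y")
```

Because of the deep copy, each entry in the list carries its own copy of the model object, which is what gets serialized later.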

The objects in binary and predictive_model are models (Python class instances) that come from libraries like scikit-learn, TPOT, SciPy and so on. Assume that these objects involve a lot of inheritance.

I am struggling to make this list portable to another environment. My core idea is to use a library like joblib, dill or pickle to .dump the dictionary in the runtime that trains the models, and then to .load it in the other environment. When I do this, loading fails with ModuleNotFoundError: No module named ...

I found that this is a common problem and that there are answers about this error here: Python pickling after changing a module's directory
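The error can be reproduced without any ML library. The sketch below is an illustration under stated assumptions, not the actual setup: `fake_ml_lib` and `Estimator` are hypothetical names standing in for a third-party library and one of its model classes. It shows that pickle stores a class instance by reference to its module and class name, so loading fails when that module cannot be imported.

```python
import pickle
import sys
import types

# Hypothetical module standing in for a third-party ML library.
fake_lib = types.ModuleType("fake_ml_lib")


class Estimator:
    def __init__(self, coef):
        self.coef = coef


# Make the class look as if it had been defined inside fake_ml_lib.
Estimator.__module__ = "fake_ml_lib"
fake_lib.Estimator = Estimator
sys.modules["fake_ml_lib"] = fake_lib

# "Training" environment: the library is importable, so dumping works.
payload = pickle.dumps({"predictive_model": Estimator(coef=[0.5, 1.5])})

# The payload holds the module name, not the code of the class.
contains_module_name = b"fake_ml_lib" in payload

# Simulate the target environment, where the library is not installed.
del sys.modules["fake_ml_lib"]

try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

print(type(load_error).__name__, load_error)
```

The serialized bytes contain the instance state plus an import path, which is why the loading environment needs the same modules to be importable.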

My question is: Is there a better way to "export" my dictionary, preferably one that copies everything it needs, so that I can run it elsewhere without having to manage any imports?

I get the feeling that pickling might not be what I need.

Pieter Geelen
  • Can you provide an example of how you are deserializing the models? Are you importing the necessary sci libraries in the same file in which you are deserializing? – drum Apr 18 '23 at 13:44

0 Answers