
I'd like to save a model fitted with scikit-learn (e.g., EmpiricalCovariance, MinCovDet, or OneClassSVM) and re-apply it later. I'm familiar with serializing it via pickle or joblib, but I would prefer to save the model parameters explicitly rather than a serialized Python object. The main motivation is that this makes the model parameters easy to inspect.

I found one reference to doing this: http://thiagomarzagao.com/2015/12/07/model-persistence-without-pickles/

The question is: can I count on this working over time (i.e., with new versions of sklearn)? Or is it too "hacky" a solution?

Does anyone have experience doing this?

Thanks, Jonathan

Jonathan
Why doesn't this work for you? http://stackoverflow.com/questions/10592605/save-classifier-to-disk-in-scikit-learn I don't think you lose the parameters if you pickle and reload. In the example you linked, they went that route because the data was huge. Are you working with huge data? If so, sklearn may or may not work. Google has a persistent table for queries, and they aren't pickling it, but they're working at scale. Your main motivation for not pickling should be that it would be too slow, not that you want to access the parameters again. – flyingmeatball May 04 '16 at 17:40

1 Answer


I don't think it's a hacky solution. A colleague has done something similar: he exports a model to be consumed by a scorer written in Go, which is much faster than the scikit-learn scorer. If you're worried about compatibility with future versions of sklearn, you should consider using an environment manager like conda or virtualenv; in any case, this is just good software engineering practice and something you should get used to anyway.
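To make the export idea concrete, here is a minimal sketch of what it could look like for a covariance-style model. It is not my colleague's code: the function names (`export_params`, `load_params`, `squared_mahalanobis`), the JSON layout and the `library_version` field are all made up for illustration. It assumes the fitted object exposes `location_` and `precision_`, and the scoring side deliberately uses only the standard library, so it does not need sklearn to run.

```python
import json
from pathlib import Path


def export_params(model, path, library_version="unknown"):
    """Write fitted parameters as human-readable JSON (hypothetical layout).

    Assumes `model` has `location_` (mean vector) and `precision_`
    (inverse covariance matrix) attributes.
    """
    payload = {
        "model": type(model).__name__,
        # Caller-supplied string, e.g. sklearn.__version__, kept for reference.
        "library_version": library_version,
        "location": [float(v) for v in model.location_],
        "precision": [[float(v) for v in row] for row in model.precision_],
    }
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload


def load_params(path):
    """Read the parameters back as plain dicts and lists."""
    return json.loads(Path(path).read_text())


def squared_mahalanobis(params, x):
    """Compute (x - mu)^T P (x - mu) from the saved parameters."""
    diff = [xi - mi for xi, mi in zip(x, params["location"])]
    precision = params["precision"]
    n = len(diff)
    return sum(
        diff[i] * precision[i][j] * diff[j]
        for i in range(n)
        for j in range(n)
    )
```

With sklearn installed, something like `export_params(EmpiricalCovariance().fit(X), "cov.json", sklearn.__version__)` should work, since `location_` and `precision_` are fitted attributes of `EmpiricalCovariance` and `MinCovDet` when `store_precision=True` (the default). The result should correspond to what the estimator's `mahalanobis` method returns for a single row. A `OneClassSVM` would need a different set of attributes (support vectors, dual coefficients, intercept, kernel settings), so treat this as a pattern rather than a general solution.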

maxymoo