I trained a classification model in Apache Spark (using `pyspark`). I stored the model in an object, `LogisticRegressionModel`. Now, I want to make predictions on new data. I would like to store the model, and read it back into a new program in order to make the predictions. Any idea how to store the model? I'm thinking of maybe pickle, but I'm a newbie to both Python and Spark, so I'd like to hear what the community thinks.
This is slightly related to the question [Saving a ML Model for future usage](http://stackoverflow.com/questions/33027767/save-ml-model-for-future-usage); the difference is that here you are asking about `MLlib`. – Alberto Bonsanto Dec 14 '15 at 15:34
1 Answer
You can save your model by using the `save` method of `mllib` models.
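    # lrm is a trained LogisticRegressionModel; sc is the active SparkContext
    lrm.save(sc, "lrm_model.model")

After storing it, you can load it in another application.

    from pyspark.mllib.classification import LogisticRegressionModel
    sameModel = LogisticRegressionModel.load(sc, "lrm_model.model")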
As @zero323 stated before, another way to achieve this is to use the Predictive Model Markup Language (PMML).

PMML is an XML-based file format developed by the Data Mining Group to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms.
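
To make the "XML-based" part concrete, here is a minimal sketch of how another program could read a logistic regression out of a PMML document using only the Python standard library. The PMML fragment is hand-written and simplified for illustration: it is not actual Spark output, and the field names, coefficients, and the helper `read_coefficients` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML fragment (NOT produced by Spark).
# Field names and numbers are made up for illustration.
PMML_DOC = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="logistic regression (illustrative)"/>
  <DataDictionary numberOfFields="3">
    <DataField name="field_0" optype="continuous" dataType="double"/>
    <DataField name="field_1" optype="continuous" dataType="double"/>
    <DataField name="target" optype="categorical" dataType="string"/>
  </DataDictionary>
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="field_0" usageType="active"/>
      <MiningField name="field_1" usageType="active"/>
      <MiningField name="target" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5" targetCategory="1">
      <NumericPredictor name="field_0" coefficient="1.25"/>
      <NumericPredictor name="field_1" coefficient="-2.0"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>
"""

# Namespace used by the fragment above (PMML 4.2).
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def read_coefficients(pmml_text, target_category):
    """Return (intercept, {field name: coefficient}) for one target category."""
    root = ET.fromstring(pmml_text.strip())
    for table in root.iterfind(".//pmml:RegressionTable", NS):
        if table.get("targetCategory") == target_category:
            intercept = float(table.get("intercept"))
            weights = {
                predictor.get("name"): float(predictor.get("coefficient"))
                for predictor in table.iterfind("pmml:NumericPredictor", NS)
            }
            return intercept, weights
    raise ValueError("no RegressionTable for category %r" % target_category)


intercept, weights = read_coefficients(PMML_DOC, "1")
print(intercept, weights)
```

Because the model is plain XML, the consumer does not need Spark at all, which is the main appeal of PMML over `save`/`load` when the predictions are made outside a Spark application.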

Alberto Bonsanto