I wrote this code in Spark ML
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.Pipeline
val lr = new LogisticRegression()
val pipeline = new Pipeline()
.setStages(Array(fooIndexer, fooHotEncoder, assembler, lr))
val model = pipeline.fit(training)
This code takes a long time to run. Is it possible that after running pipeline.fit I save the model on HDFS so that I don't have to run it again and again?
Edit: Also, how to load it back from HDFS when I have to apply transform
on the model so that I can make predictions.