Controlling number of files when saving MLlib model

Asked Apr 11 '19 at 04:41


When using model.save() to save an MLlib model to S3, is there a way to control the number of Parquet files that are created? I know that if I were saving an RDD or DataFrame, I could control this by repartitioning the data, but I cannot find any reference to doing the same when saving a model.

asked Apr 11 '19 at 04:41 by s-squared
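The DataFrame behaviour the question relies on is that Spark writes roughly one part file per partition, so calling repartition(n) before writing fixes the file count at about n. The sketch below is a stdlib-only toy model of that relationship, not Spark code: the functions repartition and write_partitions are hypothetical stand-ins, and real Spark does not distribute rows by strict index-based round-robin as this toy does.

```python
import os
import tempfile
from typing import List


def repartition(records: List[str], num_partitions: int) -> List[List[str]]:
    """Hypothetical stand-in for Spark's repartition(n): spread records
    round-robin by index over num_partitions lists (a simplification)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def write_partitions(partitions: List[List[str]], out_dir: str) -> List[str]:
    """Write exactly one part-NNNNN file per partition, loosely mimicking
    the layout of a Spark output directory."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for idx, rows in enumerate(partitions):
        path = os.path.join(out_dir, f"part-{idx:05d}")
        with open(path, "w") as f:
            f.write("\n".join(rows))
        paths.append(path)
    return paths


if __name__ == "__main__":
    records = [f"row-{i}" for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        # The number of partitions, not the number of records, sets the file count.
        files = write_partitions(repartition(records, 3), tmp)
        print(len(files))
```

In the toy, changing the partition count is the only lever on the number of files, which is why a writer that does its own partitioning internally (as a model's save() does) leaves the caller without that lever.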

In Spark 2.4+ you can save the model as a single PMML file, see: https://stackoverflow.com/questions/31973116/apache-spark-mllib-model-file-format – Shaido Apr 11 '19 at 05:54
Thank you. Is it possible to save the model as multiple files while controlling how many are created? – s-squared Apr 11 '19 at 14:44
I'm not sure whether that's possible, sorry. Just out of curiosity, why do you want a specific number of files when saving the model? – Shaido Apr 11 '19 at 15:36
To speed up saving very large models. – s-squared Apr 11 '19 at 19:25


0 Answers