When using model.save() to save an MLlib model to S3, is there a way to control the number of Parquet files created? I know that when saving an RDD or DataFrame I can control this by repartitioning the data, but I cannot find any reference to doing the same when saving a model.
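For context, the DataFrame case mentioned above can be sketched roughly as follows in PySpark. The helper name `write_with_n_files`, the app name, and the `/tmp` output path are made up for illustration. The sketch assumes Spark writes one part file per non-empty partition, so `num_files` is an upper bound rather than a guarantee.

```python
def write_with_n_files(df, path, num_files):
    """Write `df` as Parquet, aiming for up to `num_files` part files.

    `df` is expected to be a pyspark.sql.DataFrame. The helper name and
    signature are hypothetical, not part of any Spark API.
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    # repartition() shuffles the rows into `num_files` partitions; the
    # Parquet writer then emits one part file per non-empty partition.
    df.repartition(num_files).write.mode("overwrite").parquet(path)


if __name__ == "__main__":
    # Usage example; requires a local PySpark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("repartition-sketch").getOrCreate()
    df = spark.range(1000)  # small illustrative DataFrame
    write_with_n_files(df, "/tmp/repartition_sketch", 4)
    spark.stop()
```

This only covers DataFrame writes; whether model.save() exposes a comparable knob is exactly what the question is asking.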
-
In Spark 2.4+ you can save the model as a single PMML file; see: https://stackoverflow.com/questions/31973116/apache-spark-mllib-model-file-format – Shaido Apr 11 '19 at 05:54
-
Thank you. Is it possible to save the model in multiple files, just with me being able to control that number? – s-squared Apr 11 '19 at 14:44
-
I'm not sure whether that's possible, sorry. Just out of curiosity, why do you want a specific number of files when saving the model? – Shaido Apr 11 '19 at 15:36
-
To speed up saving very large models. – s-squared Apr 11 '19 at 19:25