My platform is Spark 2.1.0, and I am using Python.
I have about 100 random forest multiclass classification models saved in HDFS, along with 100 datasets that are also saved in HDFS. I want to run predictions on each dataset using its corresponding model. If the models and datasets are cached in memory, prediction is more than 10 times faster.
However, I do not know how to cache the models, because a model is not an RDD or a DataFrame.
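
For reference, here is a minimal sketch of the workflow described above. It is an illustration under assumptions, not a confirmed solution: the HDFS paths, the Parquet format, and the helper names `load_models`, `predict_all`, and `run_with_spark` are all hypothetical. Since a model has no `.cache()` method, the sketch simply loads each model once and keeps the loaded objects in a plain dict on the driver, while the datasets are cached as ordinary DataFrames.

```python
def load_models(paths, loader):
    """Load every model exactly once and keep the loaded objects in a dict.

    paths:  dict mapping a name to a model path (hypothetical layout)
    loader: callable that takes a path and returns a loaded model
    """
    return {name: loader(path) for name, path in paths.items()}


def predict_all(models, datasets):
    """Apply each model to the dataset that shares its name."""
    results = {}
    for name, model in models.items():
        # DataFrames do support cache(), so the data can be kept in memory.
        df = datasets[name].cache()
        results[name] = model.transform(df)
    return results


def run_with_spark():
    """Example wiring for PySpark; paths and data format are assumptions."""
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import RandomForestClassificationModel

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical HDFS locations for the 100 models and 100 datasets.
    model_paths = {i: "hdfs:///models/model_%d" % i for i in range(100)}
    data_paths = {i: "hdfs:///data/dataset_%d" % i for i in range(100)}

    models = load_models(model_paths, RandomForestClassificationModel.load)
    datasets = {i: spark.read.parquet(p) for i, p in data_paths.items()}
    return predict_all(models, datasets)
```

Calling `run_with_spark()` requires a working Spark installation. Whether holding the loaded models in a dict like this gives the same speedup as real caching is exactly what I am unsure about.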
Thanks!