
My platform is Spark 2.1.0 on an 8-node cluster, and I am using Python.

I have about 100 random forest multiclass classification models saved in HDFS, along with 100 datasets that are also saved in HDFS. I want to run prediction on each dataset with its corresponding model, in parallel.

I use a loop to iterate over the 100 datasets. In each iteration, I load the corresponding model and use it to predict on that dataset. However, the elapsed time shows that the predictions are not running in parallel.
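One possible direction, as a minimal sketch rather than a confirmed fix: a plain loop submits one Spark job at a time from the driver, while Spark can run jobs submitted from different threads of the same application concurrently when the cluster has free resources. The sketch below submits each model/dataset pair from a driver-side thread pool. It assumes the models were saved with the DataFrame-based `pyspark.ml` API and that the datasets are Parquet files already containing the features column the models expect; the helper names `predict_one_spark` and `run_concurrently` and all paths are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def predict_one_spark(spark, model_path, data_path, output_path):
    """Load one saved model, score one dataset, and write the predictions.

    Assumption: the model was saved via pyspark.ml and the data is Parquet.
    """
    # Imported here so the generic helper below can be used without pyspark.
    from pyspark.ml.classification import RandomForestClassificationModel

    model = RandomForestClassificationModel.load(model_path)
    data = spark.read.parquet(data_path)
    # The write is the action that actually triggers the Spark job.
    model.transform(data).write.mode("overwrite").parquet(output_path)
    return output_path


def run_concurrently(tasks, worker, max_threads=8):
    """Call worker(task) for every task from a pool of driver threads.

    Results come back in the same order as the input tasks.
    """
    pool = ThreadPool(max_threads)
    try:
        return pool.map(worker, tasks)
    finally:
        pool.close()
        pool.join()


# Hypothetical usage with an existing SparkSession named `spark`:
# tasks = [("hdfs:///models/m%d" % i, "hdfs:///data/d%d" % i, "hdfs:///pred/p%d" % i)
#          for i in range(100)]
# run_concurrently(tasks, lambda t: predict_one_spark(spark, *t))
```

Threads are used rather than processes because the work happens on the executors and the driver threads only submit jobs. By default jobs within an application are scheduled FIFO; setting `spark.scheduler.mode` to `FAIR` may help smaller jobs share resources more evenly.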

I am not sure how to make these predictions run in parallel.

Thanks!

Guanglin Zhou

0 Answers