How to predict test data in parallel using random forest models in Spark

Asked Jun 29 '18 at 01:29

My platform is Spark 2.1.0 on an 8-node cluster, and I am using Python.

I have about 100 random forest multiclass classification models saved in HDFS. There are also 100 datasets saved in HDFS. I want to run prediction on each dataset with its corresponding model, with all of the predictions running in parallel.

Currently I use a loop to iterate over the 100 datasets. In each iteration, I load the corresponding model and use it to predict the data. However, the elapsed time shows that the predictions are not running in parallel.

I do not know how to make this run in parallel.

Thanks!
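For reference, one direction that may apply here is to submit the per-dataset jobs from several driver threads, since Spark's scheduler accepts jobs from multiple threads of the same application. The following is only a minimal sketch under assumptions not stated above: that the models were saved with the `pyspark.ml` API (so `RandomForestClassificationModel.load` applies), that the datasets are stored as Parquet, and that the paths shown in the comments are placeholders. `predict_one` and `run_concurrently` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_one(spark, model_path, data_path, output_path):
    """Load one saved model, score one dataset, and write the predictions.

    Assumes pyspark.ml models and Parquet data; both are assumptions.
    """
    # Imported here so the generic helper below can be used without Spark.
    from pyspark.ml.classification import RandomForestClassificationModel

    model = RandomForestClassificationModel.load(model_path)
    df = spark.read.parquet(data_path)
    model.transform(df).write.mode("overwrite").parquet(output_path)
    return output_path


def run_concurrently(task, arg_list, max_workers=8):
    """Run task(*args) for every args tuple on a pool of driver threads.

    Results are returned in the same order as arg_list.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, *args) for args in arg_list]
        return [f.result() for f in futures]


# Hypothetical usage; the HDFS paths are placeholders, not real locations:
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   jobs = [
#       (spark,
#        "hdfs:///models/model_%d" % i,
#        "hdfs:///data/dataset_%d" % i,
#        "hdfs:///predictions/dataset_%d" % i)
#       for i in range(100)
#   ]
#   run_concurrently(predict_one, jobs, max_workers=8)
```

Whether this actually shortens the total time depends on whether a single prediction job leaves executors idle. Setting `spark.scheduler.mode` to `FAIR` may also help concurrent jobs share the cluster more evenly.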

Guanglin Zhou

0 Answers