I want to perform a grid search on my Random Forest model in Apache Spark, but I cannot find an example of how to do so. Is there an example on sample data that shows hyperparameter tuning with grid search?
Possible duplicate of [How to cross validate RandomForest model?](https://stackoverflow.com/questions/32769573/how-to-cross-validate-randomforest-model) – 10465355 Jan 15 '19 at 22:02
1 Answer

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # training_df is assumed to already contain "indexedLabel" and "indexedFeatures" columns
    rf = RandomForestClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures", numTrees=10)
    pipeline = Pipeline(stages=[rf])

    # Candidate values to try for each hyperparameter
    paramGrid = ParamGridBuilder().addGrid(rf.numTrees, [10, 30]).build()

    # The evaluator must read the same label column the classifier was trained on
    crossval = CrossValidator(estimator=pipeline,
                              estimatorParamMaps=paramGrid,
                              evaluator=BinaryClassificationEvaluator(labelCol="indexedLabel"),
                              numFolds=2)
    cvModel = crossval.fit(training_df)

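The hyperparameters and their candidate values are defined with the `addGrid` method: each call adds one parameter together with the list of values to try, and `build()` returns the grid of all combinations that `CrossValidator` evaluates.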
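
To make the size of the search concrete, here is a minimal plain-Python sketch of how a grid expands into combinations. It is not Spark's implementation, only an illustration of the Cartesian product that chained `addGrid` calls describe. The helper `expand_grid` is hypothetical, and the `maxDepth` values are added purely for illustration; only the `numTrees` values come from the answer above.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the candidate values in `grid`."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical grid: numTrees as in the answer, maxDepth added for illustration.
grid = {"numTrees": [10, 30], "maxDepth": [5, 10, 15]}
param_maps = expand_grid(grid)

num_folds = 2
# Every combination is fitted once per fold; the best one is then
# refit once more on the full training set.
total_fits = len(param_maps) * num_folds + 1

print(len(param_maps))  # 6
print(total_fits)       # 13
```

Because the number of fits grows multiplicatively with every added parameter, it is usually worth keeping each value list short. After fitting in Spark, `cvModel.avgMetrics` holds the mean metric for each combination in grid order, and `cvModel.bestModel` is the pipeline refit with the best combination.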

o11306650