
I want to perform a grid search on my random forest model in Apache Spark, but I cannot find an example of how to do it. Is there an example on sample data that shows hyperparameter tuning with grid search?

Regressor
    Possible duplicate of [How to cross validate RandomForest model?](https://stackoverflow.com/questions/32769573/how-to-cross-validate-randomforest-model) – 10465355 Jan 15 '19 at 22:02

1 Answer

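from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Random forest with 10 trees by default; the grid below overrides numTrees.
rf = RandomForestClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures", numTrees=10)
pipeline = Pipeline(stages=[rf])

# Candidate values to search over; chain more addGrid calls to tune more parameters.
paramGrid = ParamGridBuilder().addGrid(rf.numTrees, [10, 30]).build()

# The evaluator must read the same label column the classifier is trained on.
evaluator = BinaryClassificationEvaluator(labelCol="indexedLabel")

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=2)

# training_df must contain the indexedLabel and indexedFeatures columns.
cvModel = crossval.fit(training_df)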

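The hyperparameters and their candidate values are defined with the addGrid method: each call adds one parameter and the list of values to try, and build() returns one parameter map for every combination.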
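To make the size of the search concrete, here is a minimal plain-Python sketch of how a grid expands into combinations. It is not Spark's implementation: `grid_spec`, `expand_grid`, and the `maxDepth` values are hypothetical, chosen only to mirror what chaining a second call such as `.addGrid(rf.maxDepth, [5, 10])` would add to the grid.

```python
from itertools import product

# Hypothetical candidate values; the keys mirror RandomForestClassifier parameters.
grid_spec = {
    "numTrees": [10, 30],
    "maxDepth": [5, 10],
}


def expand_grid(spec):
    """Return one dict per combination of candidate values (Cartesian product)."""
    names = list(spec)
    value_lists = [spec[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


param_maps = expand_grid(grid_spec)

# During cross-validation every combination is trained once per fold.
num_folds = 2
fits_during_cv = len(param_maps) * num_folds

for param_map in param_maps:
    print(param_map)
print("combinations:", len(param_maps), "fits during CV:", fits_during_cv)
```

The number of combinations is the product of the list lengths, so each extra parameter multiplies the training cost. After fitting, the winning pipeline is available as `cvModel.bestModel`, and `cvModel.avgMetrics` holds the average metric for each parameter map in grid order.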

o11306650