I am using cross-validation for model and parameter selection in Spark. Because of an application requirement, I need to know not only the best model but also the results for all models. When I worked with Python's sklearn, I could use

clf = GridSearchCV()
clf.cv_results_ 

to print the results for all the models, which look something like the following:

Grid scores on development set:

0.986 (+/-0.016) for {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}
0.959 (+/-0.029) for {'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
0.988 (+/-0.017) for {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
0.982 (+/-0.026) for {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}
0.988 (+/-0.017) for {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
0.982 (+/-0.025) for {'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}
0.988 (+/-0.017) for {'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}
0.982 (+/-0.025) for {'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}
0.975 (+/-0.014) for {'C': 1, 'kernel': 'linear'}
0.975 (+/-0.014) for {'C': 10, 'kernel': 'linear'}
0.975 (+/-0.014) for {'C': 100, 'kernel': 'linear'}
0.975 (+/-0.014) for {'C': 1000, 'kernel': 'linear'}

In Spark I have:

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new MulticlassClassificationEvaluator)
  .setEstimatorParamMaps(paramLRGrid)
  .setNumFolds(3)
val cvModel = cv.fit(trainingData)

Is there something similar to clf.cv_results_ in Spark that lets me see the results for all the models?

pipal

1 Answer

This should help:

cvModel.subModels

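as described here: "Spark CrossValidatorModel access other models than the bestModel?". Note that subModels is only available in Spark 2.3 and later, and only when the CrossValidator was configured with setCollectSubModels(true) before calling fit.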
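
If what you want is the closest equivalent of the sklearn score listing, a minimal sketch is to pair each candidate parameter map with its fold-averaged metric via cvModel.avgMetrics and cvModel.getEstimatorParamMaps. The object name CvResults and the helpers gridScores, formatScore and fitAndReport are hypothetical names introduced here for illustration; pipeline, paramLRGrid and trainingData are assumed to be the values from the question. Unlike sklearn, this gives only the mean per parameter combination, not the spread across folds.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel}
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, not part of Spark.
object CvResults {

  // Pair every candidate ParamMap with its metric averaged over the folds.
  // avgMetrics is in the same order as the estimator param maps.
  def gridScores(cvModel: CrossValidatorModel): Array[(ParamMap, Double)] =
    cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics)

  // Format one line in the style "0.986 for <params>".
  def formatScore(metric: Double, params: String): String =
    f"$metric%.3f for $params"

  // Assumes pipeline, paramLRGrid and trainingData are defined as in the question.
  def fitAndReport(
      pipeline: Pipeline,
      paramLRGrid: Array[ParamMap],
      trainingData: DataFrame): CrossValidatorModel = {

    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new MulticlassClassificationEvaluator) // default metric is f1
      .setEstimatorParamMaps(paramLRGrid)
      .setNumFolds(3)
      .setCollectSubModels(true) // Spark 2.3+; required before subModels can be read

    val cvModel = cv.fit(trainingData)

    // One line per parameter combination, analogous to clf.cv_results_.
    gridScores(cvModel).foreach { case (params, metric) =>
      println(formatScore(metric, params.toString))
    }

    // subModels is indexed as subModels(foldIndex)(paramMapIndex).
    val firstFoldModels = cvModel.subModels(0)
    println(s"Fitted models kept for fold 0: ${firstFoldModels.length}")

    cvModel
  }
}
```

Calling CvResults.fitAndReport(pipeline, paramLRGrid, trainingData) should print one score line per entry in paramLRGrid. Collecting sub-models keeps every fitted model in driver memory, so it may be worth leaving setCollectSubModels off if only the scores are needed.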

Matko Soric