
In the scikit-learn documentation example http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html, a train_test_split is done before the grid search.

The grid search is then fit on the training set and tested on the test set from the train_test_split.

I wanted to know whether it is possible and advisable to do a k-fold cross validation in place of the train_test_split, so I could fit and test the grid search on different data folds instead of just one train/test split (and consequently get the best score and parameters that way).
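
To make it concrete, here is a minimal sketch of what I have in mind, using the digits data and an SVC as in the linked example (the parameter grid and the number of folds are just placeholders I picked):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
param_grid = {'C': [1, 10, 100], 'gamma': [0.001, 0.0001]}

# Replace the single train_test_split with a k-fold loop:
# fit the grid search on each training fold, then score it on the held-out fold.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    search = GridSearchCV(SVC(), param_grid, cv=3)
    search.fit(X[train_idx], y[train_idx])
    print(search.best_params_, search.score(X[test_idx], y[test_idx]))
```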

Ricky
  • That is called nested grid search with cross validation. You can look at the [official documentation example](http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) and [my answer here](http://stackoverflow.com/a/42230764/3374996) to understand it better. – Vivek Kumar Mar 04 '17 at 03:06
  • So after we create the GridSearchCV object, is it fitted inside the cross_val_score function with the X_train and y_train of every k-fold iteration rather than the entire X and y? Hopefully yes, because that makes sense to me. – Ricky Mar 04 '17 at 06:00
  • Also, would I be able to call best_estimator_ in this case, after doing what you have done? `clf = GridSearchCV(estimator=svr, param_grid=c_grid, cv=inner_cv); nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv).mean()` (spelled out in the sketch below) – Ricky Mar 04 '17 at 06:06
  • The step-by-step explanation in that answer is only for the last two lines. – Vivek Kumar Mar 06 '17 at 05:26
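
Edit: for reference, here is roughly what the nested setup from the comments looks like end to end, following the linked iris example (the kernel, parameter grid, and fold counts are assumptions on my part):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

svr = SVC(kernel='rbf')
c_grid = {'C': [1, 10, 100], 'gamma': [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# Inner loop: GridSearchCV tunes the parameters on each outer training fold.
clf = GridSearchCV(estimator=svr, param_grid=c_grid, cv=inner_cv)

# Outer loop: cross_val_score refits clf on every outer training fold
# and scores the tuned model on the corresponding held-out fold.
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv).mean()
print(nested_score)
```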
