12

I would like to run a regular grid search without the CV, i.e. I don't want to cross-validate, but setting cv=1 in GridSearchCV is not allowed.

I am doing this because I am using a classifier to draw decision boundaries and visualize/understand my data instead of predicting labels, and do not care about the generalization error. I would like to minimize the training error instead.

EDIT: I guess I'm really asking two questions:

  1. How do I hack cv=1 in GridSearchCV? Answered by ogrisel below.
  2. Does it make sense to do a grid search to minimize the training error instead of the generalization error, and if so, how would I do that? I suspect it involves supplying my own scoring function via the scoring parameter of GridSearchCV.
selwyth
  • What do you then need `GridSearchCV` for? If you don't need bootstrapped samples, you can just do something like `[score(y_test, Classifier(**args).fit(X_train, y_train).predict(X_test)) for args in parameters]` – Artem Sobolev Apr 08 '15 at 15:00
  • Well, okay, you would need to "unroll" your `parameters` list from the scikit-learn's `GridSearchCV` format to a list of all possible combinations (like cartesian product of all lists). – Artem Sobolev Apr 08 '15 at 15:03
  • `ParameterGrid` is public: http://scikit-learn.org/dev/modules/generated/sklearn.grid_search.ParameterGrid.html#sklearn.grid_search.ParameterGrid, not that it does any magic... – Andreas Mueller Apr 08 '15 at 15:05 (a sketch along these lines follows the comments)
  • Possible duplicate of [Is there easy way to grid search without cross validation in python?](http://stackoverflow.com/questions/34624978/is-there-easy-way-to-grid-search-without-cross-validation-in-python) – jrieke Jan 06 '17 at 23:01
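To make the suggestion in the comments above concrete, here is a minimal sketch of a grid search with no cross-validation at all: `ParameterGrid` (in current releases it lives in `sklearn.model_selection`) unrolls the parameter dictionary into the cartesian product of all settings, and each candidate is fitted on one training split and scored on one held-out split. The classifier (`SVC`), the iris data, and the parameter values are placeholders chosen purely for illustration, not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same dict format GridSearchCV takes; ParameterGrid expands it into
# the cartesian product of all settings.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# Fit each candidate on the training split, score it on the held-out split.
results = [
    (params, accuracy_score(y_test, SVC(**params).fit(X_train, y_train).predict(X_test)))
    for params in ParameterGrid(param_grid)
]
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)
```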

1 Answer

10

You can pass an instance of `ShuffleSplit(test_size=0.20, n_splits=1, random_state=0)` as the `cv` parameter.

That will do a single CV split per parameter combination (`sklearn.model_selection.ShuffleSplit`).
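For reference, a minimal runnable version of this approach, with the classifier, dataset, and parameter grid as illustrative placeholders. A `ShuffleSplit` with `n_splits=1` gives every parameter combination exactly one 80/20 train/test split, which is the practical equivalent of the `cv=1` the question asks for. (In older scikit-learn releases the class lived in `sklearn.cross_validation` and took the dataset size as a first argument `n`, which is the `n=len(X)` mentioned in the comments below.)

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One shuffled 80/20 split instead of k-fold CV: every parameter
# combination is trained on the same 80% and scored on the same 20%.
single_split = ShuffleSplit(test_size=0.20, n_splits=1, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=single_split)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```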

ogrisel
  • This worked (after remembering to add `n=len(X)`). Is this '1-fold CV', i.e. like a 2-fold split where we strictly train on the 0.8 split and test on the 0.2 split only, instead of traditional 2-fold CV where both folds play the role of train and test? This appears to solve my practical question of how to set `cv=1`, but not the theoretical question of minimizing training error instead of generalization error. However, I'm starting to find the latter not worthwhile. – selwyth Apr 14 '15 at 21:43
  • There is no way to use `GridSearchCV` to minimize the training error. You will have to write your own class to do so. – ogrisel Apr 14 '15 at 22:11 (a sketch of a hand-rolled loop follows the comments)
  • Where should one add the `n`? – curio17 Nov 20 '17 at 02:27
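On the second question: as ogrisel's comment above says, `GridSearchCV` will not select on the training error for you, so the simplest route is a small hand-rolled loop that fits and scores every candidate on the same data. A minimal sketch under the same placeholder assumptions as the earlier examples (`SVC` on iris):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

best_score, best_params = -float("inf"), None
for params in ParameterGrid({"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}):
    clf = SVC(**params).fit(X, y)
    train_score = clf.score(X, y)  # fit and score on the same data: training accuracy
    if train_score > best_score:
        best_score, best_params = train_score, params

print(best_params, best_score)
```

Expect this criterion to favor the most flexible settings (e.g. large `C` and large `gamma` for an RBF SVM), which is usually acceptable when the goal is only to draw decision boundaries rather than to generalize.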