12

I would like to run a regular grid search without the CV, i.e. I don't want to cross-validate, but setting cv=1 in GridSearchCV is not allowed.

I am doing this because I am using a classifier to draw decision boundaries and visualize/understand my data instead of predicting labels, and do not care about the generalization error. I would like to minimize the training error instead.

EDIT: I guess I'm really asking two questions:

  1. How do I hack cv=1 in GridSearchCV? Answered by ogrisel below.
  2. Does it make sense to do a grid search to minimize the training error instead of the generalization error, and if so, how would I do that? I suspect it involves supplying my own scoring function via the scoring parameter of GridSearchCV.
selwyth
  • What do you then need `GridSearchCV` for? If you don't need bootstrapped samples, you can just do something like `[score(y_test, Classifier(**args).fit(X_train, y_train).predict(X_test)) for args in parameters]` – Artem Sobolev Apr 08 '15 at 15:00
  • Well, okay, you would need to "unroll" your `parameters` list from the scikit-learn's `GridSearchCV` format to a list of all possible combinations (like cartesian product of all lists). – Artem Sobolev Apr 08 '15 at 15:03
  • `ParameterGrid` is public: http://scikit-learn.org/dev/modules/generated/sklearn.grid_search.ParameterGrid.html#sklearn.grid_search.ParameterGrid, not that it does any magic... – Andreas Mueller Apr 08 '15 at 15:05 (a sketch along these lines follows the comments)
  • Possible duplicate of [Is there easy way to grid search without cross validation in python?](http://stackoverflow.com/questions/34624978/is-there-easy-way-to-grid-search-without-cross-validation-in-python) – jrieke Jan 06 '17 at 23:01
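To make the suggestion in the comments above concrete, here is a minimal sketch of a grid search with no cross-validation at all: `ParameterGrid` (in current releases it lives in `sklearn.model_selection`) unrolls the parameter dictionary into the cartesian product of all settings, and each candidate is fitted on one training split and scored on one held-out split. The classifier (`SVC`), the iris data, and the parameter values are placeholders chosen purely for illustration, not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same dict format GridSearchCV takes; ParameterGrid expands it into
# the cartesian product of all settings.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# Fit each candidate on the training split, score it on the held-out split.
results = [
    (params, accuracy_score(y_test, SVC(**params).fit(X_train, y_train).predict(X_test)))
    for params in ParameterGrid(param_grid)
]
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)
```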

1 Answer

10

You can pass an instance of `ShuffleSplit(test_size=0.20, n_splits=1, random_state=0)` as the `cv` parameter.

That will do a single CV split per parameter combination (`sklearn.model_selection.ShuffleSplit`).
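For reference, a minimal runnable version of this approach, with the classifier, dataset, and parameter grid as illustrative placeholders. A `ShuffleSplit` with `n_splits=1` gives every parameter combination exactly one 80/20 train/test split, which is the practical equivalent of the `cv=1` the question asks for. (In older scikit-learn releases the class lived in `sklearn.cross_validation` and took the dataset size as a first argument `n`, which is the `n=len(X)` mentioned in the comments below.)

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One shuffled 80/20 split instead of k-fold CV: every parameter
# combination is trained on the same 80% and scored on the same 20%.
single_split = ShuffleSplit(test_size=0.20, n_splits=1, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=single_split)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```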

ogrisel
  • This worked (after remembering to add `n=len(X)`). Is this '1-fold CV', i.e. like a 2-fold split where we strictly train on the 0.8 split and test on the 0.2 split only, instead of traditional 2-fold CV where both folds play the role of train and test? This appears to solve my practical question of how to set `cv=1`, but not the theoretical question of minimizing training error instead of generalization error. However, I'm starting to find the latter not worthwhile. – selwyth Apr 14 '15 at 21:43
  • There is no way to use `GridSearchCV` to minimize the training error. You will have to write your own class to do so. – ogrisel Apr 14 '15 at 22:11 (a sketch of a hand-rolled loop follows the comments)
  • Where should one add the `n`? – curio17 Nov 20 '17 at 02:27
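On the second question: as ogrisel's comment above says, `GridSearchCV` will not select on the training error for you, so the simplest route is a small hand-rolled loop that fits and scores every candidate on the same data. A minimal sketch under the same placeholder assumptions as the earlier examples (`SVC` on iris):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

best_score, best_params = -float("inf"), None
for params in ParameterGrid({"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}):
    clf = SVC(**params).fit(X, y)
    train_score = clf.score(X, y)  # fit and score on the same data: training accuracy
    if train_score > best_score:
        best_score, best_params = train_score, params

print(best_params, best_score)
```

Expect this criterion to favor the most flexible settings (e.g. large `C` and large `gamma` for an RBF SVM), which is usually acceptable when the goal is only to draw decision boundaries rather than to generalize.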