0

In Scikit-learn, RandomizedSearchCV and GridSearchCV require a cross-validation object for the cv argument, e.g. GroupKFold or any other CV splitter from sklearn.model_selection.

However, how can I use a single, static validation set? I have a very large training set and a large validation set, and I only need the interface of the CV objects, not full cross-validation.

Specifically, I'm using Scikit-optimize and BayesSearchCV (docs), which requires a CV object (same interface as the regular Scikit-learn SearchCV objects). I want to use my chosen validation set with it, not full cross-validation.

qalis
  • 1,314
  • 1
  • 16
  • 44

1 Answer

2

The docs of the model selection objects of scikit-learn, e.g. GridSearchCV, are perhaps a bit clearer about how to achieve this:

cv: int, cross-validation generator or an iterable, default=None

  • ...
  • An iterable yielding (train, test) splits as arrays of indices.

So you need the arrays of indices for the training and test samples as a tuple, and then wrap that tuple in an iterable, e.g. a list:

train_indices = [...]  # indices for training
test_indices = [...]  # indices for testing

cv = [(train_indices, test_indices)]

Pass this cv, defined with a single tuple, to the model selection object, and it will always use the same samples for training and testing.
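Here is a minimal end-to-end sketch of the idea, using toy data from make_classification and a small GridSearchCV (the estimator, parameter grid, and 80/20 index split are just illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data: use the first 80 samples for training and hold out
# the last 20 as the fixed validation set (arbitrary choice here)
X, y = make_classification(n_samples=100, random_state=0)
train_indices = np.arange(80)
test_indices = np.arange(80, 100)

# A single (train, test) tuple wrapped in a list satisfies the cv interface
cv = [(train_indices, test_indices)]

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=cv,
)
search.fit(X, y)

# Exactly one split was used, so every candidate was scored
# on the same fixed validation samples
print(search.n_splits_)   # 1
print(search.best_params_)
```

Since BayesSearchCV follows the same SearchCV interface, the same cv list can be passed to it unchanged.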

afsharov
  • 4,774
  • 2
  • 10
  • 27