
Say I'm using GridSearchCV to search for hyperparameters, and I'm also using a Pipeline as I (think I) want to preprocess my data:

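import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5)
}

pipeline = Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])

search = GridSearchCV(pipeline, param_grid, cv=10)
search.fit(train_x, train_y)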

Is there a way to test my assumption that the inclusion of the scaler step is actually helpful (beyond just removing it and rerunning it)?

i.e., is there a way to write:

param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5),
    'scaler': [On, Off]
}

Or is there a different way I should be approaching this?

desertnaut
rwb

1 Answer


You can do this by passing 'passthrough' as one of the options for the scaler step in your param_grid, like so:

param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5),
    'scaler': ['passthrough', StandardScaler()]
}

This is covered in the scikit-learn Pipeline documentation:

Individual steps may also be replaced as parameters, and non-final steps may be ignored by setting them to 'passthrough':

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.decomposition import PCA
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import GridSearchCV
>>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
>>> pipe = Pipeline(estimators)
>>> param_grid = dict(reduce_dim=['passthrough', PCA(5), PCA(10)],
...                   clf=[SVC(), LogisticRegression()],
...                   clf__C=[0.1, 10, 100])
>>> grid_search = GridSearchCV(pipe, param_grid=param_grid)
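To actually test whether the scaler helps, you can run the grid with 'passthrough' as an option and then read the per-candidate scores out of search.cv_results_. The sketch below assumes scikit-learn's built-in breast-cancer dataset as a stand-in for the question's train_x / train_y (which aren't shown); the scores dictionary and the printed layout are illustrative choices, not part of any scikit-learn API.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): the question's train_x / train_y are not shown.
train_x, train_y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])

# 'passthrough' switches the scaler step off; StandardScaler() switches it on.
param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5),
    'scaler': ['passthrough', StandardScaler()],
}

search = GridSearchCV(pipeline, param_grid, cv=10)
search.fit(train_x, train_y)

# Collect the mean CV accuracy for every (scaler on/off, gamma) candidate.
results = search.cv_results_
scores = {}
for scaler, gamma, score in zip(results['param_scaler'],
                                results['param_svc__gamma'],
                                results['mean_test_score']):
    label = 'off' if isinstance(scaler, str) else 'on'  # 'passthrough' is a str
    scores[(label, round(float(gamma), 1))] = score

# Print the two variants side by side for each gamma value.
for gamma in sorted({g for _, g in scores}):
    print(f"gamma={gamma:.1f}  scaler off={scores[('off', gamma)]:.3f}  "
          f"scaler on={scores[('on', gamma)]:.3f}")

print("best:", search.best_params_)
```

GridSearchCV evaluates every candidate on the same CV splits, so each "off" / "on" pair is directly comparable, and search.best_params_['scaler'] shows which variant won overall.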
Matthew Barlowe