I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16.

The code I am using:

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
param_grid = {"logisticregression_C": [0.001, 0.01, 0.1, 1, 10, 100], "tfidfvectorizer_ngram_range": [(1,1), (1,2), (1,3)]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best cross-validation score: {:.2f}".format(grid.best_score_))

The error being returned boils down to:

ValueError: Invalid parameter logisticregression_C for estimator Pipeline

Is this error related to using make_pipeline from v0.16? What is causing it?

Jonas
sudo_coffee

4 Answers

In a Pipeline, the estimator name and its parameter must be separated by two underscores, not one: logisticregression__C. Do the same for tfidfvectorizer, i.e. tfidfvectorizer__ngram_range.

It is mentioned in the user guide here: https://scikit-learn.org/stable/modules/compose.html#nested-parameters.

See the example at https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html#sphx-glr-auto-examples-compose-plot-compare-reduction-py
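Applied to the code in the question, the fix might look like the sketch below. The toy documents and labels are made up for illustration, the grid is shortened to keep it fast, and the import from sklearn.model_selection assumes a newer scikit-learn release than the 0.16 mentioned in the question (where GridSearchCV lived in sklearn.grid_search).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the book's X_train / y_train.
X_train = [
    "good movie", "great film", "loved it", "wonderful acting",
    "bad movie", "terrible film", "hated it", "awful acting",
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())

# Note the double underscore between the step name and the parameter name.
param_grid = {
    "logisticregression__C": [0.1, 1, 10],
    "tfidfvectorizer__ngram_range": [(1, 1), (1, 2)],
}

grid = GridSearchCV(pipe, param_grid, cv=2)
grid.fit(X_train, y_train)
print("Best parameters: {}".format(grid.best_params_))
```

make_pipeline names each step after its lowercased class name, which is where the logisticregression and tfidfvectorizer prefixes come from.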

Vivek Kumar

For a more general answer about using a Pipeline in GridSearchCV: each key in the parameter grid should start with the name you gave the step when defining the pipeline. For example:

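import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Pay attention to the name of the second step, i.e. 'model'.
# `preprocess` is whatever transformer you defined earlier.
pipeline = Pipeline(steps=[
    ('preprocess', preprocess),
    ('model', Lasso())
])

# Define the parameter grid to be used in GridSearch
param_grid = {'model__alpha': np.arange(0, 1, 0.05)}

search = GridSearchCV(pipeline, param_grid)
search.fit(X_train, y_train)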

In the pipeline, we used the name model for the estimator step. So, in the grid search, any hyperparameter for Lasso regression should be given with the prefix model__. The parameter names in the grid depend on the name you gave the step in the pipeline. In plain old GridSearchCV without a pipeline, the grid would be given like this:

param_grid = {'alpha': np.arange(0, 1, 0.05)}
search = GridSearchCV(Lasso(), param_grid)
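To see both naming styles side by side, here is a minimal runnable sketch. The synthetic data and the StandardScaler used as the preprocess step are assumptions made only so the example is self-contained; they are not part of the answer above. The grid starts at 0.05 rather than 0 because scikit-learn advises against fitting Lasso with alpha=0.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic regression data.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(60, 4))
y_train = X_train @ np.array([1.5, 0.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

alphas = np.arange(0.05, 1, 0.05)

# With a pipeline: keys are prefixed with the step name plus two underscores.
pipeline = Pipeline(steps=[
    ("preprocess", StandardScaler()),  # stand-in for the answer's `preprocess`
    ("model", Lasso()),
])
piped_search = GridSearchCV(pipeline, {"model__alpha": alphas}, cv=3)
piped_search.fit(X_train, y_train)

# Without a pipeline: the bare parameter name is used.
plain_search = GridSearchCV(Lasso(), {"alpha": alphas}, cv=3)
plain_search.fit(X_train, y_train)

print(piped_search.best_params_)
print(plain_search.best_params_)
```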

You can find out more about GridSearch from this post.

Bex T.

Note that if you are using a pipeline with a voting classifier and a column selector, you will need multiple layers of names:

pipe1 = make_pipeline(ColumnSelector(cols=(0, 1)),
                      LogisticRegression())
pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)),
                      SVC())
votingClassifier = VotingClassifier(estimators=[
        ('p1', pipe1), ('p2', pipe2)])

You will need a param grid that looks like the following:

param_grid = { 
        'p2__svc__kernel': ['rbf', 'poly'],
        'p2__svc__gamma': ['scale', 'auto'],
    }

p2 is the name of the pipe and svc is the default name of the classifier you create in that pipe. The third element is the parameter you want to modify.
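One way to check such layered names before building a grid is to inspect get_params() on the outer estimator. The sketch below swaps ColumnSelector for scikit-learn's StandardScaler purely to avoid an extra dependency; that substitution is an assumption for illustration and does not change how the names are composed.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# StandardScaler is a stand-in for ColumnSelector here.
pipe1 = make_pipeline(StandardScaler(), LogisticRegression())
pipe2 = make_pipeline(StandardScaler(), SVC())
voting = VotingClassifier(estimators=[("p1", pipe1), ("p2", pipe2)])

# List every tunable SVC parameter as seen from the voting classifier.
svc_keys = sorted(k for k in voting.get_params() if k.startswith("p2__svc__"))
print(svc_keys)

# The same layered name works with set_params, which is what GridSearchCV uses.
voting.set_params(p2__svc__kernel="poly")
print(voting.get_params()["p2__svc__kernel"])
```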

Eric Wiener

You can always call model.get_params().keys() (if you are using a bare model) or pipeline.get_params().keys() (if you are using a pipeline) to list the parameter names you can tune.
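A small sketch of that lookup follows. The particular pipeline (a SelectFromModel wrapping a decision tree, followed by logistic regression) is a hypothetical example chosen to show how deeply nested keys appear.

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# A bare model exposes plain parameter names such as "C".
model = LogisticRegression()
model_keys = sorted(model.get_params().keys())

# A pipeline exposes the same parameters prefixed by step names,
# with one extra level for the estimator wrapped by SelectFromModel.
pipeline = make_pipeline(
    SelectFromModel(DecisionTreeClassifier()),
    LogisticRegression(),
)
pipeline_keys = sorted(pipeline.get_params().keys())

for key in pipeline_keys:
    print(key)
```

Any key printed here can be used as-is in a param_grid; a bare name like max_depth is not among them, so the tree's depth has to be addressed as selectfrommodel__estimator__max_depth.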

  • This is the only solution that helped me solve the same problem. In my case, I had to replace `max_depth` with `selectfrommodel__estimator__max_depth`, which I found in `pipeline.get_params().keys()` – user164863 Jan 09 '23 at 19:05