
My question is similar to the one here, but the answer there does not explain why we must fit to the data before getting the best parameters; it just states that we must. In my understanding, GridSearchCV picks the best model parameters using cross-validation and then returns this best model in the .best_estimator_ attribute. Then we can fit that model to our data. But shouldn't we be able to access the parameters it picked, and the .best_estimator_ model, before fitting to the data?

As an example, the code below works fine:

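import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
# param_grid is passed by keyword: a positional argument cannot follow estimator=...
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
grid.fit(X_train, y_train)
best_model_params = grid.best_params_
y_pred = grid.predict(X_test)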

But the following code does not work:

logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
# best_params_ is read before fit() has been called; this is the line that fails
best_model_params = grid.best_params_
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

It gives AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'.

On a related note, if grid.best_estimator_ is the best LogisticRegression model (the model with the set of hyperparameters found to be best through cross-validation in GridSearchCV), then why do we fit the grid object instead of the grid.best_estimator_ object? E.g., if I could figure out how to access the best_estimator_ attribute before fitting, would the following code work?

logreg = LogisticRegression(solver='liblinear')
params = {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)
best_model = <somehow get the model GridSearchCV has picked>
best_model.fit(X_train,y_train)  
y_pred = best_model.predict(X_test)
  • How can you expect the grid search to know which parameters are the best before it has been allowed to try any? – hbgoddard Mar 24 '22 at 23:07
  • Oooooh... so GridSearchCV is just initializing the object, and it's not actually doing the cross-validation until fit is called? – user1093541 Mar 25 '22 at 15:42
  • Correct. It is a general rule for scikit-learn objects that nothing important happens until fit() is called. – hbgoddard Mar 25 '22 at 20:03
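
The point made in the comments can be checked with a small sketch. It assumes synthetic data from make_classification and a train/test split, neither of which appears in the question, and it relies on GridSearchCV's default refit=True. It shows that best_params_ does not exist until fit() has run, and that afterwards best_estimator_ is already fitted, so a separate best_model.fit(...) call should not be needed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; the question does not show its dataset).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(solver="liblinear")
params = {"penalty": ["l1", "l2"], "C": np.logspace(-3, 3, 7)}
grid = GridSearchCV(estimator=logreg, param_grid=params, cv=4)

# Constructing the object only stores the settings; no search has run yet,
# so the result attributes have not been created.
fitted_before = hasattr(grid, "best_params_")

# fit() is what runs the cross-validated search over every combination.
grid.fit(X_train, y_train)
fitted_after = hasattr(grid, "best_params_")

# With the default refit=True, best_estimator_ has been refit on all of
# X_train using the winning parameters, and grid.predict delegates to it.
best_model = grid.best_estimator_
same_predictions = np.array_equal(
    grid.predict(X_test), best_model.predict(X_test)
)

print(fitted_before, fitted_after, same_predictions)
```

So fitting the grid object is the step that selects the parameters. Once it finishes, grid.best_estimator_ can be used directly for prediction, or grid.predict can be called as in the first snippet of the question.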
