
I'm running GridSearchCV to optimize the parameters of a classifier in scikit-learn. Once I'm done, I'd like to know which parameters were chosen as the best.

Whenever I do so I get an AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_', and I can't tell why, since the documentation lists it as a legitimate attribute.

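# sklearn.grid_search was removed in scikit-learn 0.20; GridSearchCV now lives in sklearn.model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split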

X = data[usable_columns]
y = data[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rfc = RandomForestClassifier(n_jobs=-1, max_features='sqrt', n_estimators=50, oob_score=True)

param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)

print('\n', CV_rfc.best_estimator_)

Yields:

AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'
sapo_cosmico
  • For your information, max_features 'auto' and 'sqrt' are the same. They both compute max_features=sqrt(n_features). – Marine Aug 05 '20 at 12:46

2 Answers


You have to fit your data before you can get the best parameter combination.

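# sklearn.grid_search was removed in scikit-learn 0.20; GridSearchCV now lives in sklearn.model_selection
from sklearn.model_selection import GridSearchCV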
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)


rfc = RandomForestClassifier(n_jobs=-1, max_features='sqrt', n_estimators=50, oob_score=True)

param_grid = { 
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2']  # 'auto' was removed in scikit-learn 1.3; drop it on newer versions
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv= 5)
CV_rfc.fit(X, y)
print(CV_rfc.best_params_)
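
For reference, here is a minimal sketch of what becomes available once `fit` has run. The tiny grid, the synthetic data, and the variable name `search` are assumptions chosen only to keep the example fast; they are not taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic problem and a deliberately tiny grid so the search runs quickly
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_features": ["sqrt", "log2"]},
    cv=3,
)
search.fit(X, y)  # the fitted attributes below only exist after this call

print(search.best_params_)     # dict holding the winning parameter combination
print(search.best_score_)      # mean cross-validated score of that combination
print(search.best_estimator_)  # estimator refit on all of X, y (refit=True is the default)
```

Note that `best_estimator_` depends on the default `refit=True`; with `refit=False` only `best_params_` and the scores are kept.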
Ryan
  • It worked indeed, thank you! Any idea as to why? (I thought gridSearch would find the parameters, but I couldn't even get the parameters back before fitting) – sapo_cosmico May 07 '15 at 15:28
  • Different data sets will have different optimized parameter combinations, i.e. without data, there is no optimal parameter combination – Ryan May 07 '15 at 15:32
  • What is the point of passing n_estimators to RandomForestClassifier, given that you also pass it to GridSearchCV in param_grid? – sergzach Jan 31 '18 at 09:55
  • I didn't notice that, good point. I just copy/pasted the original code when I saw that there was no call to the fit method – Ryan Jan 31 '18 at 15:25
  • By the way, for `'max_features'`, `'auto'` is exactly the same as `'sqrt'`, so it's redundant to pass both. – Yahya Sep 21 '18 at 10:53
  • In the answer, the `fit()` method is called on `X` and `y`, so the `train_test_split` in the question isn't used. Should the split be dropped altogether when using `GridSearchCV`? – Jack Fleeting Feb 13 '19 at 18:09
  • @JackFleeting probably best to generate a cross-validation generator and pass that to `GridSearchCV`'s `cv` parameter. At the time of writing, by default `cv` is set to 3-fold CV (this will change to 5-fold CV in v0.22) – Ryan Feb 27 '19 at 17:37
  • The `GridSearchCV` sub-library was imported wrongly – Ṃųỻịgǻňạcểơửṩ Jan 25 '20 at 23:36
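
On the question in the comments about `train_test_split` and the `cv` parameter, one possible arrangement is sketched below: keep a held-out test set and pass an explicit splitter to `cv`. The use of `StratifiedKFold`, the small grid, and the synthetic data are illustrative assumptions, not something the thread prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Hold out a test set that the grid search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Explicit splitter, so the number of folds does not depend on the library default
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_features": ["sqrt", "log2"]},
    cv=cv,
)
search.fit(X_train, y_train)  # cross-validation happens inside the training portion only

# Score of the refit best estimator on data that played no part in the search
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

With this layout the cross-validation folds pick the parameters and the held-out split gives a separate estimate of how the chosen model generalizes.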

Just to add one more point for clarity.

The documentation says the following:

best_estimator_ : estimator or dict:

Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data.

When the grid search is fitted over a parameter grid, it picks the combination with the highest score according to the given scoring function. best_estimator_ is the estimator built with the parameters that produced that highest score.

Therefore, it can only be accessed after fitting the data.
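
A quick way to see this for yourself is to check for the attribute before and after `fit`. This is only a sketch; the synthetic data, the one-parameter grid, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20]},
    cv=3,
)

# Before fit: no search has run, so the attribute does not exist yet
fitted_before = hasattr(search, "best_estimator_")

try:
    search.best_estimator_
    raised = False
except AttributeError:
    # Same kind of error as the one reported in the question
    raised = True

search.fit(X, y)

# After fit: the attribute has been set by the search
fitted_after = hasattr(search, "best_estimator_")

print(fitted_before, raised, fitted_after)
```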

rohithnama