
I was reading about fine-tuning a model with GridSearchCV and came across the parameter grid shown below:

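```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = [
    # 3 * 4 = 12 combinations
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    # 1 * 2 * 3 = 6 combinations, with bootstrap turned off
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
forest_reg = RandomForestRegressor(random_state=42)
# train across 5 folds, that's a total of (12+6)*5=90 rounds of training
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
# housing_prepared / housing_labels: the preprocessed training features and targets (defined earlier)
grid_search.fit(housing_prepared, housing_labels)
```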
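The "(12+6)*5=90" count in the comment can be checked directly. This sketch uses scikit-learn's `ParameterGrid`, which expands a list of dicts into every hyperparameter combination the same way the grid search does; the fold count of 5 is taken from the `cv=5` argument above.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the question.
param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

# Each dict is expanded into the Cartesian product of its lists,
# and the results of both dicts are concatenated: 12 + 6 candidates.
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

cv = 5                       # number of cross-validation folds
n_fits = n_candidates * cv   # every candidate is trained once per fold

print(n_candidates, n_fits)  # 18 90
```

With the default `refit=True`, GridSearchCV also retrains the best combination once more on the full training set after the search, so the actual number of fits is one higher than this count.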

I don't understand what `n_estimators` and `max_features` mean. Does `n_estimators` mean the number of records (rows) taken from the data, and `max_features` the number of attributes (columns) selected from the data?

Further on, I got this result:

```
>>> grid_search.best_params_
{'max_features': 8, 'n_estimators': 30}
```

What I don't understand is what this result is actually telling me.

sascha
Viral Parmar
  • Please read the docs: [RandomForestRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) and the [user guide](http://scikit-learn.org/stable/modules/ensemble.html#forests-of-randomized-trees) – Vivek Kumar Sep 15 '17 at 10:43

2 Answers


If you read the documentation for RandomForestRegressor, you will see that `n_estimators` is the number of trees in the forest. Because a random forest is an ensemble method that builds many decision trees, this parameter controls how many trees are built.
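A quick way to see that `n_estimators` counts trees is to fit a forest and look at its `estimators_` attribute, which holds the fitted trees. This is a minimal sketch; the data is a synthetic stand-in generated with `make_regression` (an assumption, not the housing data from the question).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 200 instances (rows), 8 features (columns).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

forest = RandomForestRegressor(n_estimators=30, random_state=42)
forest.fit(X, y)

# After fitting, the individual trees are stored in estimators_.
print(len(forest.estimators_))               # 30
print(type(forest.estimators_[0]).__name__)  # DecisionTreeRegressor
```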

`max_features`, on the other hand, determines the maximum number of features to consider when looking for the best split at each node of a tree. For more information on `max_features`, read this answer.
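To make clear that "features" here are the columns (attributes) of the data, not the rows, the sketch below fits a forest with `max_features=4` on synthetic data with 8 columns (again a `make_regression` stand-in, assumed for illustration). Roughly speaking, at every split each tree considers a random subset of 4 of the 8 columns; which rows a tree sees is controlled separately, by bootstrap sampling.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 200 instances (rows) and 8 features (columns).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
print(X.shape)  # (200, 8)

# max_features counts columns: each split is chosen among a random
# subset of 4 of the 8 columns instead of all 8.
forest = RandomForestRegressor(n_estimators=10, max_features=4, random_state=0)
forest.fit(X, y)

print(forest.n_features_in_)                # 8 columns seen during fit
print(forest.estimators_[0].max_features_)  # 4 columns considered per split
```

Since `max_features` is a count of columns, a value larger than the number of columns in the data is invalid.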

Gambit1614
  • So who decides how many features will be considered for a good split? And which features are we talking about: are the features the attributes (columns) of the data, or the number of data points (rows)? – Viral Parmar Sep 15 '17 at 08:30
  • @Virtsu Since you are using GridSearchCV, it decides the best value for `max_features` based on how well the model performs in cross-validation on your dataset. – Gambit1614 Sep 15 '17 at 08:31

n_estimators: This is the number of trees you want to build before taking the majority vote (classification) or the average (regression) of their predictions. Each tree is trained on its own bootstrap sample of the training data, and the forest aggregates the trees' outputs into the final answer. More trees generally give better, more stable performance, but make training and prediction slower.
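For a regressor, the aggregation step is a plain average. This sketch, on assumed synthetic data from `make_regression`, predicts with each of the 30 trees separately and checks that their mean matches the forest's own prediction.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)

forest = RandomForestRegressor(n_estimators=30, random_state=1)
forest.fit(X, y)

# One row of predictions per tree: shape (n_trees, n_samples).
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

print(per_tree.shape)                                  # (30, 200)
print(np.allclose(forest.predict(X), manual_average))  # True
```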

max_features: The number of features to consider when looking for the best split.

`grid_search.best_params_` returned `{'max_features': 8, 'n_estimators': 30}`.

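This means that, of all the combinations tried (n_estimators in {3, 10, 30} with max_features in {2, 4, 6, 8}, plus the smaller grid with bootstrap=False), the combination max_features=8, n_estimators=30 gave the best mean cross-validated score, so these are the hyperparameters you should use for your model. Both values are the largest in the grid, so it may be worth searching again with larger values.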
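The sketch below runs the same grid end to end and shows where `best_params_` comes from: it is the row of `cv_results_` with the highest mean test score (here the least negative MSE). It assumes synthetic `make_regression` data in place of `housing_prepared` / `housing_labels`, so the winning combination will not necessarily match the one in the question.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for housing_prepared / housing_labels (8 feature columns).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

grid_search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                           cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X, y)

# The single combination with the highest mean CV score (lowest MSE).
print(grid_search.best_params_)

# Every combination that was tried, with its mean RMSE across the 5 folds.
results = grid_search.cv_results_
for mean_score, params in zip(results['mean_test_score'], results['params']):
    print(f"RMSE {(-mean_score) ** 0.5:8.2f}  {params}")

# With the default refit=True, best_estimator_ is a forest retrained on all
# of X, y using best_params_; this is the model to use for predictions.
best_forest = grid_search.best_estimator_
```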

halfer