7

I have a few questions concerning randomized grid search (RandomizedSearchCV) for a random forest regression model. My parameter grid looks like this:

random_grid = {'bootstrap': [True, False],
               'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
               'max_features': ['auto', 'sqrt'],
               'min_samples_leaf': [1, 2, 4],
               'min_samples_split': [2, 5, 10],
               'n_estimators': [130, 180, 230]}

and my code for the RandomizedSearchCV like this:

# Use the random grid to search for best hyperparameters
# First create the base model to tune
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
rf = RandomForestRegressor()
# Random search of parameters, using 3 fold cross validation, 
# search across 100 different combinations, and use all available cores
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid, n_iter = 100, cv = 3, verbose=2, random_state=42, n_jobs = -1)
# Fit the random search model
rf_random.fit(X_1, Y)

Is there any way to calculate the root mean squared error (RMSE) for each parameter set? That would be more interesting to me than the R^2 score. And if I then want to get the best parameter set, as printed below, I would also like it to be chosen by the lowest RMSE. Is there any way to do that?

rf_random.best_params_
rf_random.best_score_
rf_random.best_estimator_

thank you, R

raffa_sa

3 Answers

7

Add the `scoring` parameter to RandomizedSearchCV:

RandomizedSearchCV(scoring="neg_mean_squared_error", ...

Alternative scoring options can be found in the docs.
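
For example, reusing the rf and random_grid from the question, the search could be set up like this (a minimal sketch; only the scoring argument is new):

from sklearn.model_selection import RandomizedSearchCV

# Same search as in the question, but scored with negative MSE so that
# the "best" combination is the one with the lowest MSE (and hence lowest RMSE)
rf_random = RandomizedSearchCV(estimator=rf,
                               param_distributions=random_grid,
                               n_iter=100,
                               cv=3,
                               scoring="neg_mean_squared_error",
                               verbose=2,
                               random_state=42,
                               n_jobs=-1)
rf_random.fit(X_1, Y)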

With this, you can print the RMSE for each parameter combination, along with the parameters themselves:

import numpy as np

cv_results = rf_random.cv_results_
for mean_score, params in zip(cv_results["mean_test_score"], cv_results["params"]):
    # scores are negative MSE, so negate and take the square root to get the RMSE
    print(np.sqrt(-mean_score), params)
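
To answer the second part of the question: best_params_ then already refers to the combination with the lowest MSE, and its RMSE can be recovered from best_score_ in the same way (the score is the negative MSE, see the comment below):

# best_params_ is the combination with the highest (least negative) score,
# i.e. the lowest MSE; negate and take the square root to get its RMSE
print(rf_random.best_params_)
print(np.sqrt(-rf_random.best_score_))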
Tobi
  • So the RandomizedSearchCV should now internally work with the RMSE, right? Then I don't understand my result. I get for `rf_random.best_score_` the result `-13684.3`. RMSE can't normally be negative? @Tobi – raffa_sa Dec 17 '18 at 10:26
  • You are almost correct. It is working with the MSE (without the square root). However, for Grid/Randomized/...SearchCV it has to be the negative MSE. And that is why I used np.sqrt(-mean_score). An explanation for the negation is given here: https://stackoverflow.com/questions/21050110/sklearn-gridsearchcv-with-pipeline. – Tobi Dec 17 '18 at 15:55
0

If you want to create a DataFrame with the results of each CV run, use the following. Set return_train_score to True if you need the results for the training dataset as well.

import pandas as pd

rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               return_train_score=True)
rf_random.fit(X_1, Y)
df = pd.DataFrame(rf_random.cv_results_)
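
A possible follow-up, assuming the search was scored with neg_mean_squared_error as in the first answer: add an RMSE column to the DataFrame and sort by it.

import numpy as np

# Assumes scoring="neg_mean_squared_error" was passed to the search,
# so mean_test_score holds negative MSE values
df["mean_test_rmse"] = np.sqrt(-df["mean_test_score"])
print(df.sort_values("mean_test_rmse")[["params", "mean_test_rmse"]].head())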
Venkatachalam
0
Maybe this will help you.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rf_grid = {"n_estimators": np.arange(10, 100, 10),
           "max_depth": [None, 3, 5, 10],
           "min_samples_split": np.arange(2, 20, 2),
           "min_samples_leaf": np.arange(1, 20, 2),
           "max_features": [0.5, 1, "sqrt", "auto"],
           "max_samples": [10000]}

# Instantiate the RandomizedSearchCV model
rs_model = RandomizedSearchCV(RandomForestRegressor(n_jobs=-1, random_state=42),
                              param_distributions=rf_grid,
                              n_iter=2,
                              cv=5,
                              verbose=True)
# Fit the random search model
rs_model.fit(X_train, y_train)
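
You can then inspect the best combination as in the other answers. Note that with the default scoring for a regressor, best_score_ is the R^2 score; pass scoring="neg_mean_squared_error" here as well if you want the search ranked by MSE/RMSE instead.

# Best parameters and best cross-validated score (R^2 by default for a regressor)
print(rs_model.best_params_)
print(rs_model.best_score_)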
grey