New to ML here and trying my hand at fitting a model with Random Forest. Here is my simplified code:
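from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hold out 15% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.15, random_state=42)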
model = RandomForestRegressor()
param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [3, 5, 7],
    'max_features': [3, 5, 7],
    'random_state': [42]
}
Next, I perform grid search for the best parameters:
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
This yields the output:
{'max_depth': 7, 'max_features': 3, 'n_estimators': 500, 'random_state': 42}
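Next, I predict with the best model found by the grid search. I get R2 = 0.998 on both the train and test data:
best_model = grid_search.best_estimator_  # refit on X_train with the best parameters
y_train_pred = best_model.predict(X_train)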
y_test_pred = best_model.predict(X_test)
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)
Question:
The grid search above found the best 'max_features' to be 3.
1. I assume those 3 features were used to fit the model and then calculate R2. Is that right?
2. If #1 is correct, how do I print the 3 features that were used for the best prediction and gave an R2 of 0.998?
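
A note on the premise: in scikit-learn, `max_features` is the number of features considered when looking for the best split at each node, so it does not pick a fixed set of 3 features for the whole model. Below is a minimal sketch of one way to check this. It is not the code from the post: it assumes synthetic data from `make_regression` and made-up feature names (`f0` ... `f7`) in place of the `features` and `target` above, and it fits a forest directly with the reported best parameters rather than running the grid search.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 8 features, only 3 informative.
X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       noise=0.1, random_state=42)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

# Same kind of settings the grid search reported as best.
forest = RandomForestRegressor(n_estimators=100, max_depth=7,
                               max_features=3, random_state=42)
forest.fit(X, y)

# The model is still fitted on all input columns.
print("features seen during fit:", forest.n_features_in_)

# Impurity-based importances: one value per input feature.
importances = forest.feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")

# Collect every feature index that appears in a split node of any tree.
# Leaf nodes are marked with a negative index, so they are filtered out.
used = set()
for tree in forest.estimators_:
    split_features = tree.tree_.feature
    used.update(split_features[split_features >= 0].tolist())
print("distinct features used in splits:", len(used))
```

With a fitted grid search, the same attributes should be available on `grid_search.best_estimator_`. If `features` is a pandas DataFrame, `features.columns` could replace the hypothetical names.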