
My task is to understand which features (the columns of dataset X) are best at predicting the target variable y. I've decided to use feature_importances_ in RandomForestClassifier. RandomForestClassifier achieves its best score (AUC-ROC) with max_depth=10 and n_estimators=50. Is it correct to use feature_importances_ with the best parameters, or with the default parameters? Why? How does feature_importances_ work?

Here are the two models, with the best and with the default parameters:

1)

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(max_depth=10, n_estimators=50)
model.fit(X, y)
feature_imp = pd.DataFrame(model.feature_importances_, index=X.columns, columns=["importance"])

2)

model = RandomForestClassifier()
model.fit(X, y)
feature_imp = pd.DataFrame(model.feature_importances_, index=X.columns, columns=["importance"])
  • You don't use feature importances for making predictions. It's an estimate of how informative each feature is for your predictions. – cel Aug 29 '16 at 11:14
  • As @cel said, `feature_importances_` will just score the importance of each of your columns. That's all. Additionally, if you just google scikits-learn documentation you will find [here](http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html) a demo of how can you *read* `feature_importances_`. – Imanol Luengo Aug 31 '16 at 13:24

1 Answer


I think you should use feature_importances_ with the best parameters, since that is the model you are actually going to use. There is nothing special about the default parameters that deserves special treatment. As for how feature_importances_ works, you can refer to the answer by the scikit-learn authors here: How are feature_importances in RandomForestClassifier determined?
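As a rough illustration, here is a sketch comparing the impurity-based importances of the tuned and the default forest. It uses synthetic data from make_classification as a stand-in for your X and y (the column names f0–f4 are invented), and also shows that the forest-level importance is just the renormalized mean of the per-tree importances:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real X and y (column names are invented)
X_arr, y = make_classification(n_samples=500, n_features=5,
                               n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])

tuned = RandomForestClassifier(max_depth=10, n_estimators=50,
                               random_state=0).fit(X, y)
default = RandomForestClassifier(random_state=0).fit(X, y)

# Each column sums to 1: importances are normalized shares of the
# total impurity reduction (mean decrease in impurity) over the forest.
imp = pd.DataFrame({"tuned": tuned.feature_importances_,
                    "default": default.feature_importances_},
                   index=X.columns)
print(imp.sort_values("tuned", ascending=False))

# The forest importance is the renormalized mean of per-tree importances.
manual = np.mean([t.feature_importances_ for t in tuned.estimators_], axis=0)
manual = manual / manual.sum()
```

In practice the two rankings often agree on the strongest features even if the exact numbers shift with the hyperparameters, which is another reason to report the importances from the model you actually deploy.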
