
My task is to understand which features (the columns of dataset X) are best at predicting the target variable y. I've decided to use feature_importances_ in RandomForestClassifier. RandomForestClassifier achieves its best score (AUC-ROC) with max_depth=10 and n_estimators=50. Is it correct to use feature_importances_ with the best parameters, or with the default parameters? Why? How does feature_importances_ work?

Here are the two models, with the best and with the default parameters:

1)

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(max_depth=10, n_estimators=50)
model.fit(X, y)
feature_imp = pd.DataFrame(model.feature_importances_, index=X.columns, columns=["importance"])

2)

model = RandomForestClassifier()
model.fit(X, y)
feature_imp = pd.DataFrame(model.feature_importances_, index=X.columns, columns=["importance"])
  • You don't use feature importances for making predictions. It's an estimate of how informative each feature is for your predictions. – cel Aug 29 '16 at 11:14
  • As @cel said, `feature_importances_` will just score the importance of each of your columns. That's all. Additionally, if you just google scikits-learn documentation you will find [here](http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html) a demo of how can you *read* `feature_importances_`. – Imanol Luengo Aug 31 '16 at 13:24

1 Answer


I think you should use feature_importances_ with the best parameters, since that is the model you are actually going to use. There is nothing special about the default parameters that deserves special treatment. As for how feature_importances_ works, you can refer to the answer by the scikit-learn authors here: How are feature_importances in RandomForestClassifier determined?
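As a rough illustration, here is a sketch comparing the impurity-based importances of the tuned and the default forest. It uses synthetic data from make_classification as a stand-in for your X and y (the column names f0–f4 are invented), and also shows that the forest-level importance is just the renormalized mean of the per-tree importances:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real X and y (column names are invented)
X_arr, y = make_classification(n_samples=500, n_features=5,
                               n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])

tuned = RandomForestClassifier(max_depth=10, n_estimators=50,
                               random_state=0).fit(X, y)
default = RandomForestClassifier(random_state=0).fit(X, y)

# Each column sums to 1: importances are normalized shares of the
# total impurity reduction (mean decrease in impurity) over the forest.
imp = pd.DataFrame({"tuned": tuned.feature_importances_,
                    "default": default.feature_importances_},
                   index=X.columns)
print(imp.sort_values("tuned", ascending=False))

# The forest importance is the renormalized mean of per-tree importances.
manual = np.mean([t.feature_importances_ for t in tuned.estimators_], axis=0)
manual = manual / manual.sum()
```

In practice the two rankings often agree on the strongest features even if the exact numbers shift with the hyperparameters, which is another reason to report the importances from the model you actually deploy.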
