Is there a way to extract the names of the most important features and save them in a pandas DataFrame? I know how to compute and plot the importances, but I'm looking for a way to store the most important features in a DataFrame.

import matplotlib.pyplot as plt
from xgboost import XGBClassifier
from xgboost import plot_importance

# fit model to training data
xgb_model = XGBClassifier(random_state=0)
xgb_model.fit(X, y)

print("Feature Importances : ", xgb_model.feature_importances_)

# plot feature importance
fig, ax = plt.subplots(figsize=(15, 10))
plot_importance(xgb_model, max_num_features=35, height=1, ax=ax)
plt.show()
Flavia Giammarino
Jocelyn AL
  • Does this answer your question? [How to get actual feature names in XGBoost feature importance plot without retraining the model?](https://stackoverflow.com/questions/54933804/how-to-get-actual-feature-names-in-xgboost-feature-importance-plot-without-retra) – Flavia Giammarino Aug 24 '21 at 05:37

2 Answers

To show the most important features used by the model, you can use xgb.plot_importance; the underlying scores can then be saved into a dataframe.

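import xgboost as xgb
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [6, 4]  # set the figure size before plotting
xgb.plot_importance(xgb_model)  # xgb_model is the fitted model from the question
plt.show()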
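
The plot is not needed to get the values into a dataframe. As a minimal sketch, assuming the fitted `xgb_model` from the question: `xgb_model.get_booster().get_score()` returns a dictionary mapping feature names to scores, and `importance_type="weight"` matches the default used by `plot_importance`. The helper name `scores_to_frame` and the sample scores are hypothetical and only there for illustration.

```python
import pandas as pd


def scores_to_frame(scores):
    """Turn a {feature_name: score} dict into a DataFrame sorted by score."""
    df = pd.DataFrame(list(scores.items()), columns=["feature", "score"])
    return df.sort_values(by="score", ascending=False).reset_index(drop=True)


# Hypothetical scores, shaped like the dict returned by
# xgb_model.get_booster().get_score(importance_type="weight")
example_scores = {"age": 12.0, "income": 30.0, "height": 5.0}
importance_df = scores_to_frame(example_scores)
print(importance_df)

# With the fitted model from the question (assumption: xgb_model is fitted):
# importance_df = scores_to_frame(
#     xgb_model.get_booster().get_score(importance_type="weight"))
```

Note that features never used in a split are left out of the dictionary, so the resulting dataframe may have fewer rows than the training data has columns.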
Isa Haji
import pandas as pd
importance_df = pd.DataFrame({'importance': xgb_model.feature_importances_}, index=X.columns).sort_values(by='importance', ascending=False)

This stores the feature importances in a dataframe indexed by feature name and sorted from most to least important. Adapted from "How are feature_importances_ ordered in Scikit-learn's RandomForestRegressor", by Abishek Parida.
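
To keep only the most important features, the sorted frame can be truncated to the top rows. The following is a minimal sketch, assuming the importances come from `xgb_model.feature_importances_` and the names from `X.columns` (so `X` must be a pandas DataFrame). The helper `top_features`, its parameter `n`, the sample values, and the output file name are hypothetical.

```python
import numpy as np
import pandas as pd


def top_features(importances, feature_names, n=10):
    """Return the n largest importances as a DataFrame, one row per feature."""
    df = pd.DataFrame({
        "feature": list(feature_names),
        "importance": np.asarray(importances, dtype=float),
    })
    return df.nlargest(n, "importance").reset_index(drop=True)


# Hypothetical values standing in for xgb_model.feature_importances_ and X.columns
importances = [0.10, 0.55, 0.05, 0.30]
names = ["f0", "f1", "f2", "f3"]
top_df = top_features(importances, names, n=2)
print(top_df)

# With the model from the question (assumption: X is a pandas DataFrame):
# top_df = top_features(xgb_model.feature_importances_, X.columns, n=35)
# top_df.to_csv("top_features.csv", index=False)  # hypothetical file name
```

Keeping the feature names in a regular column rather than the index makes the frame easier to write to a file or merge with other tables, though either layout works.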