Hi, I would like to create a .csv file with two columns: the feature importances of a random forest model and the names of the corresponding features. I also want to be sure that each numeric value is matched to the correct variable name.

Here is an example, but I cannot export it to .csv correctly:

test_features = test[["area","product", etc.]].values

# Create the target 
target = test["churn"].values

pred_forest = my_forest.predict(test_features)

# Print the score of the fitted random forest
print(my_forest.score(test_features, target))


importance = my_forest.feature_importances_


pd.DataFrame({"IMP": importance, "features":test_features }).to_csv('forest_0407.csv',index=False)
EdChum
progster
  • How does this fail? This looks a bit fishy to me: you're trying to match the feature importances against the feature values themselves, which is incorrect, because the importances correspond to the columns. – EdChum Jul 04 '16 at 12:42
  • I'm confused because if I print "importance" I see only an array, and I'm not sure which feature each value matches, so I would like to check both names and values. The error message is: Exception: Data must be 1-dimensional – progster Jul 04 '16 at 12:49
  • Try this for the features: `test.columns.tolist()`. – shivsn Jul 04 '16 at 13:45
  • @shivsn the lazy typist's version is [`list(df)`](http://stackoverflow.com/questions/19482970/get-list-from-pandas-dataframe-column-headers) to get the columns as a list – EdChum Jul 04 '16 at 14:42
  • @EdChum nice I didn't know that thank you. – shivsn Jul 04 '16 at 16:45
  • @EdChum `list(df)` or `test.columns.tolist()` gives me only a list of variables. I would like to see the name in one column and the importance value in another column. – progster Jul 05 '16 at 06:22
  • I think what you want is something like `feat_imp = pd.Series(importance, index=df.columns)` – EdChum Jul 05 '16 at 07:57
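
The `pd.Series` suggestion in the last comment could be turned into a two-column CSV roughly as follows. This is only a sketch: the importance values are made-up stand-ins for `my_forest.feature_importances_`, and the two feature names are the ones visible in the question (the full column list is not shown there). The original error most likely arises because `test_features` is a 2-D array of values, while a DataFrame column needs 1-D data such as the list of column names.

```python
import numpy as np
import pandas as pd

# Stand-ins (assumptions): in the question these would be
# my_forest.feature_importances_ and the column list used to build test_features.
feature_names = ["area", "product"]
importance = np.array([0.7, 0.3])

# A Series keeps each importance attached to its feature name through the index.
feat_imp = pd.Series(importance, index=feature_names)

# Move the index into a regular column so the CSV has two named columns.
table = feat_imp.rename_axis("features").reset_index(name="IMP")

# With no path, to_csv returns the CSV text; pass a file name to write to disk.
csv_text = table.to_csv(index=False)
print(csv_text)
```

Because the names are the index of the Series, sorting or filtering `feat_imp` keeps every value paired with its feature.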

1 Answer

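Use this:

import pandas
# feature_names is a placeholder: the list of features you are using, in the same order as the training columns
x = list(zip(my_forest.feature_importances_, feature_names))
x = pandas.DataFrame(x, columns=["Importance", "Feature_Name"])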
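
To be sure the names and values line up, one option is to keep a single list of column names and use it both to select the training columns and to label the importances. The sketch below assumes a scikit-learn `RandomForestClassifier`; the toy data, the name `feature_cols`, and the model settings are invented for illustration, and only the column names and the output file name come from the question.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data; only the column names are borrowed from the question.
data = pd.DataFrame({
    "area":    [1, 2, 3, 4, 5, 6, 7, 8],
    "product": [0, 1, 0, 1, 0, 1, 0, 1],
    "churn":   [0, 0, 0, 0, 1, 1, 1, 1],
})

# One list drives both the feature matrix and the labels,
# so the order of names and importances cannot drift apart.
feature_cols = ["area", "product"]

my_forest = RandomForestClassifier(n_estimators=10, random_state=0)
my_forest.fit(data[feature_cols].values, data["churn"].values)

# Pair each importance with its column name, as in the answer above.
x = pd.DataFrame(
    list(zip(my_forest.feature_importances_, feature_cols)),
    columns=["Importance", "Feature_Name"],
)

# Optional: most important features first, then export.
x = x.sort_values("Importance", ascending=False)
x.to_csv("forest_0407.csv", index=False)
```

`feature_importances_` follows the column order of the array passed to `fit`, which is why reusing the same list for selection and labelling is enough to keep the pairing correct.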
Abhishek Sharma