
I used Logistic Regression as a classifier. I have six features, and I want to know which features influence the result more than the others in this classifier. I used Information Gain, but it seems that it doesn't depend on the classifier used. Is there any method to rank the features according to their importance based on a specific classifier (like Logistic Regression)? Any help would be highly appreciated.

BlueGirl
  • You can look at a specific class of feature selection methods, namely "wrapper" and "embedded" methods, which take into account the effect of the model along with the data. One example would be "Feature Saliency" http://www.sciencedirect.com/science/article/pii/S089812219700059X – ayandas Feb 07 '16 at 16:49
  • Maybe [this question](http://stackoverflow.com/questions/34052115/how-to-find-the-importance-of-the-features-for-a-logistic-regression-model?lq=1) could help? Though the coefficients are only really useful if all features are normalized (zero mean, same standard deviation for every feature); see the sketch after these comments. I'll also point to [this question](http://stackoverflow.com/questions/34529513/how-can-i-get-the-relative-importance-of-features-of-a-logistic-regression-for-a/34723446) in case you want to know the feature importance for a particular sample/prediction. – Robin Spiess Feb 08 '16 at 14:44
  • Voting to migrate to stats.stackexchange.com - I think you'll get more answers there. – Matt Parker Feb 12 '16 at 20:38
  • Hope this will be helpful for anybody still looking for answers. http://scikit-learn.org/stable/modules/feature_selection.html – prashanth Feb 01 '17 at 12:32
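A minimal sketch of the standardized-coefficient approach mentioned in the comment above, assuming scikit-learn's `LogisticRegression` and `StandardScaler`; the toy data and feature names are illustrative stand-ins, not from the question:

```python
# Sketch: rank features by the magnitude of logistic regression coefficients
# after standardizing the inputs (illustrative data, not the asker's).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data standing in for the asker's six features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
feature_names = [f"f{i}" for i in range(6)]

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

clf = LogisticRegression().fit(X_scaled, y)

# Rank features by absolute coefficient size (largest = most influential).
ranking = sorted(zip(feature_names, np.abs(clf.coef_[0])),
                 key=lambda t: t[1], reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```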

2 Answers


You could use a Random Forest classifier to give you a ranking of your features. You could then select the top x features and use them for logistic regression, although Random Forest would work perfectly well on its own.

Check out variable importance at https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
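A minimal sketch of this idea with scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_`; the toy data is a stand-in for the asker's six features:

```python
# Sketch: use a random forest's impurity-based importances to rank features
# before (optionally) fitting a logistic regression on the top ones.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
feature_names = [f"f{i}" for i in range(6)]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ sums to 1; higher means the feature was more useful
# for splitting across the trees.
order = np.argsort(forest.feature_importances_)[::-1]
for i in order:
    print(f"{feature_names[i]}: {forest.feature_importances_[i]:.3f}")
```

Note that these importances come from the forest, not from the logistic regression itself, so they rank features independently of the final classifier.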


One way to do this is null hypothesis significance testing. Basically, for each feature, you test for evidence that its coefficient is nonzero. Most statistical software reports the results of these tests by default in the model summary (scikit-learn and other machine-learning-oriented tools tend not to). With a small number of features, you can use this information together with stepwise regression to rank the importance of the features.
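A minimal sketch of this approach, assuming statsmodels' `Logit`, whose summary reports a z-statistic and p-value for each coefficient; the toy data is illustrative:

```python
# Sketch: significance tests on logistic regression coefficients via
# statsmodels' Logit model summary (illustrative data).
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# statsmodels does not add an intercept automatically.
X_const = sm.add_constant(X)

result = sm.Logit(y, X_const).fit(disp=0)

# summary() lists each coefficient with its z-statistic and p-value;
# small p-values indicate evidence that the coefficient is nonzero.
print(result.summary())

# Feature indices ordered from strongest to weakest evidence
# (skipping the intercept term at position 0).
print(np.argsort(result.pvalues[1:]))
```

Features whose coefficients have small p-values are the ones the model relies on most; as the comment thread notes, standardizing the features first also makes the coefficient magnitudes themselves comparable.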