How to interpret and view the complete permutation feature plot in jupyter?

Question

I am trying to generate a feature importance plot using Permutation Feature Importance (PFI). I want to check whether the features returned by different approaches are stable, so that I can select an optimal set of features. Can we get a p-value or something similar that indicates whether a feature is significant? If I could confirm my earlier results with PFI, I would be more confident, but it looks like the results are entirely opposite.

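Here is my code to generate the plot:

from sklearn.linear_model import LogisticRegression
import eli5
from eli5.sklearn import PermutationImportance
# X_train_std and y_train are the standardized training features and labels
logreg = LogisticRegression(random_state=1)  # I also tried Random Forest
logreg.fit(X_train_std, y_train)
perm = PermutationImportance(logreg, random_state=1).fit(X_train_std, y_train)
eli5.show_weights(perm)  # see the issue with the plot below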

Questions

1) The feature at the top was non-significant in my other approaches (Chi-square, XGBoost feature importance, the statsmodels Logistic Regression summary, etc.), but here it appears at the top, which surprised me. Is the list sorted in descending or ascending order?

2) I understand that PFI shuffles the values of a feature to see how much the model error changes. If the first row (X18) is an important feature, then this is the total opposite of my other approaches. Am I making a mistake here? What should I be checking in a situation like this? Or should I apply PFI only to features already selected as important?

3) How do I make the Jupyter cell display all rows? Currently it doesn't show the remaining 35 rows, as shown below. I have already set the pandas display options (column width, max rows, etc.).

Can you help me with this?

be careful about this issue: https://stackoverflow.com/q/60489934/5025009 — seralouk, Mar 02 '20 at 14:21

score 1 · Answer 1 · edited Dec 19 '19 at 14:00

For question 3, pass the top parameter, as in eli5.show_weights(perm, top=100). More in the eli5 docs.

For questions 1 and 2, I've been in a similar situation. As far as I know, different approaches do produce different outputs, because each approach has its own criterion. For tree-based approaches such as DecisionTree, xgboost, catboost, GBRT, etc., importance is accumulated while the trees are being built: the more a feature is used, the more important it becomes. Other approaches do not measure importance this way.
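To make the permutation criterion concrete, here is a minimal standard-library sketch of the idea, not eli5's actual implementation. The toy data, the stand-in `predict` function, and the helper names `accuracy` and `permutation_importance` are all assumptions made up for illustration. Each column is shuffled several times, the drop in accuracy is averaged, and the features are listed from largest to smallest mean drop together with the spread across shuffles.

```python
import random
from statistics import mean, stdev


def accuracy(predict, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_iter=5, seed=1):
    """Return (feature index, mean score drop, std of drops), largest drop first."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    results = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_iter):
            # Shuffle only column j; every other column keeps its values.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        results.append((j, mean(drops), stdev(drops)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy data (hypothetical): the label depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(0)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def predict(row):
    """Stand-in for a fitted model: it only looks at feature 0."""
    return int(row[0] > 0.5)


ranking = permutation_importance(predict, X, y)
for j, drop_mean, drop_std in ranking:
    print(f"feature {j}: {drop_mean:.3f} +/- {drop_std:.3f}")
```

A feature the model never relies on gets a drop of exactly zero here, however it ranks under a split-count criterion. The spread across shuffles gives a rough sense of how stable each score is, but it is not a p-value.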
