
I am looking for a proper, or the best, way to get variable importance in a neural network created with Keras. The way I currently do it is to take the weights (not the biases) of the variables in the first layer, on the assumption that more important variables will have larger weights there. Is there another/better way of doing it?

user1367204

4 Answers


Since everything gets mixed up along the network, the first layer alone can't tell you about the importance of each variable. Later layers can also increase or decrease a variable's importance, and even make one variable affect the importance of another. Each individual neuron in the first layer will also give each variable a different importance, so it's not that straightforward.

I suggest you call model.predict(inputs) with inputs containing arrays of zeros, setting only the variable you want to study to 1.

That way, you see the result for each variable alone. Even so, this still won't help with cases where one variable increases the importance of another.
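A minimal sketch of this probe, assuming a trained Keras model named `model` and `n_features` input variables (both names are placeholders here):

import numpy as np

n_features = 10  # placeholder: the number of input variables
for i in range(n_features):
    probe = np.zeros((1, n_features))
    probe[0, i] = 1.0  # activate only the variable under study
    print(i, model.predict(probe))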

Daniel Möller
  • I was wondering if we could use random forest feature importances from sklearn and then feed the important features it identifies into the Keras classifier. Does this create a problem? – deadcode Jan 18 '18 at 16:05
  • @deadcode Probably yes, since random forest is a decision-tree algorithm, totally different from NNs. – imbr Apr 17 '18 at 18:18
  • You should really shuffle the variable among your samples instead of setting things to zero. This preserves the mean and variance of the inputs (see the sketch after this comment thread). – Teque5 Aug 08 '18 at 17:36
  • @Teque5 A good point about preserving the mean and variance of the inputs. This shuffling method is called permutation importance for random forests. There is a package for it: https://github.com/parrt/random-forest-importances. But I am not sure whether permutation importance applies to NNs and deep learning models. – Huanfa Chen Jan 15 '20 at 12:15
  • Take a look at the section **Feature Importance Evaluation** of this paper http://conference.scipy.org/proceedings/scipy2019/pdfs/aero_ef_logue.pdf for an in-depth discussion of variable importance methods. – Teque5 Jan 22 '20 at 17:56
  • @Teque5 What about setting the feature's values to random numbers (between 0 and 1, or similar, depending on the scaling) instead of shuffling it? Would this also be reasonable? – NeStack Feb 03 '21 at 16:36
  • @NeStack The whole point of shuffling is to keep a representative population for that particular parameter. I haven't tested randomizing the input, but I expect it would be similar to setting it to 0 uniformly. Both methods would deviate significantly from the model's expected input. – Teque5 Feb 04 '21 at 00:32
  • @Teque5 I think that randomizing may be good "as long as you use the SAME values for all samples". Then you compare the zeros, the ones, the randoms, whatever, varying only the desired variable. This might answer whether the importance of this variable is more or less affected by the others. – Daniel Möller Feb 04 '21 at 14:49
  • @DanielMöller Can you provide an example of that? I'm not too sure what you mean by randomizing if in the end you use the same values for all samples... – Zizzipupp May 14 '21 at 14:14
  • I understand that we want to drop some features if they are not useful, but if we don't need to drop any, would the neural network handle feature importance automatically? Can we run the model with another algorithm, e.g. in sklearn, to check feature importance as an initial guess, then use a TF NN for the actual model? – EBDS Dec 15 '21 at 07:41
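As a follow-up to the shuffling discussion above, a minimal hand-rolled sketch of permutation importance, assuming a trained Keras `model`, validation arrays `X_val` and `y_val`, and a `metric(y_true, y_pred)` scoring function; all of these names are illustrative:

import numpy as np

def permutation_importance(model, X_val, y_val, metric):
    # Score on the unmodified validation data first.
    baseline = metric(y_val, model.predict(X_val))
    rng = np.random.default_rng(seed=1)
    importances = []
    for col in range(X_val.shape[1]):
        X_perm = X_val.copy()
        rng.shuffle(X_perm[:, col])  # shuffle one column; keeps its mean and variance
        importances.append(baseline - metric(y_val, model.predict(X_perm)))
    return importances

A larger drop from the baseline score means the model relied more heavily on that column.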

*Edited to include relevant code to implement permutation importance.*

I answered a similar question at Feature Importance Chart in neural network using Keras in Python. It implements what Teque5 mentioned above, namely shuffling the variables among your samples (permutation importance), using the ELI5 package.

from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasRegressor
import eli5
from eli5.sklearn import PermutationImportance

def base_model():
    model = Sequential()
    ...
    return model

X = ...
y = ...

# Wrap the Keras model so it exposes the scikit-learn estimator API
# (sk_params holds your fit/predict keyword arguments).
my_model = KerasRegressor(build_fn=base_model, **sk_params)
my_model.fit(X, y)

# Shuffle each feature in turn and measure the resulting drop in score.
perm = PermutationImportance(my_model, random_state=1).fit(X, y)
eli5.show_weights(perm, feature_names=X.columns.tolist())
Justin Hallas
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Anil_M Sep 19 '18 at 21:41
  • Thanks for the input! I first considered copying my answer from the first link, where I answered a similar question with a code example, but my research found that to be bad form. In this case my first link goes back to a similar Stack Overflow question. Is it OK for an answer to be link-only if a more complete answer is referenced directly from Stack Overflow? – Justin Hallas Sep 19 '18 at 22:18
  • As a norm, link-only answers need to be provided via a comment. – Anil_M Sep 19 '18 at 23:05
  • I have multiple features and am getting `TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator does not.` – StatguyUser Jan 31 '19 at 00:19
  • @Enthusiast Add a scoring argument, e.g. `perm = PermutationImportance(model, scoring="accuracy", random_state=1).fit(X, y)` – Abhijay Ghildyal Jun 20 '19 at 21:33

It is not that simple. For example, a variable's influence could be reduced to zero in later layers.

I'd have a look at LIME (Local Interpretable Model-Agnostic Explanations). The basic idea is to set some inputs to zero, pass the sample through the model, and see whether the result is similar. If it is, then that variable might not be that important. But there is more to it, and if you want the details, you should read the paper.

See marcotcr/lime on GitHub.
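A minimal usage sketch, assuming a trained Keras classifier `model`, a NumPy training array `X_train`, and a list `feature_names` (all illustrative names):

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    mode="classification",
)

# LIME perturbs the inputs around one instance and fits a local
# linear surrogate; its weights are per-feature local importances.
explanation = explainer.explain_instance(
    X_train[0],
    model.predict,  # should return class probabilities
    num_features=10,
)
print(explanation.as_list())  # (feature, local weight) pairs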

Martin Thoma
  • LIME permutes the covariates around a single observation and predicts the output using the trained model. A weighted local linear regression is then fit to the model outputs and the permuted covariates (for regression, anyway). This returns a coefficient for each observation. It does not attempt to explain global behaviour, although you can easily do that by running LIME for every observation and examining the coefficients. I do not see how the above explanation is related to LIME, as it seems to describe a classic sensitivity analysis. – Scott Worland Oct 04 '18 at 14:07

This is a relatively old post with relatively old answers, so I would like to offer another suggestion: use SHAP to determine feature importance for your Keras models. Unlike eli5, SHAP can also handle Keras models built from layers that require 3D input, such as LSTM and GRU.

To avoid double-posting, I would like to point to my answer to a similar question on Stack Overflow on using SHAP.
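For completeness, a minimal sketch of the SHAP approach, assuming a trained Keras `model` and a NumPy array `X_train` (illustrative names):

import shap

# A small background sample is used to estimate expected values.
explainer = shap.DeepExplainer(model, X_train[:100])

# SHAP values have the same shape as the explained inputs; the mean
# absolute value per column is a global feature-importance measure.
shap_values = explainer.shap_values(X_train[:500])
shap.summary_plot(shap_values, X_train[:500])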

user5305519