
I am using Python (3.6), Anaconda (64 bit), and Spyder (3.1.2). I have already built a neural network model with Keras (2.0.6) for a regression problem (one response, 10 variables). I was wondering how I can generate a feature importance chart like this:

(image: feature importance chart)

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor

def base_model():
    # 10 input features -> 200 relu units -> 1 linear output
    model = Sequential()
    model.add(Dense(200, input_dim=10, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

clf = KerasRegressor(build_fn=base_model, epochs=100, batch_size=5, verbose=0)
clf.fit(X_train, Y_train)
CDspace, andre

4 Answers


I was recently looking for the answer to this question and found something that was useful for what I was doing, so I thought it would be helpful to share. I ended up using the permutation importance module from the eli5 package. It works most easily with a scikit-learn model, and luckily Keras provides a wrapper for sequential models. As shown in the code below, using it is very straightforward.

from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
import eli5
from eli5.sklearn import PermutationImportance

def base_model():
    model = Sequential()
    ...
    return model

X = ...
y = ...

my_model = KerasRegressor(build_fn=base_model, **sk_params)
my_model.fit(X, y)

# fit the permutation importance estimator: each feature is shuffled in
# turn and the resulting drop in model score is recorded
perm = PermutationImportance(my_model, random_state=1).fit(X, y)
eli5.show_weights(perm, feature_names=X.columns.tolist())
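
Note that `show_weights` returns an HTML object, so (as the comments below point out) it only renders inside a Jupyter/IPython notebook. Outside a notebook you can read the raw scores directly; a minimal sketch, assuming the `perm` and `X` objects from above:

# plain-text rendering of the same explanation
print(eli5.format_as_text(eli5.explain_weights(perm)))

# or the raw scores: each value is the mean drop in model score when
# that feature is permuted, so the values need not sum to one
for name, imp in zip(X.columns, perm.feature_importances_):
    print(name, imp)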
Akavall, Justin Hallas
  • this line *eli5.show_weights(perm, feature_names = X.columns.tolist())* returns error: *AttributeError: module 'eli5' has no attribute 'show_weights'* – S34N Nov 14 '18 at 08:38
  • Traceback (most recent call last): File in eli5.show_weights(perm, feature_names = col) AttributeError: module 'eli5' has no attribute 'show_weights' – S34N Nov 14 '18 at 08:55
  • Not sure what the issue is. It works on my computer and is listed in documentation here: https://eli5.readthedocs.io/en/latest/overview.html Do you have the most recent version? – Justin Hallas Nov 15 '18 at 17:56
  • I had a chat with the eli5 developer; it turns out that the error AttributeError: module 'eli5' has no attribute 'show_weights' is only displayed when you're not using an IPython notebook, which I wasn't at the time the post was published. Strange phenomenon, but I will test it out with IPython installed. – S34N Nov 16 '18 at 11:41
  • eli5.show_weights outputs an HTML object, so it will only be displayed in iPython (jupyter) Notebook. – gradLife Aug 20 '19 at 20:34
  • 2
    why the sum of all the permutations (perm.feature_importances_) are not equal to one? – Henry Navarro Apr 02 '20 at 14:10
  • I would like to add that `eli5` currently only supports 2d arrays. If your model uses 3d layers like `GRU` or `LSTM`, `eli5` will not work for you. You need to use another library like `SHAP` instead. – user5305519 May 18 '20 at 03:21

This is a relatively old post with relatively old answers, so I would like to offer another suggestion: use SHAP to determine feature importance for your Keras models. SHAP supports both 2d and 3d arrays, whereas eli5 currently handles only 2d arrays (so if your model uses layers that require 3d input, like LSTM or GRU, eli5 will not work for you).

Here is the link to an example of how SHAP can plot the feature importance for your Keras models, but in case it ever breaks, some sample code and plots (taken from said link) are provided below as well:


import shap

# load your data here, e.g. X and y
# create and fit your model here

# load JS visualization code to notebook
shap.initjs()

# explain the model's predictions using SHAP
# (TreeExplainer covers tree-based models such as XGBoost, LightGBM,
# CatBoost, and scikit-learn tree ensembles -- see the note below for
# Keras networks)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# visualize the first prediction's explanation (use matplotlib=True to avoid Javascript)
shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])

shap.summary_plot(shap_values, X, plot_type="bar")

(images: SHAP force plot and summary bar plot of feature importance)
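
A caveat: TreeExplainer only supports tree-based models, which is the source of the Model type not yet supported by TreeExplainer errors reported in the comments below. For a Keras network you would reach for one of SHAP's neural-network explainers instead. A minimal sketch with DeepExplainer, assuming model is a fitted tf.keras model and X is a NumPy array of inputs:

import numpy as np
import shap

# DeepExplainer integrates over a background sample; a small random
# subset of the training data is a common choice
background = X[np.random.choice(X.shape[0], 100, replace=False)]

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X[:100])  # explain the first 100 rows

shap.summary_plot(shap_values, X[:100], plot_type="bar")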

user5305519
  • 4
    Error when using `DeepExplainer`: `keras is no longer supported, please use tf.keras instead.` – Kermit Jun 27 '20 at 19:18
  • 8
    Error when using `TreeExplainer` `SHAPError: Model type not yet supported by TreeExplainer: ` – Kermit Jun 27 '20 at 19:18
  • @HashRocketSyntax I assume you are trying to use `Sequential` layer from Keras. Can you try importing `Sequential` using this instead? `from tensorflow.keras import Sequential` – user5305519 Jun 28 '20 at 03:46
  • 3
    @jarrettyeo, `from tensorflow.keras import Sequential` still doesn't work. I get the error: `Exception: Model type not yet supported by TreeExplainer: ` – Mitch Oct 21 '20 at 21:30
  • @user5305519 can you provide the solution to any of the above questions? I am also getting this error: Exception: Model type not yet supported by TreeExplainer: – 傅能杰 Dec 13 '21 at 16:26
  • @Kermit refer to https://stackoverflow.com/a/72480697/13046931 – seth Dec 04 '22 at 23:27

At the moment Keras doesn't provide any functionality to extract feature importance.

You can check this previous question: Keras: Any way to get variable importance?

or the related Google Group thread: Feature importance

Spoiler: in the Google Group, someone announced an open source project to solve this issue.

paolof89

A lame way is to get the weights for each neuron in each layer and show/stack them together. Note that only the first layer's weights map one-to-one onto the input features.

import numpy as np
import pandas as pd
import plotly.express as px

feature_df = pd.DataFrame(columns=['feature', 'layer', 'neuron', 'weight', 'abs_weight'])

for i, layer in enumerate(model.layers[:-1]):
    # kernel matrix of the layer, shape (n_inputs, n_neurons)
    w = np.array(layer.get_weights()[0])
    for n, neuron in enumerate(w.T):
        # pair each incoming weight with a feature name (only really
        # meaningful for the first layer)
        for f, name in zip(neuron, X.columns):
            feature_df.loc[len(feature_df)] = [name, i, n, f, abs(f)]

feature_df = feature_df.sort_values(by=['abs_weight'])
feature_df = feature_df.reset_index(drop=True)

fig = px.bar(feature_df, x='feature', y='abs_weight', template='simple_white')
fig.show()

It gives something like this, where the x-axis shows your features:

(image: bar chart of absolute weights per feature)
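
Since only the first layer's kernel lines up with the input features, a shorter variant of the same idea aggregates absolute first-layer weights per feature. A minimal sketch, assuming the same model and X as above:

import numpy as np
import pandas as pd
import plotly.express as px

# kernel of the first Dense layer, shape (n_features, n_neurons)
first_kernel = model.layers[0].get_weights()[0]

# total absolute outgoing weight per input feature
imp = pd.DataFrame({
    'feature': X.columns,
    'abs_weight_sum': np.abs(first_kernel).sum(axis=1),
}).sort_values('abs_weight_sum')

fig = px.bar(imp, x='feature', y='abs_weight_sum', template='simple_white')
fig.show()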

Noora