6

I am using the following code to calculate feature importance.

from matplotlib import pyplot as plt
from sklearn import svm

def feature_importances(coef, names):
    # Sort features by coefficient value so the bar chart is ordered
    imp, names = zip(*sorted(zip(coef, names)))
    plt.barh(range(len(names)), imp, align='center')
    plt.yticks(range(len(names)), names)
    plt.show()

features_names = ['input1', 'input2']
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)  # X, Y are the training data, defined elsewhere
feature_importances(clf.coef_[0], features_names)

How can I calculate the feature importance for a non-linear kernel? The approach above doesn't give the expected result in that case.

Jibin Mathew
  • Check [this topic](http://stackoverflow.com/questions/41592661/determining-the-most-contributing-features-for-svm-classifier-in-sklearn/41601281#41601281). – Jakub Macina Feb 15 '17 at 21:01

3 Answers

1

Short answer: it's not possible, at least with the present libraries. The feature importance of a linear SVM can be read off its coefficients, but not that of a non-linear SVM. The reason is that a non-linear kernel implicitly maps the dataset into a higher-dimensional space, quite different from the parent dataset, and the separating hyperplane is found in that space. Its coefficients therefore refer to the transformed dimensions rather than to the original features, so there is no direct way to express this SVM's importance in terms of the parent dataset's features.
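
As a quick illustration (a minimal sketch on toy data, not part of the original answer): scikit-learn only exposes coef_ when the kernel is linear, and raises an AttributeError otherwise.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data, just for demonstration
X, Y = make_classification(n_samples=100, n_features=2,
                           n_redundant=0, random_state=0)

linear_svm = SVC(kernel='linear').fit(X, Y)
print(linear_svm.coef_)  # per-feature weights, shape (1, 2)

rbf_svm = SVC(kernel='rbf').fit(X, Y)
try:
    print(rbf_svm.coef_)
except AttributeError as err:
    print(err)  # "coef_ is only available when using a linear kernel"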

Jibin Mathew
0

An N x N kernel matrix is not invertible, only traceable. Check whether you can use gradients: they normally trace the computation, and for importance you would need the trace after an impulse response, i.e. after feeding in a vector of ones.

I am not deep enough into scikit-learn's implementation to say whether it is even feasible to get access to such traces. But once you have traced the response back to the features, that should give you the importance.

Note, however, that gradient descent is not designed to trace the inputs; it traces the parameters that lead to a specific output. You would have to find the back-propagated parameters of your kernel with respect to the response (the gradients of the kernel parameters given the response itself).

Since this may be impossible, or at least very complex, I would instead turn to alternatives that can give good results, such as kernels between the different dimensions of your samples instead of between individual samples, or response functions that give a good dynamic scaling of your features. A rough sketch of the gradient idea is shown below.
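
For concreteness, here is a minimal sketch of that gradient idea, under assumptions: it approximates per-feature sensitivity with central finite-difference gradients of SVC.decision_function on toy data. gradient_sensitivity is a hypothetical helper, not a scikit-learn API.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def gradient_sensitivity(model, X, eps=1e-4):
    """Mean absolute central-difference gradient of the decision
    function with respect to each input feature."""
    X = np.asarray(X, dtype=float)
    sens = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[:, j] += eps
        X_minus[:, j] -= eps
        diff = (model.decision_function(X_plus)
                - model.decision_function(X_minus))
        sens[j] = np.abs(diff / (2 * eps)).mean()
    return sens

X, Y = make_classification(n_samples=200, n_features=4, random_state=0)
svm = SVC(kernel='rbf').fit(X, Y)
print(gradient_sensitivity(svm, X))  # one sensitivity score per feature

Note that for a non-linear kernel this is only a local sensitivity measure averaged over the training points, not a true global importance.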

Tomerikoo
  • On the other hand, there are several libraries which already do such things, e.g. sklearn.inspection.permutation_importance or the SHAP package – MARKUS Meister Jun 28 '21 at 07:45
0

You can't directly extract the feature importance of an SVM with a non-linear kernel. But you can use permutation_importance from sklearn to estimate it.

Here is an example:

from sklearn.svm import SVC
from sklearn.inspection import permutation_importance
import numpy as np
import matplotlib.pyplot as plt


# X, Y are the training data, assumed to be defined (X as a DataFrame)
svm = SVC(kernel='poly')
svm.fit(X, Y)

perm_importance = permutation_importance(svm, X, Y)

# Normalize so the feature importances sum to 1.0
# and can be read as percentages
perm_importance_normalized = perm_importance.importances_mean/perm_importance.importances_mean.sum()

# Feature names (assuming X is a DataFrame)
feature_names = X.columns
features = np.array(feature_names)

# Sort to plot in order of importance
sorted_idx = perm_importance_normalized.argsort()

# Plotting
plt.figure(figsize=(13, 5))
plt.title('Feature Importance', fontsize=20)
plt.barh(features[sorted_idx], perm_importance_normalized[sorted_idx], color='b', align='center')
plt.xlabel('Relative Importance', fontsize=15)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

# Annotate each bar with its normalized importance
for index, value in enumerate(perm_importance_normalized[sorted_idx]):
    plt.text(value, index, str(round(value, 2)), fontsize=15)

plt.show()