
I found this question, which seems to address my problem: Determining the most contributing features for SVM classifier in sklearn. However, as my understanding of Python is limited, I need some help.

I have a dependent variable, 'Group', which has two levels: 'Group1' and 'Group2'.
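For reference, the dataframe looks roughly like this (a simplified sketch with illustrative values only; the real data is read from a CSV below):

import pandas as pd

# Illustrative layout only: numeric feature columns plus the two-level 'Group' column
df = pd.DataFrame({
    'input1': [0.1, 0.4, 0.9, 0.3],
    'input2': [1.2, 0.8, 0.5, 1.1],
    'Group':  ['Group1', 'Group2', 'Group1', 'Group2'],
})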

This is the code I found, adapted to my data:

import pandas as pd
df = pd.read_csv('C:/Users/myPC/OneDrive/Desktop/analysis/dataframe6.csv')

X = df.drop('Group', axis=1)
y = df['Group']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

y_pred = svclassifier.predict(X_test)

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

from matplotlib import pyplot as plt
from sklearn import svm

def f_importances(coef, names):
    imp = coef
    imp,names = zip(*sorted(zip(imp,names)))
    plt.barh(range(len(names)), imp, align='center')
    plt.yticks(range(len(names)), names)
    plt.show()

features_names = ['input1', 'input2']
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

f_importances(svclassifier.coef_, features_names)

This produces just a blank plot. I think there is something I should change in features_names = ['input1', 'input2'], but I am not sure what.

Ed9012

1 Answer


The plotting code expects a one-dimensional array. According to the documentation, the attribute coef_ is:

coef_ : ndarray of shape (n_classes * (n_classes - 1) / 2, n_features). Weights assigned to the features when kernel="linear".
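To see where that shape comes from, here is a quick check with three classes (a minimal sketch on synthetic data; the names below are illustrative):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, (60, 4))        # 60 samples, 4 features
y_demo = rng.choice(['A', 'B', 'C'], 60)   # 3 classes

clf = SVC(kernel='linear').fit(X_demo, y_demo)
print(clf.coef_.shape)                     # (3, 4): 3 * (3 - 1) / 2 rows, one per pair of classes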

Using an example with two classes and three features:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

np.random.seed(123)

# Synthetic data: 400 rows with three uniform features and a two-level 'Group' label
df = pd.DataFrame(np.random.uniform(0, 1, (400, 3)), columns=['input1', 'input2', 'input3'])
df['Group'] = np.random.choice(['Group1', 'Group2'], 400)
X = df.drop('Group', axis=1)
y = df['Group']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

We check the shape of the array:

print(svclassifier.coef_.shape)
(1, 3)

Because you have only 2 classes, there is only 1 row, so we pass that single row to the plotting function:

from matplotlib import pyplot as plt

def f_importances(coef, names):
    # coef must be one-dimensional: one weight per feature name
    imp = coef
    imp, names = zip(*sorted(zip(imp, names)))
    plt.barh(range(len(names)), imp, align='center')
    plt.yticks(range(len(names)), names)
    plt.show()

features_names = ['input1', 'input2', 'input3']

# svclassifier is already fitted above; pass the single row of coef_ (a 1-D array)
f_importances(svclassifier.coef_[0], features_names)

This is the plot I got:

[Horizontal bar plot of the sorted linear-SVM coefficients for input1, input2 and input3]

StupidWolf