I want to use SelectFromModel to select the best features for my model. However, I get an error with some classification models.

For example, the code below works. It also works with a decision tree, random forest, and logistic regression:

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.feature_selection import RFE, SelectFromModel

from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


df_data = pd.read_csv('data.csv', sep = ' ', header=None)
df_target = pd.read_csv('target.csv', names=['output'])

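# full_df is not defined in the original snippet; presumably features and target were combined like this
full_df = pd.concat([df_data, df_target], axis=1)
x = full_df.iloc[:,:-1]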
y = full_df.iloc[:,-1]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)

print(x_train.shape)

clf = SVC(kernel = 'linear').fit(x_train, y_train) 

model = SelectFromModel(clf, prefit=True)

print(model.transform(x_train).shape)

But when I try to use a different classifier, for example:

clf = SVC(kernel = 'poly').fit(x_train, y_train) 
clf = SVC(kernel = 'sigmoid').fit(x_train, y_train) 
clf = SVC(kernel = 'rbf').fit(x_train, y_train)

It gives me the error:

ValueError: The underlying estimator SVC has no `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to SelectFromModel or call fit before calling transform.

Why do I get this error? My classifiers are all used in the same place, and they are fitted.

taga
  • Possible duplicate of [How to obtain features' weights](https://stackoverflow.com/questions/21260691/how-to-obtain-features-weights) – hellpanderr Aug 19 '19 at 19:37
  • @hellpanderr the dupe doesn't look to me as if it would answer this question in a straight forward manner. Could you explain how it applies? – Arne Aug 20 '19 at 09:38
  • @Arte coefficients (and therefore feature importances) are available only for linear svm kernel – hellpanderr Aug 20 '19 at 09:51

1 Answer

SelectFromModel relies on the fitted estimator's feature_importances_ or coef_ attribute.

feature_importances_ is provided by algorithms that compute it as part of making their decisions, such as decision trees and random forests, while linear models such as logistic regression expose coef_ (this is why those estimators work).
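To make this concrete, here is a minimal sketch that checks which attribute each of the working estimators exposes and then passes it to SelectFromModel. The synthetic data from make_classification and the `report` dictionary are assumptions for illustration only; they stand in for the data.csv / target.csv files from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's data (assumption, not the asker's files)
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

estimators = [
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
    LogisticRegression(max_iter=1000),
]

report = {}  # hypothetical helper: name -> (has feature_importances_, has coef_)
for est in estimators:
    est.fit(X, y)
    name = type(est).__name__
    report[name] = (hasattr(est, "feature_importances_"), hasattr(est, "coef_"))
    # SelectFromModel can use either attribute of the prefit estimator
    selector = SelectFromModel(est, prefit=True)
    print(name, report[name], selector.transform(X).shape)
```

The tree-based models report feature_importances_ and the logistic regression reports coef_, so SelectFromModel has something to threshold on in every case.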

In your case, you are trying to retrieve this information from an SVC model, and according to the SVC documentation only the linear kernel provides coef_:

coef_: Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
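A quick way to see this is to check for coef_ on each kernel. The sketch below also shows one possible workaround for non-linear kernels, permutation importance from sklearn.inspection, which is not part of the original answer and is only one of several options. The synthetic data, the `kernel_has_coef` dictionary, and the "above the mean importance" cutoff are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the question's data (assumption)
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Only the linear kernel exposes coef_; the others raise AttributeError,
# which hasattr() reports as False
kernel_has_coef = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(x_train, y_train)
    kernel_has_coef[kernel] = hasattr(clf, "coef_")
print(kernel_has_coef)

# Possible workaround (assumption): model-agnostic permutation importance
rbf = SVC(kernel="rbf").fit(x_train, y_train)
result = permutation_importance(rbf, x_test, y_test,
                                n_repeats=10, random_state=0)
importances = result.importances_mean

# Keep features whose importance is above the mean (arbitrary cutoff)
mask = importances > importances.mean()
x_train_selected = x_train[:, mask]
print(x_train_selected.shape)
```

Permutation importance measures how much the score drops when a feature is shuffled, so it does not depend on the estimator exposing coef_ or feature_importances_. It is slower than reading an attribute, and the cutoff should be tuned for the actual data.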