I have a multi-class problem and tried to calculate the ROC-AUC score using metrics.roc_auc_score() from sklearn. This function supports multi-class targets, but it needs estimated probabilities, so the classifier must have a predict_proba() method (which svm.LinearSVC() does not have).

Here is an example of what I am trying to do:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


# Get the data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create the model
clf = SVC(kernel='linear', probability=True)

# Split the data in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train the model
clf.fit(X_train, y_train)

# Predict the test data
predicted = clf.predict(X_test)
predicted_proba = clf.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, predicted_proba, multi_class='ovr')

I tried svm.SVC() with a linear kernel and the parameter probability set to True, which lets me use its predict_proba() method. The problem is that it takes a long time to finish compared to LinearSVC() on a big dataset (the example is really quick because it has only a small number of samples). Is there a way to use LinearSVC() and roc_auc_score() for a multi-class problem?

Luis Miguel

1 Answer

There is a class dedicated to cases like this, CalibratedClassifierCV:

from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


# Get the data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create the model
clf = CalibratedClassifierCV(LinearSVC(max_iter=10000))

# Split the data in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train the model
clf.fit(X_train, y_train)

# Predict the test data
predicted = clf.predict(X_test)
predicted_proba = clf.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, predicted_proba, multi_class='ovr')
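
For reference, here is a minimal sketch of the same wrapper with its main options written out. The values shown for method and cv are, as far as I know, scikit-learn's defaults and are spelled out only for illustration; the variable names are my own.

```python
import numpy as np
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# method="sigmoid" fits a sigmoid (Platt scaling) on the decision_function
# outputs; cv=5 fits the base model and the calibrator on 5 internal folds.
calibrated = CalibratedClassifierCV(
    LinearSVC(max_iter=10000), method="sigmoid", cv=5
)
calibrated.fit(X_train, y_train)

# One column per class; each row is normalised to sum to 1, which is
# what roc_auc_score expects for multi-class input.
proba = calibrated.predict_proba(X_test)
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
```

The internal cross-validation means LinearSVC is fitted several times, so training costs a small multiple of a single LinearSVC fit.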

As you're choosing between SVC and LinearSVC, you may wish to check out the related question "When should one use LinearSVC or SVC?"
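
If only the AUC is needed and not calibrated probabilities, a possible alternative is to skip calibration and score the raw decision_function output. roc_auc_score rejects multi-class scores that do not sum to 1, so this sketch binarizes the labels with label_binarize and uses the multilabel form of the metric, which accepts unnormalised scores. This is my own variation and not part of the approach above; the resulting macro-averaged one-vs-rest AUC will generally differ somewhat from the calibrated one.

```python
from sklearn import datasets
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

clf = LinearSVC(max_iter=10000)
clf.fit(X_train, y_train)

# Raw one-vs-rest margins, shape (n_samples, n_classes); not probabilities
scores = clf.decision_function(X_test)

# One indicator column per class, in the same order as clf.classes_
y_test_bin = label_binarize(y_test, classes=clf.classes_)

# Multilabel-indicator input: AUC is computed per column, then macro-averaged
roc_auc = roc_auc_score(y_test_bin, scores, average="macro")
```

This only works as written with three or more classes, since label_binarize returns a single column for binary targets.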

Sergey Bushmanov