LinearSVC and roc_auc_score() for a multi-class problem

Question

I have a multi-class problem and want to compute the ROC-AUC score with metrics.roc_auc_score() from sklearn. This function supports multi-class targets, but it needs estimated probabilities, so the classifier must provide a predict_proba() method (which svm.LinearSVC() does not have).

Here is an example of what I am trying to do:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


# Get the data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create the model
clf = SVC(kernel='linear', probability=True)

# Split the data in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train the model
clf.fit(X_train, y_train)

# Predict the test data
predicted = clf.predict(X_test)
predicted_proba = clf.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, predicted_proba, multi_class='ovr')

I tried svm.SVC() with a linear kernel and the parameter probability set to True, which lets me call predict_proba() on the classifier. The problem is that it takes a long time to finish compared to LinearSVC() when the dataset is big (the example above is really quick because it has only a small number of samples). Is there a way to use LinearSVC() and roc_auc_score() for a multi-class problem?

Does this help? https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html – Grayrigel Oct 22 '20 at 21:36
@Grayrigel the problem is that I can't calculate the estimated probabilities with ```svm.LinearSVC()```. – Luis Miguel Oct 22 '20 at 22:14

score 2 · Accepted Answer · answered Oct 23 '20 at 08:11


There is a dedicated class, CalibratedClassifierCV, for cases like this. It wraps LinearSVC and provides predict_proba():

from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


# Get the data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create the model
clf = CalibratedClassifierCV(LinearSVC(max_iter=10000))

# Split the data in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train the model
clf.fit(X_train, y_train)

# Predict the test data
predicted = clf.predict(X_test)
predicted_proba = clf.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, predicted_proba, multi_class='ovr')

Since you are choosing between SVC and LinearSVC, you may wish to check out the question "When should one use LinearSVC or SVC?".
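If calibrated probabilities are not needed for anything else, a possible alternative is to skip calibration and score the raw decision_function() outputs one class at a time. The sketch below is an assumption on my part, not part of the approach above: it binarizes y_test with label_binarize so that roc_auc_score() treats the problem as a multilabel one, where the scores do not have to sum to 1 (with a plain multi-class y_test and multi_class='ovr', they do). The variable names scores, y_test_bin and roc_auc_macro are illustrative only.

```python
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Same data and split as in the examples above
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Plain LinearSVC, no calibration wrapper
clf = LinearSVC(max_iter=10000)
clf.fit(X_train, y_train)

# Raw one-vs-rest margins, one column per class: shape (n_samples, n_classes)
scores = clf.decision_function(X_test)

# One-hot encode the true labels so each column is a binary one-vs-rest target
y_test_bin = label_binarize(y_test, classes=clf.classes_)

# Macro average of the per-class (one-vs-rest) AUC values
roc_auc_macro = roc_auc_score(y_test_bin, scores, average='macro')
print(roc_auc_macro)
```

Because AUC depends only on how the scores rank the samples, uncalibrated margins are enough here. The value will not necessarily match the one obtained through CalibratedClassifierCV, since the calibrated model is fitted with internal cross-validation and is therefore a different model.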

Sergey Bushmanov


I prefer to use ```LinearSVC()``` because it is faster, but as I say in my question, I have to calculate the ROC-AUC score. I have a question about your code: why is ```max_iter``` set to 10000? – Luis Miguel Oct 23 '20 at 13:57
Try without it and see the warning for the reason – Sergey Bushmanov Oct 23 '20 at 13:58
I see. I compared the results: your approach gives me **0.93** for the ROC-AUC score, while ```SVC()``` gives me **0.99**. – Luis Miguel Oct 23 '20 at 14:02
These are different algorithms. See the link I provided – Sergey Bushmanov Oct 23 '20 at 14:03
How would you plot a reliability curve for multiclass using the above example? – Maths12 Sep 07 '21 at 11:40
