
When the data is offset (not centered at zero), LinearSVC() and SVC(kernel='linear') give wildly different results. (EDIT: the problem might be that it does not handle non-normalized data.)

import matplotlib.pyplot as plot
plot.ioff()
import numpy as np
from sklearn.datasets.samples_generator import make_blobs
from sklearn.svm import LinearSVC, SVC


def plot_hyperplane(m, X):
    # Draw the decision boundary w[0]*x + w[1]*y + intercept = 0 of a fitted linear model.
    w = m.coef_[0]
    a = -w[0] / w[1]
    xx = np.linspace(np.min(X[:, 0]), np.max(X[:, 0]))
    yy = a*xx - (m.intercept_[0]) / w[1]
    plot.plot(xx, yy, 'k-')

# Two blobs generated near the origin, then shifted far away from zero.
X, y = make_blobs(n_samples=100, centers=2, n_features=2,
                  center_box=(0, 1))
X[y == 0] = X[y == 0] + 100
X[y == 1] = X[y == 1] + 110

for i, m in enumerate((LinearSVC(), SVC(kernel='linear'))):
    m.fit(X, y)
    plot.subplot(1, 2, i+1)
    plot_hyperplane(m, X)

    plot.plot(X[y == 0, 0], X[y == 0, 1], 'r.')
    plot.plot(X[y == 1, 0], X[y == 1, 1], 'b.')

    # Grid of test points to visualize each model's predicted regions.
    xv, yv = np.meshgrid(np.linspace(98, 114, 10), np.linspace(98, 114, 10))
    _X = np.c_[xv.reshape((xv.size, 1)), yv.reshape((yv.size, 1))]
    _y = m.predict(_X)

    plot.plot(_X[_y == 0, 0], _X[_y == 0, 1], 'r.', alpha=0.4)
    plot.plot(_X[_y == 1, 0], _X[_y == 1, 1], 'b.', alpha=0.4)

plot.show()

This is the result I get:

[figure: decision boundaries and predicted regions for the two classifiers]

(left=LinearSVC(), right=SVC(kernel='linear'))

sklearn.__version__ is 0.17, but I also tested on Ubuntu 14.04, which ships with 0.15.

I thought about reporting this as a bug, but it seems too obvious to have gone unnoticed, so it's probably not one. What am I missing?

Ricardo Magalhães Cruz

1 Answer


Reading the documentation, they use different underlying implementations: LinearSVC uses liblinear, whereas SVC uses libsvm.

Looking closely at the coefficients and intercepts, it seems LinearSVC applies regularization to the intercept, whereas SVC does not.
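
As a rough check (a sketch reusing the X and y generated in the question; the exact numbers vary per run), the learned parameters of the two models can be printed side by side:

# Sketch: fit both models on the offset blobs from the question and
# compare the learned coefficients and intercepts.
for m in (LinearSVC(), SVC(kernel='linear')):
    m.fit(X, y)
    print(type(m).__name__, m.coef_, m.intercept_)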

By adding intercept_scaling, I was able to obtain essentially the same results from both.

LinearSVC(loss='hinge', intercept_scaling=1000)
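
liblinear handles the bias by appending a constant synthetic feature (equal to intercept_scaling) to every sample, so the intercept itself gets regularized; making intercept_scaling large makes that penalty negligible. As a sketch (again reusing X and y from the question), the fitted parameters should now land in the same ballpark:

# Sketch: with intercept regularization effectively disabled, the two
# models' coefficients and intercepts should roughly agree.
a = LinearSVC(loss='hinge', intercept_scaling=1000).fit(X, y)
b = SVC(kernel='linear').fit(X, y)
print(a.coef_, a.intercept_)
print(b.coef_, b.intercept_)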

[figure: comparison after intercept scaling]

David Maust
  • Looking at this more closely, it seems there is an optimization problem with the scale of the variables. I'm going to extend my answer later. – David Maust Jan 15 '16 at 20:44
  • Thank you. So, if I do not want to normalize my datasets and I have no time to go one by one, I should stick with `SVC(kernel='linear')`, right? – Ricardo Magalhães Cruz Jan 17 '16 at 11:40
  • `SVC` seems much less finicky :-). It is generally a good idea with any gradient descent optimizer to use feature scaling and mean centering. Any reason you are avoiding it? It can be easily implemented in scikit-learn with a `Pipeline` and `StandardScaler` (see the sketch after these comments). It only becomes annoying if you are trying to interpret the coefficients yourself. – David Maust Jan 17 '16 at 18:44
  • It is just that we are using data from UCI, and we wanted to compare a method of ours with `SVC` using a linear kernel. But yeah, we are going to standardize the data for now, and maybe consider using `SVC` for a final run. `LinearSVC` is naturally much faster. – Ricardo Magalhães Cruz Jan 18 '16 at 21:38
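
A minimal sketch of the `Pipeline` + `StandardScaler` approach mentioned in the comments above (reusing X and y from the question; `make_pipeline` is just the convenience constructor for `Pipeline`):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Sketch: standardize features inside a pipeline so LinearSVC always sees
# zero-mean, unit-variance data; callers keep passing data in original units.
clf = make_pipeline(StandardScaler(), LinearSVC(loss='hinge'))
clf.fit(X, y)
print(clf.predict(X[:5]))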