
According to this post, SVC and LinearSVC in scikit-learn are very different. But when reading the official scikit-learn documentation, it is not that clear.

Especially for the loss functions, it seems that there is an equivalence: [image: loss-function table from the scikit-learn documentation]

And this post says that the loss functions are different:

  • SVC: min 1/2 ||w||^2 + C Σ_i ξ_i
  • LinearSVC: min 1/2 ||[w, b]||^2 + C Σ_i ξ_i

It seems that in the case of LinearSVC, the intercept is regularized, but the official documentation says otherwise.
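A quick way to see the discrepancy (a minimal sketch on toy data, not taken from either post) is to fit both classifiers with default settings and compare the learned coefficients and intercepts:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC

# Two well-separated toy clusters
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

svc = SVC(kernel="linear", C=1.0).fit(X, y)
lsvc = LinearSVC(C=1.0, max_iter=10000).fit(X, y)

# With defaults the solutions differ: LinearSVC uses squared hinge loss
# and regularizes the intercept, SVC (LIBSVM) uses hinge loss and does not.
print("SVC:      ", svc.coef_, svc.intercept_)
print("LinearSVC:", lsvc.coef_, lsvc.intercept_)
```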

Does anyone have more information? Thank you

John Smith

1 Answer


SVC is a wrapper around the LIBSVM library, while LinearSVC is a wrapper around LIBLINEAR.

LinearSVC is generally faster than SVC and can handle much larger datasets, but it can only use a linear kernel, hence its name. So the difference lies not in the formulation but in the implementation approach.

Quoting LIBLINEAR FAQ:

When to use LIBLINEAR but not LIBSVM

There are some large data for which with/without nonlinear mappings gives similar performances. 
Without using kernels, one can quickly train a much larger set via a linear classifier. 
Document classification is one such application. 
In the following example (20,242 instances and 47,236 features; available on LIBSVM data sets), 
the cross-validation time is significantly reduced by using LIBLINEAR:

% time libsvm-2.85/svm-train -c 4 -t 0 -e 0.1 -m 800 -v 5 rcv1_train.binary
Cross Validation Accuracy = 96.8136%
345.569s

% time liblinear-1.21/train -c 4 -e 0.1 -v 5 rcv1_train.binary
Cross Validation Accuracy = 97.0161%
2.944s

Warning: While LIBLINEAR's default solver is very fast for document classification, it may be slow in other situations. See Appendix C of our SVM guide about using other solvers in LIBLINEAR.
Warning: If you are a beginner and your data sets are not large, you should consider LIBSVM first.
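The same speed gap shows up directly in scikit-learn. A rough sketch (synthetic data; actual timings depend on your machine and dataset):

```python
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# A moderately sized dense problem; the gap grows with n_samples
# because LIBSVM's training cost scales superlinearly in n_samples.
X, y = make_classification(n_samples=3000, n_features=100, random_state=0)

t0 = time.time()
SVC(kernel="linear").fit(X, y)          # LIBSVM backend
t_svc = time.time() - t0

t0 = time.time()
LinearSVC(max_iter=10000).fit(X, y)     # LIBLINEAR backend
t_lin = time.time() - t0

print(f"SVC: {t_svc:.2f}s  LinearSVC: {t_lin:.2f}s")
```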
igrinis
  • The difference is not only speed; the models themselves are different. I made a simple example [here](https://stackoverflow.com/questions/62232929/behavior-of-c-in-linearsvc-sklearn-scikit-learn). And you can also read [this](https://stackoverflow.com/questions/33843981/under-what-parameters-are-svc-and-linearsvc-in-scikit-learn-equivalent) – John Smith Oct 16 '20 at 15:08
  • My question is about the loss function of the two classifiers. Thank you – John Smith Oct 16 '20 at 15:10
  • 1
    You can find more implementation details in the Appendices of the origiinal `LIBLINEAR` [paper](https://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf) – igrinis Oct 18 '20 at 10:16
  • 1
    The answer in the [post](https://stackoverflow.com/questions/33843981/under-what-parameters-are-svc-and-linearsvc-in-scikit-learn-equivalent) is correct. `LIBLINEAR` does includes bias term in optimization, while `LIBSVM` does not. – igrinis Oct 18 '20 at 15:30
  • 1
    `SVC` defaults to L1 loss and L2 penalty. This is why you can create conditions when the results of both are almost equal, if you set for `LinearSVM` `loss="hinge"` and `intercept_scaling` large enough. Bias term is included in `LIBLINEAR` as weight vector is implicitly extended as `w=[w;b]`. If you center your data before optimizing, it should effectively set bias to zero. – igrinis Oct 18 '20 at 15:51
  • So, there is an error in the scikit-learn documentation? For LinearSVC, the math formula should include a penalty for the bias `b`, right? – John Smith Oct 18 '20 at 19:43
  • It surely is confusing and incomplete. There is a [pull request](https://github.com/scikit-learn/scikit-learn/pull/11763) discussing adding a code warning that the intercept is regularized (which was eventually not done), but there is no mention of it at all in the documentation. – igrinis Oct 18 '20 at 20:17
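The recipe from the comments above can be sketched as follows: center the data (so the bias, and hence its regularization in LIBLINEAR, is driven toward zero), switch `LinearSVC` to `loss="hinge"`, and raise `intercept_scaling`. This is a sketch on toy data, not a guarantee of exact equality:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
X = X - X.mean(axis=0)  # centering drives the optimal bias toward zero,
                        # so LIBLINEAR's penalty on it barely matters

svc = SVC(kernel="linear", C=1.0).fit(X, y)
lsvc = LinearSVC(loss="hinge", C=1.0, intercept_scaling=100.0,
                 dual=True, max_iter=100000).fit(X, y)

# The two hyperplanes should now be close, though not bit-identical
print("max |coef diff|:", np.abs(svc.coef_ - lsvc.coef_).max())
```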