
I experiment with multiple classifiers and want all of them saved and easily accessible during testing. At present, a trained LinearSVC model takes 5 MB or less, while a trained SVC model grows beyond 400 MB and takes almost a minute to load into memory. I am fine using LinearSVC, but I would also like to experiment with RBF kernels. I cannot understand the huge difference between these sizes. Can anyone explain why this happens (if it is explainable; otherwise, point me to a probable bug) and perhaps propose a way to shrink the SVC model, or to avoid SVC altogether for an RBF kernel implementation? Thank you all.

Example

Taken from the tutorials page, with pickling added.

import os
from sklearn import svm, datasets
import cPickle as pickle

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # keep only the first two features
y = iris.target
C = 1.0  # SVM regularization parameter

# fit one model per flavor under comparison
svc = svm.SVC(kernel='linear', C=C).fit(X, y)
lin_svc = svm.LinearSVC(C=C).fit(X, y)
rbf_svc = svm.SVC(kernel='rbf', gamma=0.7, C=C).fit(X, y)

# pickle each fitted model; 'wb' because pickles are binary data
with open('svcpick', 'wb') as out:
    pickle.dump(svc, out)
with open('rbfsvcpick', 'wb') as out:
    pickle.dump(rbf_svc, out)
with open('linsvcpick', 'wb') as out:
    pickle.dump(lin_svc, out)

print 'SVC(Linear):', os.path.getsize('./svcpick'), 'B'
print 'SVC(RBF):', os.path.getsize('./rbfsvcpick'), 'B'
print 'LinearSVC:', os.path.getsize('./linsvcpick'), 'B'

Output:

SVC(Linear): 11481 B
SVC(RBF): 12087 B
LinearSVC: 1188 B
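
A quick way to see where the bytes go (my sketch, not part of the original example): a fitted SVC stores the support vectors themselves plus a dual coefficient per support vector, whereas LinearSVC keeps only one weight vector and one bias per class. Inspecting the fitted objects from the snippet above:

# sketch: what each fitted model actually stores
print 'SVC(Linear) SVs:', svc.support_vectors_.shape
print 'SVC(RBF) SVs:', rbf_svc.support_vectors_.shape
print 'SVC(RBF) dual coefs:', rbf_svc.dual_coef_.shape
print 'LinearSVC weights:', lin_svc.coef_.shape, 'biases:', lin_svc.intercept_.shape

The pickle size of an SVC therefore scales with the number of support vectors, which grows with the training-set size and with how hard the classes are to separate, while the size of LinearSVC depends only on n_classes x n_features.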

Another example for multilabel classification

Again taken (partly) from the tutorials.

import os
from sklearn import svm
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
import cPickle as pickle

# generate a multilabel problem with 10 classes
X, Y = make_multilabel_classification(n_classes=10, n_labels=1,
                                      allow_unlabeled=True,
                                      random_state=1)

msvc = OneVsRestClassifier(svm.SVC(kernel='linear')).fit(X, Y)
mrbf_svc = OneVsRestClassifier(svm.SVC(kernel='rbf')).fit(X, Y)
mlin_svc = OneVsRestClassifier(svm.LinearSVC()).fit(X, Y)

# pickle each fitted one-vs-rest ensemble; 'wb' because pickles are binary
with open('msvcpick', 'wb') as out:
    pickle.dump(msvc, out)
with open('mrbfsvcpick', 'wb') as out:
    pickle.dump(mrbf_svc, out)
with open('mlinsvcpick', 'wb') as out:
    pickle.dump(mlin_svc, out)

print 'mSVC(Linear):', os.path.getsize('./msvcpick'), 'B'
print 'mSVC(RBF):', os.path.getsize('./mrbfsvcpick'), 'B'
print 'mLinearSVC:', os.path.getsize('./mlinsvcpick'), 'B'

Output:

mSVC(Linear): 126539 B
mSVC(RBF): 561532 B
mLinearSVC: 9782 B
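
To see where the extra bytes in the RBF ensemble come from, one can sum the support vectors stored by the ten underlying binary estimators (my sketch, reusing the fitted objects above):

# sketch: total support vectors kept per one-vs-rest ensemble
print 'mSVC(Linear) total SVs:', sum(e.support_vectors_.shape[0] for e in msvc.estimators_)
print 'mSVC(RBF) total SVs:', sum(e.support_vectors_.shape[0] for e in mrbf_svc.estimators_)
# mlin_svc has no support_vectors_; LinearSVC stores only coef_ and intercept_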

In my implementation I use multilabel classification with more than 2 classes, which is why I changed the default number of classes to 10. One can see the size difference. In my own setting, the mLinearSVC model is over 1 MB rather than the ~10 KB shown above, because of the higher-dimensional data I have to process (256 features per sample).
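
One low-effort mitigation (a sketch of mine, not from the question) is to persist the model with joblib and compression, which the scikit-learn documentation recommends for estimators; this shrinks the file on disk but does not reduce the number of stored support vectors, so memory use after loading stays the same. Assuming a scikit-learn version that still ships sklearn.externals.joblib:

from sklearn.externals import joblib

# sketch: compressed persistence; smaller file, same model once loaded
joblib.dump(mrbf_svc, 'mrbfsvc.joblib', compress=3)
loaded = joblib.load('mrbfsvc.joblib')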

Vasilis Lemonidis
  • There are many possible reasons for this. But without code it's just guessing. – sascha Mar 08 '17 at 05:32
  • I did not add code, as it is too generic. Updated. – Vasilis Lemonidis Mar 08 '17 at 10:34
  • To my understanding, an SVM constructs a hyperplane for each class; while a linear kernel has a linear representation of that hyperplane, the RBF kernel uses a more complex formula, which takes more space to save. Try using fewer classes and see how the size grows. – Tzomas Mar 08 '17 at 11:19
  • @Tzomas I understand your point, but I cannot see why LinearSVC() and SVC(kernel='linear') should differ in size by a factor of 10+. – Vasilis Lemonidis Mar 08 '17 at 11:22
  • @VasilisLemonidis because they use two different libraries: LinearSVC uses liblinear rather than libsvm. They need different information to work. You can see how they look in this example image: http://scikit-learn.org/stable/_images/sphx_glr_plot_iris_0012.png – Tzomas Mar 08 '17 at 11:24
  • So is there no way to convert the liblinear information to libsvm after loading? Or at least to truncate the data, assuming default parameters? The core information is the same after all. Please write an answer if you believe you know this; thanks. The same question stands for RBF: if a hypothetical 'librbf' existed, I would intuitively expect it to use only a tenth of the space libsvm uses. – Vasilis Lemonidis Mar 08 '17 at 11:31
  • @Tzomas Also, if intercept_scaling is used in LinearSVC, the results can turn out the same (http://stackoverflow.com/questions/34811770/linearsvc-differs-from-svckernel-linear) – Vasilis Lemonidis Mar 08 '17 at 11:36
  • @VasilisLemonidis I don't understand exactly what you mean by truncating the model's data. – Tzomas Mar 08 '17 at 11:38
  • They are probably just learning different models. As your code shows no sign of checking this, you can't be sure. Check at least the support-vector ratios! – sascha Mar 08 '17 at 11:42
  • @Tzomas Sorry, I may have used the wrong word for this; I mean ignoring unnecessary or default data and saving only the trained information and modified parameters. – Vasilis Lemonidis Mar 08 '17 at 11:42
  • @sascha How can they learn different models? A convex optimization is performed in both cases, using Lagrange multipliers, right? The way they solve it numerically should not matter, as the dimensionality of the result must be the same, that is, weights and biases. Please elaborate in an answer, as this is a very interesting point, or post a link. – Vasilis Lemonidis Mar 08 '17 at 11:46
  • Just check the SVs first and read this too: http://stackoverflow.com/questions/35076586/linearsvc-vs-svckernel-linear-conflicting-arguments – sascha Mar 08 '17 at 11:49
  • SVs? I don't know, but the link you posted talks about training, not the resulting model. Equation (1) in https://en.wikipedia.org/wiki/Support_vector_machine holds everything I want about the trained model; why do I need all the other parameters if I do not want to retrain or alter the trained model in any way? – Vasilis Lemonidis Mar 08 '17 at 11:54
  • Because all practical implementations differ from the one classical formula. Seriously, I did give you 2 important steps. Check the model (at least the support vectors; easy in sklearn) and understand the related post. There is much to discover. Take your time. To be more precise: the link gives a possible explanation, and checking your SVs is one way of validating it. – sascha Mar 08 '17 at 11:55
  • @sascha Sorry if I have frustrated you; I am going to assume that a smaller RBF file is not possible (I still think this is impractical for realtime applications, and it is a shame it happens). I will check out the info you provided. A formal answer would be better of course, but thanks. – Vasilis Lemonidis Mar 08 '17 at 12:04
  • Your assumption is wrong, but it's hard to put this in comments. It's all about the model. If the RBF SVM simply works better, it can be smaller given some minimum performance, but that's more theoretical stuff. Realtime applications would probably never do that IO stuff anyway; just keep enough RAM. But think about the effects of different models: you are talking about size or IO time, but more support vectors will also be much worse in prediction time (a well-generalized compact model is generally the reason for the SVM's powerful standing in ML; strong performance in many dimensions is still possible if the model is sparse). – sascha Mar 08 '17 at 12:13
  • I rephrase: I mean that it seems not to be possible with the present scikit-learn. You are right about the IO operations; of course everything depends on the computational power of the machine used, which is why I do not expand my question into the time domain. RAM is always a problem, but careful prior organization of the data can save you from the trouble. Anyway, thank you for elaborating. – Vasilis Lemonidis Mar 08 '17 at 12:23
  • There seems to be a misunderstanding. We are talking about theoretical limits here with regard to those pickled sizes. This has nothing to do with sklearn, libsvm or liblinear; it's the basic model of SVMs that determines those sizes (small if generalization is good, big if not; it's all about parameter tuning and the underlying data). – sascha Mar 08 '17 at 12:33
  • After searching with the right terms on Google, I ended up at these quite intriguing posts: https://cmry.github.io/notes/serialize and https://cmry.github.io/notes/serialize-sk, which generalize my query and actually provided me with the tools to save only what is needed from both types of models. I will post an answer when I am finished (see the sketch below). – Vasilis Lemonidis Apr 06 '17 at 13:08
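
Following the idea in those notes, here is a minimal sketch of persisting only the arrays an RBF SVC needs at prediction time. It assumes a binary classifier for readability (a multiclass SVC adds one-vs-one bookkeeping on top), and the file name is made up:

import numpy as np
import cPickle as pickle

# sketch: keep only what the RBF decision function needs,
# f(x) = sum_i dual_coef_[i] * exp(-gamma * ||sv_i - x||^2) + intercept_
payload = {'sv': rbf_svc.support_vectors_,
           'dual_coef': rbf_svc.dual_coef_,
           'intercept': rbf_svc.intercept_,
           'gamma': rbf_svc.gamma}  # gamma was set explicitly; 'auto' would need resolving
with open('rbfminimal', 'wb') as out:
    pickle.dump(payload, out, protocol=pickle.HIGHEST_PROTOCOL)

def rbf_decision(x, p):
    # decision value for a single sample x (binary case)
    k = np.exp(-p['gamma'] * ((p['sv'] - x) ** 2).sum(axis=1))
    return p['dual_coef'].dot(k) + p['intercept']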

0 Answers