58

I am using the code below to train an SVM in Python:

from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'))  # 'auto' was renamed to 'balanced' in newer scikit-learn
clf.fit(X, y)
proba = clf.predict_proba(X)

But it is taking a huge amount of time.

Actual Data Dimensions:

train-set (1422392,29)
test-set (233081,29)

How can I speed it up (in parallel or some other way)? Please help. I have already tried PCA and downsampling.

I have 6 classes. Edit: I found http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html, but I need probability estimates, and it does not seem to provide them for SVM.

Edit:

from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import GridSearchCV   # replaces the removed sklearn.grid_search module
import multiprocessing
import numpy as np

def new_func(a):
    # Element-wise logistic function: maps each decision value x to 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-a))

if __name__ == '__main__':
    iris = datasets.load_iris()
    cores = max(1, multiprocessing.cpu_count() - 2)     # leave two cores free, but use at least one
    X, y = iris.data, iris.target                       # loading dataset

    C_range = 10.0 ** np.arange(-4, 4)                  # C value range
    param_grid = dict(estimator__C=C_range.tolist())

    # LinearSVC code: faster ('balanced' is the current name of class_weight='auto')
    svr = OneVsRestClassifier(LinearSVC(class_weight='balanced'), n_jobs=cores)
    # SVC code: slow
    # svr = OneVsRestClassifier(SVC(kernel='linear', probability=True,
    #     class_weight='balanced'), n_jobs=cores)

    clf = GridSearchCV(svr, param_grid, n_jobs=cores, verbose=2)   # grid search
    clf.fit(X, y)                                                  # training SVM model

    decisions = clf.decision_function(X)     # outputs decision function values
    # prob = clf.predict_proba(X)            # only for SVC: outputs probabilities
    print(decisions[:5, :])
    prob = new_func(decisions)               # converts decisions to 1 / (1 + e^(-x))
    print(prob[:5, :])

Edit 2: The answer by user3914041 yields very poor probability estimates.

Abhishek Bhatia
  • 1
    Quantify "huge amount of time." What have you used to profile your code? –  Jul 28 '15 at 16:11
  • @tristan Thanks for the comment. I am only estimating it roughly from random runs of the code, using the output checks in the code, which is a bad way to measure it. Does that answer your question? – Abhishek Bhatia Jul 28 '15 at 16:16
  • 1
    Do you need all 1.4 million training examples? According to the [docs](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC), the fit time complexity is more than quadratic in the number of training examples. Additionally, do you need the probability estimates? They require an additional run of cross-validation to generate. – rabbit Jul 28 '15 at 16:19
  • @NBartley. Thanks for the info! As mentioned, I can downsample, but it is not preferable. Yes, I need probability estimates; the competition format requires them. – Abhishek Bhatia Jul 28 '15 at 16:21
  • 2
    The OneVsRestClassifier comes with an option for parallelism, but be warned that it may eat up many of your resources, as it will take a significant time to fit each of the models. Try setting the n_jobs parameter according to the docs [here](http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html). – rabbit Jul 28 '15 at 16:24
  • 2
    Try MKL Optimizations from Continuum, see https://store.continuum.io/cshop/mkl-optimizations/. They offer a 30 day free trial and cost is $99. I am not a sales rep, but I use their Anaconda Python distribution and like it - it was recommended at Spark Summit training. Incidentally Spark supports SVM and running it on even a small Spark cluster would greatly improve performance, see https://spark.apache.org/docs/1.1.0/mllib-linear-methods.html#linear-support-vector-machines-svms. –  Jul 28 '15 at 16:33
  • 1
    @TrisNefzger Spark won't work because it does not support probability estimates for SVM – yangjie Jul 28 '15 at 16:40
  • @TrisNefzger Thanks for the useful knowledge! I do have an HPC cluster available. But if it doesn't offer probability estimates, then it wouldn't be of much use. – Abhishek Bhatia Jul 28 '15 at 16:41
  • 1
    I haven't looked much into it, but I think IPython Parallel / Starcluster might be worth checking out as well. [Here's a gist](https://gist.github.com/ogrisel/5115540) with demo code from one of the sklearn contributors' tutorials. But to build off Tris's comment, you're going to want to try to move over to a cluster at some point. And if sklearn doesn't work easily on a cluster, you might want to consider writing your own code on top of these other libraries that gives you the probability estimates you need. – rabbit Jul 28 '15 at 16:48
  • @NBartley Thanks for the reply again! I tried using `OneVsRestClassifier` in parallel. I allocated it 14 cores, but it seemed to use only 6 of them (which is equal to the number of classes). Do you know of any reason for this? I am unsure how parallel gradient works. If I cannot run on more cores than the number of classes, I don't think using a cluster would help. (Also, I already have more than enough RAM, ~48GB, on my desktop, so memory is not a problem.) – Abhishek Bhatia Jul 28 '15 at 17:29
  • 1
    Yes, SVM takes a lot of time and is very slow on CPUs. You will need to whiten the PCA data to make it faster, or try to find a library that runs on a GPU. – pbu Jul 28 '15 at 17:31
  • @pbu Thanks for the reply! I do whiten the data. I can't find any such library. Can you explain why using a GPU would help? – Abhishek Bhatia Jul 28 '15 at 17:42
  • 1
    It's not really parallel gradient so much as it's fitting the 6 separate OneVsRest models in parallel, so it makes sense that it won't parallelize more than that. If you intend to stay with Python and `sklearn.SVC` because of the probability estimates then it seems to me like your best bet might be to downsample, PCA, and use OneVsRest with 6 jobs. – rabbit Jul 28 '15 at 17:47
  • @NBartley Thanks for the info! I myself would prefer Matlab over Python, but most of the other code is in Python. Also, time doesn't permit the change. – Abhishek Bhatia Jul 28 '15 at 19:41
  • Running on a GPU is 20X faster, plus running native C/C++ code adds to the speed. Python is always slow (at least for me!). Take a look here: https://devtalk.nvidia.com/default/topic/485456/support-vector-machine-are-there-some-great-cuda-svms-/ – pbu Jul 28 '15 at 23:48
  • @NBartley Downsampling and PCA don't give good results. If you come across any other possibility, please let me know. Perhaps using LinearSVC to produce probability estimates. – Abhishek Bhatia Jul 29 '15 at 18:22
  • If that isn't working for your purposes, then I agree. LinearSVC with calibrated probability estimates is then another good option. I would imagine that you can also try regularized Logistic Regression again with appropriate parameters, even if it has yielded lower accuracy as you mention below. It's very difficult to gauge what will work best for you without knowing anything else about the data. :/ – rabbit Jul 29 '15 at 19:38
  • @NBartley Hi, thanks for the info! I tried some code for LinearSVC with calibrated probability estimates; please check the edit in the question. From what I could find, using maximum likelihood should probably be better: `1/(1+exp(Ax+B))`, where `A` and `B` are parameters learned by a maximum likelihood estimate. Can you help me with how to implement it? I can't seem to find a starting point. – Abhishek Bhatia Jul 30 '15 at 17:26
  • You should check this new library to speed up the training process > https://intel.github.io/scikit-learn-intelex/ – AsadMajeed Jan 08 '22 at 12:52
  • The parallelism doesn't happen in SVC but in the multi-class wrapper. I ran it, and that seems to be the case. – cloudscomputes Jul 07 '23 at 10:20

5 Answers

133

If you want to stick with SVC as much as possible and train on the full dataset, you can use ensembles of SVCs that are trained on subsets of the data to reduce the number of records per classifier (fit time apparently grows at least quadratically with that number). scikit-learn supports that with the BaggingClassifier wrapper. That should give you similar (if not better) accuracy compared to a single classifier, with much less training time. The training of the individual classifiers can also be set to run in parallel using the n_jobs parameter.

Alternatively, I would also consider using a Random Forest classifier - it supports multi-class classification natively, it is fast and gives pretty good probability estimates when min_samples_leaf is set appropriately.

I ran a quick test on the iris dataset blown up 100 times with an ensemble of 10 SVCs, each one trained on 10% of the data. It is more than 10 times faster than a single classifier. These are the numbers I got on my laptop:

Single SVC: 45s

Ensemble SVC: 3s

Random Forest Classifier: 0.5s

See below the code that I used to produce the numbers:

import time
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Blow the dataset up 100 times so that the timings are measurable
X = np.repeat(X, 100, axis=0)
y = np.repeat(y, 100, axis=0)

# Baseline: a single one-vs-rest SVC trained on all the data
# ('balanced' is the current name of class_weight='auto')
start = time.time()
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'))
clf.fit(X, y)
end = time.time()
print("Single SVC", end - start, clf.score(X, y))
proba = clf.predict_proba(X)

# Ensemble: 10 SVCs per class, each trained on 10% of the data
n_estimators = 10
start = time.time()
clf = OneVsRestClassifier(BaggingClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'), max_samples=1.0 / n_estimators, n_estimators=n_estimators))
clf.fit(X, y)
end = time.time()
print("Bagging SVC", end - start, clf.score(X, y))
proba = clf.predict_proba(X)

# Random forest: handles multi-class natively
start = time.time()
clf = RandomForestClassifier(min_samples_leaf=20)
clf.fit(X, y)
end = time.time()
print("Random Forest", end - start, clf.score(X, y))
proba = clf.predict_proba(X)

BaggingClassifier samples with replacement by default. If you want each classifier's subset to be drawn without replacement, so that no record is repeated within a subset, you can set the bootstrap parameter to False.
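As a rough sketch of the two options mentioned in this answer, the snippet below sets both `bootstrap=False` and `n_jobs=-1` on the bagged SVC. The blow-up factor of 10, the `random_state`, and the use of `class_weight="balanced"` (the current name of the old `'auto'` setting) are assumptions made for illustration, not part of the original benchmark.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
# Repeat the rows so that each 10% subset still has enough samples per class.
X = np.repeat(X, 10, axis=0)
y = np.repeat(y, 10, axis=0)

n_estimators = 10
bagged_svc = BaggingClassifier(
    SVC(kernel="linear", probability=True, class_weight="balanced"),
    n_estimators=n_estimators,
    max_samples=1.0 / n_estimators,  # each SVC sees about 10% of the rows
    bootstrap=False,                 # draw each subset without replacement
    n_jobs=-1,                       # fit the sub-classifiers in parallel
    random_state=0,                  # assumption: fixed seed for repeatability
)
bagged_svc.fit(X, y)

# Averaged class probabilities, one row per sample.
proba = bagged_svc.predict_proba(X)
print(proba.shape)
```

Since SVC handles multi-class input itself, the bagging wrapper is used directly here instead of being nested inside OneVsRestClassifier; either arrangement should work.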

Alexander Bauer
  • 1
    Thanks for the amazing answer!! I didn't know about these. In addition to speed, accuracy is also my prime concern. Could you give a comparison of that if possible? I am not bound to `SVC`, please suggest other good approaches also if you want. – Abhishek Bhatia Aug 17 '15 at 03:56
  • Also you could check out the `sklearn.ensemble.AdaBoostClassifier` for use with random forest or decision trees. – jchook Oct 04 '16 at 16:45
  • 1
    If you want a linear kernel, you can use `sklearn.svm.LinearSVC` which is basically the same, but implemented with a faster library than the `sklearn.svm.SVC`. – fdelia Oct 19 '17 at 09:25
  • The `RandomForestClassifier` works amazingly fast, but from what I understand it doesn't use linear / poly kernels like SVC does, so it gives lower accuracy. Can I improve the accuracy of `RandomForestClassifier`? – CIsForCookies Dec 27 '17 at 20:31
  • @Alexander Bauer Sorry, could you explain what the chaining of the classifiers does? OneVsRestClassifier(BaggingClassifier(SVC ... – lppier May 12 '18 at 02:36
  • 2
    This is a great approach!: I got similar results on F1 Score; when ran without BaggingClassifier it took 4d 3h 27min, but ran with BaggingClassifier it took 31min 8s – kaleemsagard Jun 04 '20 at 00:01
  • This is my first time training an SVM. I tried increasing n_jobs to more than 1, but I found that it does not run at all; it only runs when n_jobs = 1. Do you know what the cause could be? – fatbringer Nov 15 '22 at 02:21
23

SVM classifiers don't scale so easily. From the docs, about the complexity of sklearn.svm.SVC:

The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples.

In scikit-learn you have svm.LinearSVC, which scales better. It should be able to handle your data.

Alternatively you could just go with another classifier. If you want probability estimates I'd suggest logistic regression. Logistic regression also has the advantage of not needing probability calibration to output 'proper' probabilities.
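To make the logistic regression suggestion concrete, here is a minimal sketch on the iris data. The scaling step, `C=1.0`, `class_weight="balanced"` and `max_iter=1000` are illustrative assumptions, not settings tuned for the asker's dataset.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

# Standardize the features, then fit a regularized logistic regression.
logreg = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000),
)
logreg.fit(X, y)

# predict_proba is available directly: one column per class, rows sum to 1.
proba = logreg.predict_proba(X)
print(proba[:5])
```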

Edit:

I did not know about the complexity of LinearSVC; I eventually found this information in the user guide:

Also note that for the linear case, the algorithm used in LinearSVC by the liblinear implementation is much more efficient than its libsvm-based SVC counterpart and can scale almost linearly to millions of samples and/or features.

To get probabilities out of a LinearSVC, check out this link. It is just a couple of links away from the probability calibration guide I linked above and contains a way to estimate probabilities. Namely:

    prob_pos = clf.decision_function(X_test)
    prob_pos = (prob_pos - prob_pos.min()) / (prob_pos.max() - prob_pos.min())

Note the estimates will probably be poor without calibration, as illustrated in the link.
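One possible way to add the calibration step is scikit-learn's `CalibratedClassifierCV`. The sketch below assumes the iris data, a 70/30 split and 3-fold calibration, all chosen only for illustration. With `method="sigmoid"` it fits a Platt-style mapping of the form 1 / (1 + exp(A*f + B)) on the decision values, which is the form asked about in the comments.

```python
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Wrap the fast linear SVM; calibration is learned on held-out folds.
calibrated = CalibratedClassifierCV(
    LinearSVC(class_weight="balanced", max_iter=10000),
    method="sigmoid",  # Platt-style sigmoid fitted to the decision values
    cv=3,
)
calibrated.fit(X_train, y_train)

# Calibrated class probabilities, normalized across the classes.
proba = calibrated.predict_proba(X_test)
print(proba[:5])
```

For multi-class targets the wrapper calibrates each class one-vs-rest and then normalizes; multilabel targets are a different case, as the comments on this answer point out.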

ldirer
  • Thanks for the reply! @NBartley has already mentioned scaling. I have tried logistic regression; it gives lower accuracy. – Abhishek Bhatia Jul 28 '15 at 17:24
  • 1
    Thanks for reply! But linearSVC has no option of outputting the probability estimates. – Abhishek Bhatia Jul 28 '15 at 17:40
  • 1
    You're right. A possible workaround is to use the `decision_function` attribute, as it is done with LinearSVC in the link I gave about probability calibration. You'll definitely need to calibrate for the probabilities to make sense though. – ldirer Jul 28 '15 at 18:05
  • Can you elucidate more on the calibration part. – Abhishek Bhatia Jul 28 '15 at 19:21
  • 2
    If you have specific questions feel free to ask but for the concept I won't be able to do a better job than the link I gave in the post. – ldirer Jul 28 '15 at 20:23
  • How can I convert the decision function (http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC.decision_function) to probability estimates? Simply using [1 / (1 + exp(-x))] doesn't seem good. Maybe using 1/(1+exp(Ax+B)), where A and B are parameters learned by a maximum likelihood estimate, is better. But I am still very new to both ML and Python. Can you provide some starting code for the conversion? – Abhishek Bhatia Jul 29 '15 at 17:33
  • @AbhishekBhatia I edited my answer. It takes some time but you'd probably learn a lot (at least I did) by reading the guide on probability calibration. – ldirer Jul 29 '15 at 17:50
  • Thanks a great deal! Is the implementation of `predict_proba()` similar to what you have mentioned? – Abhishek Bhatia Jul 30 '15 at 17:29
  • This gives very poor results. Thus, I have unaccepted the answer. Please suggest alternative approaches. – Abhishek Bhatia Aug 12 '15 at 06:28
  • @AbhishekBhatia Very poor as in "compared to SVC with One vs Rest and predict_proba on a small sample, using xxx as a metric/heuristic"? Are you using probability calibration? – ldirer Aug 12 '15 at 07:06
  • No, I am not using any calibration. Can you elaborate, please? – Abhishek Bhatia Aug 12 '15 at 07:13
  • I already told you I could not do better than this link http://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html. I also mentioned in my answer that the estimates would probably be poor without calibration. If you don't even try to read it I can't help you. – ldirer Aug 12 '15 at 07:52
  • @user3914014 Thanks for the response again. After reading your advice, I tried some other approaches for calibration, namely CalibratedClassifierCV (no support for multilabel). But I am not able to find a fitting one. Can you please point me in the right direction by giving an example? – Abhishek Bhatia Aug 12 '15 at 09:29
  • @ldirer thanks for tip, I'm facing an issue with multilabel classification where CalibratedClassifierCV does not accept multilabel (MultiLabelBinarizer) vectors. I'm using `OneVsRestClassifier(LinearSVC())`. Do you know of a way to calibrate this way? – Floran Gmehlin Mar 19 '18 at 10:10
9

You can use the kernel_approximation module to scale up SVMs to a large number of samples like this.
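As a hedged illustration of that idea, the sketch below maps the features through a `Nystroem` approximation of an RBF kernel and then trains a linear SVM on the mapped features. The kernel choice, `gamma=0.2` and `n_components=100` are assumptions picked for the iris data; on a large dataset they would need tuning.

```python
from sklearn import datasets
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)

# Approximate an RBF-kernel SVM: explicit feature map + linear SVM.
approx_svm = make_pipeline(
    StandardScaler(),
    Nystroem(kernel="rbf", gamma=0.2, n_components=100, random_state=0),
    LinearSVC(max_iter=10000),
)
approx_svm.fit(X, y)

accuracy = approx_svm.score(X, y)          # training accuracy, for a quick sanity check
decisions = approx_svm.decision_function(X)  # one column of scores per class
print(accuracy, decisions.shape)
```

The cost of the linear model grows with `n_components` rather than with the square of the number of samples, which is what makes this route attractive for large training sets.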

serv-inc
Andreas Mueller
7

It was briefly mentioned in the top answer; here is the code: The quickest way to do this is via the n_jobs parameter: replace the line

clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'))

with

clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'), n_jobs=-1)

This will use all available CPUs on your computer, while still doing the same computation as before.
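As the comments under the question note, the parallelism here is across the per-class binary problems, so the number of classes caps how many workers can be busy. A small sketch on iris (3 classes, chosen only for illustration, with `class_weight="balanced"` as the current name of the old `'auto'` setting) shows how many binary SVCs are actually fitted:

```python
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

clf = OneVsRestClassifier(
    SVC(kernel="linear", probability=True, class_weight="balanced"),
    n_jobs=-1,  # request all CPUs
)
clf.fit(X, y)

# One binary SVC per class: this is the number of tasks that can run in parallel.
n_binary_problems = len(clf.estimators_)
print(n_binary_problems)
```

With 6 classes, as in the question, this would give 6 parallel fits regardless of how many more cores are available.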

serv-inc
3

For large datasets consider using LinearSVC or SGDClassifier instead, possibly after a Nystroem transformer.

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
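A minimal sketch of the SGDClassifier route, assuming a Nystroem feature map fitted once and mini-batch training with `partial_fit`. The batch count, number of passes, `gamma`, `n_components` and `alpha` are hypothetical values for the iris data, not recommendations for the asker's dataset.

```python
import numpy as np
from sklearn import datasets
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
classes = np.unique(y)

# Fit the kernel feature map once (on a sample, if the data is large).
feature_map = Nystroem(kernel="rbf", gamma=0.2, n_components=50, random_state=0).fit(X)

# Hinge loss makes SGDClassifier a linear SVM trained by stochastic gradient descent.
sgd = SGDClassifier(loss="hinge", alpha=1e-4, random_state=0)

rng = np.random.RandomState(0)
for epoch in range(20):                      # hypothetical number of passes
    order = rng.permutation(len(X))
    for batch in np.array_split(order, 5):   # hypothetical number of mini-batches
        sgd.partial_fit(feature_map.transform(X[batch]), y[batch], classes=classes)

accuracy = sgd.score(feature_map.transform(X), y)
decisions = sgd.decision_function(feature_map.transform(X))
print(accuracy, decisions.shape)
```

With hinge loss the model exposes `decision_function` but not `predict_proba`, so probability estimates would still need a separate calibration step.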