5

I have used sklearn to train a set of SVM classifiers (mostly linear ones using LinearSVC, but some using the SVC class with an RBF kernel) and I am pretty happy with the results. Now I need to move the classifiers to production, into another codebase that is written in Java. I am looking for libraries published in Maven that can easily be incorporated into this new codebase.

What do you suggest?

nopper

2 Answers

8

Linear classifiers are easy: they have a coef_ and an intercept_, described in the class docstrings. Those are regular NumPy arrays, so you can dump them to disk with standard NumPy functions.

>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> from sklearn.svm import LinearSVC
>>> clf = LinearSVC().fit(iris.data, iris.target)

Now let's dump this to a pseudo-file:

>>> import numpy as np
>>> from io import BytesIO
>>> outfile = BytesIO()
>>> np.savetxt(outfile, clf.coef_)
>>> print(outfile.getvalue().decode())
1.842426121444650788e-01 4.512319840786759295e-01 -8.079381916413134190e-01 -4.507115611351246720e-01
5.201335313639676022e-02 -8.941985347763323766e-01 4.052446671573840531e-01 -9.380586070674181709e-01
-8.506908158338851722e-01 -9.867329247779884627e-01 1.380997337625912147e+00 1.865393234038096981e+00

That's something you can parse from Java, right?
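The dump is one matrix row per line, with whitespace-separated floats in scientific notation. As a rough sketch of the parsing logic, here it is in plain Python using only operations that have direct Java counterparts (splitting on whitespace and converting each token to a double). The function name `parse_matrix` is made up for illustration.

```python
def parse_matrix(text):
    """Parse np.savetxt-style text: one row per line, whitespace-separated floats."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that savetxt can emit.
        if not line or line.startswith("#"):
            continue
        rows.append([float(token) for token in line.split()])
    return rows


# The coefficient dump shown above, pasted in as a string.
dump = """\
1.842426121444650788e-01 4.512319840786759295e-01 -8.079381916413134190e-01 -4.507115611351246720e-01
5.201335313639676022e-02 -8.941985347763323766e-01 4.052446671573840531e-01 -9.380586070674181709e-01
-8.506908158338851722e-01 -9.867329247779884627e-01 1.380997337625912147e+00 1.865393234038096981e+00
"""

coef = parse_matrix(dump)
print(len(coef), len(coef[0]))  # prints: 3 4
```

The same dump-and-parse step would be needed for `clf.intercept_`, which is a one-dimensional array with one entry per class.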

Now to get a score for the k'th class on a sample x, you need to evaluate

np.dot(x, clf.coef_[k]) + clf.intercept_[k]
# which is the same as
(sum(x[i] * clf.coef_[k, i] for i in range(clf.coef_.shape[1]))
 + clf.intercept_[k])

which is also doable, I hope. The class with the highest score wins.
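To make the scoring rule concrete, here is a minimal sketch in plain Python with explicit loops, so that it translates almost line by line to Java. The function name `predict` and the toy numbers are invented for illustration and do not come from a trained model.

```python
def predict(x, coef, intercept):
    """Score sample x against every class and return (best_index, scores)."""
    scores = []
    for k in range(len(coef)):
        score = intercept[k]
        for i in range(len(x)):
            score += x[i] * coef[k][i]
        scores.append(score)
    # The class with the highest score wins.
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores


# Toy model with 3 classes and 2 features (hypothetical values).
coef = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
intercept = [0.0, 0.5, 0.0]

best, scores = predict([2.0, 1.0], coef, intercept)
print(best, scores)  # prints: 0 [2.0, 1.5, -3.0]
```

The returned index refers to a position in `clf.classes_`, so that array has to be exported too if the labels are not simply 0, 1, 2, and so on. Note that for a two-class problem `coef_` has only one row, and the sign of the single score decides the class instead of an argmax.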

For kernel SVMs, the situation is more complicated because you need to replicate the one-vs-one decision function, as well as the kernels, in the Java code. The SVM model is stored on SVC objects in the attributes support_vectors_ and dual_coef_; you will also need intercept_ and the kernel parameters (such as gamma for the RBF kernel).
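For the simplest case, a two-class SVC with an RBF kernel, the decision value is the intercept plus a weighted sum of kernel evaluations against the support vectors. The sketch below covers only that binary case, in plain Python with invented toy numbers. It assumes that the single row of `dual_coef_`, the single entry of `intercept_`, and the numeric gamma actually used in training have all been exported. Multi-class models additionally need the one-vs-one voting, which is not shown here.

```python
import math


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, dual_coef, intercept, gamma):
    """Binary SVC decision value: sum_i dual_coef[i] * K(sv_i, x) + intercept."""
    total = intercept
    for sv, alpha in zip(support_vectors, dual_coef):
        total += alpha * rbf_kernel(sv, x, gamma)
    return total


# Hypothetical toy model: two support vectors, one per class.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coef = [-1.0, 1.0]   # stands in for the single row of dual_coef_
intercept = 0.0           # stands in for intercept_[0]
gamma = 0.5

value = decision_value([1.0, 1.0], support_vectors, dual_coef, intercept, gamma)
print(value > 0)  # prints: True
```

In sklearn's binary case a positive decision value corresponds to the second entry of `clf.classes_` and a negative one to the first.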

Fred Foo
  • I up-voted but this does not really answer my question :) – nopper May 07 '14 at 08:59
  • @nopper You mean you want a library? Then this question is off-topic. – Fred Foo May 07 '14 at 10:53
  • The question is: I need to export a set of SVM classifiers (sklearn) to Java with minimal effort. What libraries (Java, possibly in Maven) do you suggest? – nopper May 07 '14 at 15:11
  • Has anyone done the above with svms and made the effort public? Can't find anything on the web. – Cartesian Theater Nov 05 '15 at 00:57
  • @CartesianTheater I maintain a project that ports trained models to lower-level languages such as C, Java, or JavaScript. It is still under active development, but you can find it here: [https://github.com/nok/sklearn-porter](https://github.com/nok/sklearn-porter) – Darius Aug 28 '16 at 21:28
  • @DariusMorawiec Thanks, I'll check it out. – Cartesian Theater Aug 29 '16 at 01:46
1

I don't know how to export SVM models from one framework and import them into another, but it helps to understand which parameters describe your model: the support vectors selected by the SVM training procedure, plus the kernel and (some) hyperparameters. I would save those to a file, then pick up any machine learning library in your target language and see whether I could initialize SVM classifiers by feeding them those parameters instead of training them again.
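One possible way to save those parameters in a language-neutral form is a small JSON document, which any Java JSON parser can read. The sketch below uses only the standard library. The field names are made up for illustration, and the toy values stand in for what would come from a fitted model (for example `clf.support_vectors_.tolist()`).

```python
import json

# Hypothetical parameter bundle; the keys are an invented schema,
# not a format that any particular Java library expects.
params = {
    "kernel": "rbf",
    "gamma": 0.5,
    "support_vectors": [[0.0, 0.0], [1.0, 1.0]],
    "dual_coef": [[-1.0, 1.0]],
    "intercept": [0.0],
    "classes": [0, 1],
}

text = json.dumps(params, indent=2)

# Round-trip check: what is read back equals what was written.
restored = json.loads(text)
print(restored == params)  # prints: True
```

Whether a given Java SVM library can be initialized from such values, rather than only from its own model files, has to be checked library by library.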

Nicolas78
  • I guess that for linear classifiers it should be easy, since they expose just a set of coefficients, but it will probably be more complicated for non-linear ones. Anyway, the question was about the best Java library for this purpose that is also published in Maven. I can afford to retrain the entire set of classifiers if the results are going to be the same. – nopper May 02 '14 at 08:55
  • Yeah, I don't know that. I was just trying to suggest that, in your search for a library, an "import" function might not be a crucial factor. (Also, SO is a bit antagonistic to "help me choose a library" questions, see that close vote (not mine), and I wanted to highlight that there's a more interesting aspect to your question.) – Nicolas78 May 02 '14 at 08:57
  • Yes, I understood your point. Indeed, before creating this specific question I looked for similar questions. In the end I found that there are people looking precisely for a way to export sklearn classifiers (without retraining) to a Java codebase, such as this [one](http://stackoverflow.com/questions/17511968/python-scikit-learn-exporting-trained-classifier). Therefore I decided to loosen the requirements and accept the possibility of completely retraining my models ;) – nopper May 02 '14 at 09:04