Speedup scikit-learn classifiers - multicore support?

Question

The title says it all: is there a way to let scikit-learn classifiers like SVC use multiple cores? I am currently working on images with dimensions (1280,1024), and even when I scale them down to (200,xxx), I have to wait about 2 minutes for the result. Since I have 24 cores at my disposal, it is a bit tiresome to watch the CPU work at 4-5% because the process runs on a single core only.

Hi, first of all try to look here: http://scikit-learn.org/stable/modules/computational_performance.html#linear-algebra-libraries Then, run your code like this: ```$ OMP_NUM_THREADS=8 python .py```. — 404pio, Feb 17 '16 at 17:02
And at the end look here: http://stackoverflow.com/a/9002656/1615070 — 404pio, Feb 17 '16 at 17:08
I will have a look at this. But since the website says "For instance models based on (randomized) decision trees typically do not rely on BLAS calls in their inner loops, nor do kernel SVMs (SVC, SVR, NuSVC, NuSVR).", I have little hope that it will help with SVC. — user3696412, Feb 17 '16 at 21:53
OMP_NUM_THREADS was already 4 by default, but still just one thread is used. — user3696412, Feb 19 '16 at 08:14

Farseer · Answer 1 · 2016-02-17T15:06:00.840

1

You can increase the kernel cache size: the size of the kernel cache has a strong impact on run times for larger problems. If you have enough RAM available, it is recommended to set `cache_size` to a higher value than the default of 200 MB, such as 500 MB or 1000 MB (source: the scikit-learn documentation).
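As a rough illustration of the setting above, here is a minimal sketch that times `SVC.fit` for a few `cache_size` values. The synthetic data from `make_classification` and the helper name `fit_seconds` are assumptions for illustration only; they stand in for the image features from the question, which are not shown.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; replace with your own features).
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)


def fit_seconds(cache_mb):
    """Fit an RBF-kernel SVC with the given kernel cache size (in MB).

    Returns the fitted model and the wall-clock fit time in seconds.
    """
    clf = SVC(kernel="rbf", cache_size=cache_mb)
    start = time.perf_counter()
    clf.fit(X, y)
    return clf, time.perf_counter() - start


results = {}
for cache_mb in (200, 500, 1000):
    clf, seconds = fit_seconds(cache_mb)
    results[cache_mb] = seconds
    print(f"cache_size={cache_mb} MB: fit took {seconds:.2f} s")
```

On a dataset this small the whole kernel matrix already fits in the default cache, so the timings will probably be nearly identical. The point is only to show where the parameter goes; measure on your own data before settling on a value.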

edited Feb 17 '16 at 15:06

answered Feb 17 '16 at 14:57


I think I saw the exact same phrase somewhere on the scikit-learn website. Perhaps you could be so kind and add the link to your answer to give them proper credit? – Mathias Müller Feb 17 '16 at 15:04
Thanks, that helps somewhat, at least at first glance. One has to be careful, though: I first tried with 10 GB of cache and the run time doubled. With 1 GB of cache, it went down about 70%. Well, I think this will be as good as it gets, since the SVC info page also says `The implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples.` – user3696412 Feb 17 '16 at 15:04
If you are planning to use a linear kernel, it will be much better to use LinearSVC. Training for this model is much faster. – Farseer Feb 17 '16 at 15:08
A short test showed very bad performance with LinearSVC (not speed-wise, but result-wise), so using a linear kernel or LinearSVC is not planned at this time. – user3696412 Feb 17 '16 at 21:55
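For anyone who wants to repeat the LinearSVC comparison from the comments, a minimal sketch might look like the following. The synthetic data and the `max_iter=10000` setting are assumptions for illustration; results on real image features can differ a lot, as the comment above reports.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in data (an assumption; replace with your own features).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = [
    ("SVC(kernel='linear')", SVC(kernel="linear")),
    # max_iter is raised here only to reduce convergence warnings.
    ("LinearSVC", LinearSVC(max_iter=10000)),
]

scores = {}
for name, clf in candidates:
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    # Held-out accuracy, so speed and result quality can be compared together.
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: fit {elapsed:.3f} s, test accuracy {scores[name]:.3f}")
```

Comparing both fit time and held-out accuracy makes the trade-off from the comments visible: LinearSVC is only worth the swap if a linear decision boundary is good enough for the data.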
