I am using SVC from scikit-learn on a large dataset of 10000x1000 (10000 objects with 1000 features). I have already read in other sources that LIBSVM doesn't scale well beyond ~10000 objects, and I do observe this:
training time for 10000 objects: 18.9s
training time for 12000 objects: 44.2s
training time for 14000 objects: 92.7s
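For anyone trying to reproduce the fit timings, here is a minimal sketch of a benchmark harness. It assumes a default `SVC()` (RBF kernel) and uses synthetic data from `make_classification` as a stand-in for the real dataset, so the absolute numbers will not match the ones above. The helper name `time_fit` is hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC


def time_fit(n_samples, n_features=1000, random_state=0):
    """Time SVC.fit on a synthetic (n_samples, n_features) dataset.

    Assumption: make_classification stands in for the real data, so
    absolute timings will differ from the measurements in the post.
    Returns (fitted classifier, fit time in seconds).
    """
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               random_state=random_state)
    clf = SVC()  # default settings: RBF kernel, C=1.0
    start = time.perf_counter()
    clf.fit(X, y)
    return clf, time.perf_counter() - start
```

Usage, matching the sizes above: `for n in (10000, 12000, 14000): print(n, time_fit(n)[1])`. Expect this to take minutes at these sizes.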
You can imagine what happens when I try to scale up to 80000. However, what I find very surprising is that the SVM's predict() takes even more time than the training fit():
prediction time for 10000 objects (model was also trained on those objects): 49.0s
prediction time for 12000 objects (model was also trained on those objects): 91.5s
prediction time for 14000 objects (model was also trained on those objects): 141.84s
It is trivial to make prediction run in linear time (although it might be close to linear here), and it is usually much faster than training. So what is going on here?
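One quantity worth checking alongside the prediction timings: for a kernel SVC, `predict()` evaluates the kernel between every test object and every support vector, so its cost grows roughly with (number of test objects) × (number of support vectors) × (number of features). The sketch below, again assuming synthetic `make_classification` data and a default `SVC()`, times `predict()` on the training set and reports how many training objects became support vectors via the fitted `n_support_` attribute. The helper name `time_predict` is hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC


def time_predict(n_samples, n_features=1000, random_state=0):
    """Fit SVC on synthetic data, then time predict() on that same data.

    Assumption: make_classification stands in for the real dataset.
    Returns (number of support vectors, prediction time in seconds).
    """
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               random_state=random_state)
    clf = SVC().fit(X, y)  # default RBF kernel, C=1.0
    start = time.perf_counter()
    clf.predict(X)  # kernel is evaluated against every support vector
    elapsed = time.perf_counter() - start
    n_sv = int(clf.n_support_.sum())  # n_support_ holds one count per class
    return n_sv, elapsed
```

Usage: `for n in (10000, 12000, 14000): print(n, *time_predict(n))`. If the support-vector count grows in proportion to `n`, predicting on the training set scales roughly quadratically in `n` rather than linearly.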