I am running the scikit-learn SVM classifier (SVC) from Python 2.7.10 and it has now been training for over two hours. I read the data with pandas.read_csv, preprocessed it, and then ran:
from sklearn.svm import SVC

clf = SVC(C=0.001, kernel='linear', cache_size=7000, verbose=True)
clf.fit(X_train, y_train)
I have experience running classifiers (Random Forests and Deep Neural Networks) in H2O from R, and they never take this long. The machine I am running on has 16 GB of RAM and an i7 at 3.6 GHz per core. The task monitor tells me that Python is using 8.6 GB of RAM, but only 13% of the CPU. I don't understand why it is so slow and not even using all available resources.
The data has 12,000,000 rows and 22 columns, and the only verbose output sklearn is giving me is a single line:
[LibSVM]
Is that normal behavior, or should I see a lot more? Could anyone post the verbose output of an SVC run that finished? Also, can I do anything to speed things up besides lowering the C parameter? Using fewer rows is not really an option, since I want to benchmark algorithms and they wouldn't be comparable if trained on different data. Finally, can anyone explain why so little of my resources is being used?
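For context, one alternative I was considering is LinearSVC, which (as far as I understand) is backed by liblinear rather than libsvm and is supposed to scale better with many rows for a linear kernel. A minimal sketch with synthetic stand-in data (my real 12,000,000-row dataset is obviously not shown here):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in for my real data (same column count, far fewer rows)
X, y = make_classification(n_samples=10000, n_features=22, random_state=0)

# LinearSVC uses the liblinear solver, which my understanding is avoids
# the kernel-matrix computation that SVC's libsvm solver performs even
# for kernel='linear'
clf = LinearSVC(C=0.001)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```

Would switching to this be a fair comparison for my benchmark, or does it solve a different optimization problem than SVC with a linear kernel?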