2

Training took a lot of time. Is there a way to continue training and add new samples when using NuSVC() and nearest neighbors in scikit-learn?

jude
  • You can always save the whole classifier using pickle or scikit-learn's own [model persistence functions](http://scikit-learn.org/stable/modules/model_persistence.html), which lets you load the whole object back later. You can then continue training by warm-starting from the saved weights, provided the estimator supports it. There may be limitations if the base estimator is not designed for adding new data. One example: if your new data contains an additional target label, that is problematic. – sascha Dec 01 '16 at 12:44
  • What if the number of classes stays the same? Is there an easier way to retrain in less time? – jude Dec 01 '16 at 13:44
  • Read [this for a start](http://stackoverflow.com/questions/23056460/does-the-svm-in-sklearn-support-incremental-online-learning). – sascha Dec 01 '16 at 13:50
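
A minimal sketch of the persistence idea from the comments above, assuming a small synthetic dataset (the arrays and the `nu` value are made up for illustration). It pickles a fitted NuSVC, restores it, and checks that the restored object predicts the same labels. Since NuSVC has no incremental update, adding samples in this sketch means refitting on the old and new data together.

```python
import pickle

import numpy as np
from sklearn.svm import NuSVC

# Synthetic, balanced two-class data (illustrative only).
rng = np.random.RandomState(0)
y_old = np.array([0, 1] * 30)
X_old = rng.randn(60, 4) + 2.0 * y_old[:, None]

# Fit once, then serialize the whole estimator to bytes.
clf = NuSVC(nu=0.5).fit(X_old, y_old)
blob = pickle.dumps(clf)

# Later (for example in another session): restore without retraining.
restored = pickle.loads(blob)
same_predictions = np.array_equal(clf.predict(X_old), restored.predict(X_old))

# New samples arrive: NuSVC cannot be updated in place, so refit on everything.
y_new = np.array([0, 1] * 10)
X_new = rng.randn(20, 4) + 2.0 * y_new[:, None]
X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])
clf_all = NuSVC(nu=0.5).fit(X_all, y_all)
```

Persistence avoids repeating the original fit when you only need predictions. It does not by itself make adding data cheaper.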

1 Answer

3

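For the SVM, you might be able to use the online learning abilities of the SGDClassifier class, which trains a linear SVM when used with its default hinge loss. To do so, you would call its partial_fit() method on each new batch of data. Note that NuSVC itself has no partial_fit().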
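
A rough sketch of what that could look like, assuming synthetic data produced by a hypothetical `make_batch` helper (not part of scikit-learn). The full list of classes has to be passed on the first `partial_fit()` call; later calls update the same model with new samples.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
classes = np.array([0, 1])  # every label that will ever appear


def make_batch(n):
    """Hypothetical helper: two well-separated Gaussian blobs."""
    y = rng.randint(0, 2, size=n)
    X = rng.randn(n, 5) + 3.0 * (2 * y - 1)[:, None]
    return X, y


# loss="hinge" makes this a linear SVM trained by stochastic gradient descent.
clf = SGDClassifier(loss="hinge", random_state=0)

# Initial training: the classes argument is required on the first call.
X0, y0 = make_batch(200)
clf.partial_fit(X0, y0, classes=classes)

# Later, when new samples arrive, continue from the current weights.
X1, y1 = make_batch(200)
clf.partial_fit(X1, y1)

X_test, y_test = make_batch(100)
acc = clf.score(X_test, y_test)
```

This is a linear model, so it is not a drop-in replacement for a kernelized NuSVC; whether it is accurate enough depends on your data.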

neelshiv
  • what about for knearestneighbor()? – jude Dec 02 '16 at 01:00
  • I don't see that particular function in the scikit documentation (link below). It doesn't appear that scikit KNN methods have partial fit. How long would it take to retrain KNN on all data? http://scikit-learn.org/stable/modules/classes.html#module-sklearn.neighbors – neelshiv Dec 02 '16 at 14:00
  • I was wrong. The LSH KNN class does have partial_fit(). I'd read scikit's documentation on this function and test it out on your data to see if it works well for you. http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LSHForest.html#sklearn.neighbors.LSHForest – neelshiv Dec 02 '16 at 14:08
  • Thanks, but first I have to solve a memory error in my kmeans() call when clustering codewords. I get a memory error for k=10000 with 6 GB of RAM. – jude Dec 03 '16 at 01:17
  • But it runs at k=1000. – jude Dec 03 '16 at 01:18
  • Sorry, but I'm not totally sure what the solution for your problem would be at this point. I haven't used this particular function. Good luck! – neelshiv Dec 05 '16 at 14:45
  • Can MiniBatchKMeans().partial_fit() accumulate the training of cluster centers? – jude Jan 11 '17 at 16:02
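
One way to check the last question empirically is a sketch like the following, assuming random data as a stand-in for real descriptors (the batch size, feature count, and `n_clusters` are arbitrary). Each `partial_fit()` call updates the existing cluster centers with the new batch instead of starting over, so only one batch needs to be in memory at a time.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
n_clusters = 8

kmeans = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=0)

# The first call initializes the centers from the first batch,
# so that batch needs at least n_clusters samples.
kmeans.partial_fit(rng.rand(500, 16))
centers_after_first = kmeans.cluster_centers_.copy()

# Further calls keep refining the same centers, batch by batch.
for _ in range(9):
    kmeans.partial_fit(rng.rand(500, 16))

centers = kmeans.cluster_centers_
centers_moved = not np.allclose(centers_after_first, centers)

# The incrementally trained model can assign new points to clusters.
labels = kmeans.predict(rng.rand(5, 16))
```

Whether this is enough to avoid the memory error at k=10000 is not something the sketch establishes; it only shows that the centers carry over between calls.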