
I have a 10GB dataset to train a model with in sklearn, but my computer only has 8GB of memory. Are there other ways to go besides an incremental classifier?


1 Answer


I think sklearn can be used for larger-than-memory data if the technique is right. If your chosen algorithms support partial_fit or another online-learning approach, then you're on track; the chunk size you pick can also influence how well this works, as in the sketch below.
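Here is a minimal sketch of that chunked partial_fit idea, assuming a CSV file named train.csv whose last column is the label, two classes, and an SGDClassifier; the file name, column layout, class labels, and chunk size are all illustrative assumptions, not details from your question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

# All class labels must be declared up front for partial_fit.
classes = np.array([0, 1])
clf = SGDClassifier()

# Read the large file in chunks small enough to fit in memory,
# updating the model incrementally on each chunk.
for chunk in pd.read_csv("train.csv", chunksize=100_000):
    X = chunk.iloc[:, :-1].to_numpy()
    y = chunk.iloc[:, -1].to_numpy()
    clf.partial_fit(X, y, classes=classes)
```

The chunksize value trades memory use against per-chunk overhead, so you may want to tune it to your machine.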

This link may be useful: Working with big data in python and numpy, not enough ram, how to save partial results on the disc?

Another thing you can do is randomly decide whether or not to keep each row of your CSV file, and save the result to a .npy file so it loads more quickly. That way you get a sample of your data that lets you start experimenting with all the algorithms, and you can deal with the bigger-data issue along the way (or not at all: sometimes a sample plus a good approach is good enough, depending on what you want). A sketch of that sampling step follows.
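For example, a minimal sketch of that random-sampling step, assuming a headerless, all-numeric CSV named train.csv; the file name, the 10% keep probability, and the output name sample.npy are assumptions for illustration.

```python
import csv
import random
import numpy as np

kept = []
with open("train.csv", newline="") as f:
    for row in csv.reader(f):
        # Keep roughly 10% of the rows at random.
        if random.random() < 0.10:
            kept.append([float(v) for v in row])

sample = np.asarray(kept)
# A .npy file reloads much faster than re-parsing the CSV.
np.save("sample.npy", sample)

# Later: data = np.load("sample.npy")
```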