I'm trying to perform sentiment analysis on the Twitter dataset "Sentiment140", which consists of 1.6 million labelled tweets. I'm constructing my feature vectors using a bag-of-words (unigram) model, so each tweet is represented by about 20,000 features. To train my sklearn models (SVM, Logistic Regression, Naive Bayes) on this dataset, I have to load the entire 1.6M × 20,000 feature matrix into one variable and then feed it to the model. Even on my server machine, which has 115 GB of memory, the process gets killed.
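A simplified sketch of my current pipeline (CountVectorizer and LinearSVC are stand-ins for my exact code; the file name is the standard Sentiment140 CSV):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Load all 1.6M tweets at once (column 0 = polarity, column 5 = tweet text)
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 encoding="latin-1", header=None)
labels = df[0].values   # 0 = negative, 4 = positive
tweets = df[5].values

# Unigram bag-of-words, capped at ~20k features
vectorizer = CountVectorizer(max_features=20000)
X = vectorizer.fit_transform(tweets)  # the whole 1.6M x 20k matrix

# fit() needs the full matrix up front -- this is where the process gets killed
clf = LinearSVC()
clf.fit(X, labels)
```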
So I wanted to know whether I can train the model instance by instance (or in small batches), rather than loading the entire dataset into one variable.
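For concreteness, this is the kind of loop I have in mind. I've seen that some sklearn estimators expose a `partial_fit` method and that `HashingVectorizer` is stateless (so it can transform each chunk independently, without fitting a vocabulary first), but I'm not sure whether this is the supported pattern (sketch only, untested):

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Stateless vectorizer: no fit() needed, so chunks can be processed one by one
vectorizer = HashingVectorizer(n_features=2**15, alternate_sign=False)
clf = SGDClassifier()  # hinge loss by default, i.e. a linear SVM trained by SGD

classes = np.array([0, 4])  # all Sentiment140 polarity labels, needed up front
chunks = pd.read_csv("training.1600000.processed.noemoticon.csv",
                     encoding="latin-1", header=None, chunksize=10000)
for chunk in chunks:
    X = vectorizer.transform(chunk[5])  # only this chunk is ever in memory
    y = chunk[0].values
    clf.partial_fit(X, y, classes=classes)
```

From the docs, MultinomialNB also seems to have `partial_fit`, but I couldn't find one on SVC/LinearSVC, so I'm not sure how this would work for all three of my models.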
If sklearn does not have this flexibility, are there any other libraries you could recommend that support sequential (incremental/online) learning?