I want to build a tf-idf model based on a corpus that cannot fit in memory. I read the tutorial but the corpus seems to be loaded at once:
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = ["doc1", "doc2", "doc3"]
vectorizer = TfidfVectorizer(min_df=1)
vectorizer.fit(corpus)
I wonder if I can load the documents into memory one by one instead of loading all of them.