For an NLP task, I am creating a document-term matrix with dimensions 4280 x 90141, in which more than 98% of the entries are zeros. The dense representation of this matrix requires a lot of memory (roughly 3.1 GB if stored as 64-bit floats), so I would like to create it as a sparse matrix instead.
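To put the memory difference in numbers, here is a rough back-of-the-envelope estimate. It assumes (for illustration only) 64-bit float values, 32-bit column indices and row pointers as in a CSR layout, and exactly 2% non-zero entries, which is the upper bound implied by ">98% zeros".

```python
# Rough memory estimate for a 4280 x 90141 document-term matrix.
# Assumptions: float64 values (8 bytes), int32 indices/row pointers (4 bytes),
# and a density of exactly 2% (upper bound from ">98% zeros").
n_docs, n_terms = 4280, 90141
density = 0.02

dense_bytes = n_docs * n_terms * 8              # every cell is stored
nnz = int(n_docs * n_terms * density)           # number of non-zero cells
csr_bytes = nnz * (8 + 4) + (n_docs + 1) * 4    # data + indices + indptr

print(f"dense: {dense_bytes / 1e9:.2f} GB")
print(f"CSR:   {csr_bytes / 1e6:.1f} MB")
```

Under these assumptions this prints about 3.09 GB for the dense matrix versus about 92.6 MB for CSR, a reduction of more than 30x.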
In this link they suggest using SciPy. But as far as I understand, that approach requires initializing the dense matrix first and only then converting it to a sparse one. Is there a package or existing code that creates a sparse document-term representation without first initializing a dense matrix?
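For comparison, SciPy's `coo_matrix` constructor also accepts coordinate triplets `(data, (row, col))` directly, so only the non-zero entries are ever passed in. A minimal sketch with a toy 3 x 4 matrix (the values are made up for illustration):

```python
from scipy.sparse import coo_matrix

# Each non-zero entry is given as (row, col, value); no dense array is built.
rows = [0, 0, 1, 2]
cols = [1, 3, 0, 3]
vals = [2, 1, 5, 1]

m = coo_matrix((vals, (rows, cols)), shape=(3, 4)).tocsr()
print(m.nnz)        # number of stored entries
print(m.toarray())  # densified here only because the toy matrix is tiny
```

Converting to CSR with `.tocsr()` gives efficient row slicing, which suits per-document access.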
I am thinking about something like:
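```python
from collections import Counter

# corpus: iterable of tokenized documents (each a list of terms)
dense_doc_term = []
for doc in corpus:
    dense_doc_term.append(Counter(doc))  # {term: count}; zero counts are not stored
```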
Would that be a good approach?
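A list of `Counter` objects is already a sparse (dict-of-counts) representation, but most numeric libraries expect a matrix. One possible way to turn such Counters into a SciPy CSR matrix without ever allocating the dense array is sketched below; `counters_to_csr` is a hypothetical helper name, and column indices are assigned to terms in order of first appearance (an assumption; a fixed vocabulary would work too).

```python
from collections import Counter

import numpy as np
from scipy.sparse import csr_matrix


def counters_to_csr(counters):
    """Build a CSR document-term matrix from one Counter per document,
    without materializing the dense matrix. Returns (matrix, vocab)."""
    vocab = {}             # term -> column index, assigned on first sight
    indices, data = [], []
    indptr = [0]           # row i occupies data[indptr[i]:indptr[i + 1]]
    for counts in counters:
        for term, count in counts.items():
            col = vocab.setdefault(term, len(vocab))
            indices.append(col)
            data.append(count)
        indptr.append(len(indices))
    matrix = csr_matrix(
        (np.asarray(data, dtype=np.int64), np.asarray(indices), np.asarray(indptr)),
        shape=(len(indptr) - 1, len(vocab)),
    )
    return matrix, vocab


corpus = [["the", "cat", "sat"], ["the", "dog", "the"]]  # toy tokenized corpus
doc_term, vocab = counters_to_csr(Counter(doc) for doc in corpus)
print(doc_term.shape)
print(doc_term.toarray())
```

Memory then grows with the number of non-zero counts rather than with documents x vocabulary. If scikit-learn is available, `sklearn.feature_extraction.text.CountVectorizer.fit_transform` also returns a SciPy sparse document-term matrix directly.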