I'm doing text analysis in R. I created a DocumentTermMatrix with the tm library and obtained a dtm object with the following characteristics:
<<DocumentTermMatrix (documents: 16405, terms: 13002796)>>
Non-/sparse entries: 46650312/213264218068
Sparsity : 100%
Maximal term length: 2179
Weighting : term frequency (tf)
The object has a size of about 1.5 Gb. Now I want to get the frequency of each word, and to do this I convert the dtm into a matrix, using the command:
freq <- colSums(as.matrix(dtm))
but when I run this command, R responds with the following error:
Error: cannot allocate vector of size 1589.3 Gb
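As a sanity check on where that figure might come from, here is a small sketch of the arithmetic. It assumes that as.matrix() builds a dense matrix with one 8-byte double per cell; that is an assumption about the conversion, not something shown in the output above. The dimensions and entry counts are copied from the printed summary.

```r
# Dimensions and entry counts copied from the printed dtm summary
n_docs     <- 16405
n_terms    <- 13002796
non_sparse <- 46650312
sparse     <- 213264218068

# Number of cells a dense documents x terms matrix would hold
cells <- n_docs * n_terms

# Assumption: every cell is stored as an 8-byte double
dense_gb <- cells * 8 / 1024^3

cat(sprintf("cells: %.0f\n", cells))
cat(sprintf("dense size: %.1f Gb\n", dense_gb))  # prints "dense size: 1589.3 Gb"
```

The cell count equals the sum of the sparse and non-sparse entries in the summary, and the dense size matches the figure in the error message.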
First, why does the program need 1589.3 Gb to store a dtm that has a size of 1.5 Gb? Second, how can I solve the problem? Thanks to everyone.
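One possible way around the conversion, sketched under assumptions: tm stores a DocumentTermMatrix as a simple_triplet_matrix from the slam package, and slam provides col_sums(), which sums columns without building a dense matrix. The toy matrix below is a hypothetical stand-in for the real dtm, with made-up documents and terms, so that the example runs on its own.

```r
library(slam)  # installed as a dependency of tm

# Hypothetical stand-in for the real dtm: 3 documents x 4 terms, stored sparsely
toy <- simple_triplet_matrix(
  i = c(1, 1, 2, 3, 3),   # document (row) index of each non-zero entry
  j = c(1, 3, 3, 2, 4),   # term (column) index of each non-zero entry
  v = c(2, 1, 4, 1, 3),   # term counts
  nrow = 3, ncol = 4,
  dimnames = list(Docs  = c("d1", "d2", "d3"),
                  Terms = c("apple", "pear", "plum", "fig"))
)

# Column sums computed directly on the sparse representation
freq <- col_sums(toy)

# Most frequent terms first
freq_sorted <- sort(freq, decreasing = TRUE)
print(freq_sorted)
```

On the real object the equivalent call would presumably be freq <- slam::col_sums(dtm), which only touches the non-zero entries instead of allocating every cell.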