
I'm getting the following memory allocation error when trying to run DocumentTermMatrix from the tm package. I'm not sure why this is happening, as my machine has 128 GB of memory and the corpus is only 3 GB.

Error in mcfork() :
  unable to fork, possible reason: Cannot allocate memory
Calls: DocumentTermMatrix ... content.VCorpus -> materialize -> mclapply -> lapply -> FUN -> mcfork

This is all the code I'm running:

library(tm)
text <- read.csv('/path/to/text.csv', ...)

vct <- VCorpus(VectorSource(text[,2]))
vct <- tm_map(vct, removeWords, stopwords("english"), mc.cores=1)
dtm <- DocumentTermMatrix(vct)
Luke

1 Answer


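Following this post, I fixed it by limiting the number of cores used. The traceback shows the fork happening inside mclapply (called from materialize), which takes its default worker count from getOption("mc.cores"); with mc.cores set to 1, mclapply runs serially via lapply and never forks. Since DocumentTermMatrix has no explicit option for this, I had to set it via options():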

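num.cores <- getOption("mc.cores")  # remember the current setting (may be NULL)
options(mc.cores=1)                 # make mclapply run serially, without forking
dtm <- DocumentTermMatrix(vct)
options(mc.cores=num.cores)         # restore the previous setting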
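If you do this often, a small wrapper can restore the option even when DocumentTermMatrix fails partway through. The sketch below is a hypothetical helper (with_mc_cores is not part of tm) that relies only on base R: options() returns the previous values, and on.exit() reapplies them when the function exits, whether normally or via an error.

```r
# Hypothetical helper, not part of tm: evaluate `expr` with the mc.cores
# option temporarily set, then restore the previous value on exit,
# including when `expr` throws an error.
with_mc_cores <- function(expr, cores = 1L) {
  old <- options(mc.cores = cores)   # options() returns the previous value(s)
  on.exit(options(old), add = TRUE)  # reinstate them when the function exits
  expr                               # lazy evaluation: expr is forced here
}

# Base-R check: inside the helper mc.cores is 1; afterwards it is unchanged.
with_mc_cores(getOption("mc.cores"))
```

Usage with the corpus above would be dtm <- with_mc_cores(DocumentTermMatrix(vct)). Because R evaluates function arguments lazily, DocumentTermMatrix(vct) only runs when expr is forced inside the helper, that is, after mc.cores has already been set.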