
I have R running on a Windows 10 machine with 16 GB of RAM. Task Manager shows that 13.7 GB of that is available when I start RStudio. I then load the dataset (which has about 10 million rows) and still have 11.8 GB of free RAM. Next I use `lm` to run a regression with 35 independent variables. Task Manager shows memory usage rising to 14 GB after about 30 seconds, and then I get this error:

Error: cannot allocate vector of size 4.1 Gb
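
For context, roughly what I am running looks like this (the file name and column names are placeholders for my real ones):

```r
## Load ~10 million rows (35 numeric predictors x1..x35 plus a response y),
## then fit an ordinary least-squares model with lm
dat <- read.csv("mydata.csv")
f   <- reformulate(paste0("x", 1:35), response = "y")
fit <- lm(f, data = dat)  # memory climbs to ~14 GB here, then the allocation error above appears
```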

This seems very strange to me since, first of all, there is much more memory available to R than the 4.1 GB mentioned in the error. Second, 10 million observations with 35 variables is not even considered big data, and I don't understand why R has difficulty handling it. SAS would handle it in a second with no problem.
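
For what it's worth, here is my rough arithmetic for why the data itself should not be the problem (assuming all 36 columns are stored as doubles):

```r
## 10 million rows x (35 predictors + 1 response) x 8 bytes per double
10e6 * 36 * 8 / 2^30  # about 2.7 GiB of raw data, far below the free RAM
```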

Do you know why I get this error, and what I can do to solve it?

  • Hehe, you underestimate the memory consumption of `lm`. This one: [Why does `lm` run out of memory while matrix multiplication works fine for coefficients?](http://stackoverflow.com/q/10326853/4891738) helps you get a basic idea of what is going on. – Zheyuan Li Sep 27 '16 at 16:45
  • My side-by-side comparisons of SAS and R on similarly sized regression problems on PCs do not support your claim that SAS could "do this in a second". However, SAS does its work out of memory, while you are asking for an in-memory task involving multiple copies of a 35-column x 10^7-row matrix at 8 bytes per numeric value. Buy more memory or fire up an EC2 instance; it will be much cheaper than a SAS license. – IRTFM Sep 27 '16 at 16:51
  • Thanks for the comments, and the link to the related topic. I'm going to try `biglm` and see how it works (see the sketch after these comments). – amir-f Sep 27 '16 at 16:56
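
A minimal sketch of the two lower-memory routes raised in the comments, assuming the data is already in a data frame `dat` with a numeric response `y` and predictors `x1` through `x35` (the same placeholder names as above):

```r
## Option 1: biglm updates a small fixed-size QR summary, so it needs far
## less working memory than lm for the same coefficients.
library(biglm)

f   <- reformulate(paste0("x", 1:35), response = "y")
fit <- biglm(f, data = dat)
coef(fit)

## Option 2 (the approach in the linked question): build the model matrix
## once and solve the normal equations X'X b = X'y, instead of letting lm
## keep extra copies of the model frame and design matrix around.
## (The model matrix is still roughly 2.9 GB here, but only one copy is needed.)
X <- model.matrix(f, data = dat)
b <- solve(crossprod(X), crossprod(X, dat$y))
```

If memory is still tight, `biglm` can also be fed the data in chunks via `update()`. Note that the normal-equations route is lighter but less numerically stable than the QR decomposition `lm` and `biglm` use, so it is mainly a sanity check that the coefficients can be computed within the available memory.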

0 Answers