
In the course of vectorizing some simulation code, I've run into a memory issue. I'm using 32-bit R version 2.15.0 (via RStudio version 0.96.122) under Windows XP. My machine has 3.46 GB of RAM.

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.0-6   lattice_0.20-6 MASS_7.3-18   

loaded via a namespace (and not attached):
[1] grid_2.15.0  tools_2.15.0

Here is a minimal example of the problem:

> memory.limit(3000)
[1] 3000
> rm(list = ls())
> gc()
          used (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells 1069761 28.6    1710298  45.7   1710298   45.7
Vcells  901466  6.9   21692001 165.5 173386187 1322.9
> N <- 894993
> library(MASS)
> sims <- mvrnorm(n = N, mu = rep(0, 11), Sigma = diag(nrow = 11))
> sims <- mvrnorm(n = N + 1, mu = rep(0, 11), Sigma = diag(nrow = 11))
Error: cannot allocate vector of size 75.1 Mb

(In my application the covariance matrix Sigma is not diagonal, but I get the same error either way.)

I've spent the afternoon reading about memory allocation issues in R (including here, here and here). From what I've read, I get the impression that it's not a matter of the available RAM per se, but of the available contiguous address space. Still, 75.1 Mb seems pretty small to me.

I'd greatly appreciate any thoughts or suggestions that you might have.

asked by inhuretnakht, edited by zx8754

4 Answers

60

I had the same error using the raster package.

> my_mask[my_mask[] != 1] <- NA
Error: cannot allocate vector of size 5.4 Gb

The solution is really simple and consists of increasing the memory limit available to R; here is the code:

## To check the current memory limit, in MB (memory.limit() is Windows-only)
> memory.limit()
[1] 8103
## To increase the memory limit
> memory.limit(size=56000)
[1] 56000
## memory.limit() takes a size in MB, so this raises the limit to roughly 56 GB

Hopefully this will help you to solve the problem. Cheers!

answered by juandelsur, edited by Mark White
34

R has gotten to the point where the OS cannot allocate it another 75.1Mb chunk of RAM. That is the size of memory chunk required to do the next sub-operation. It is not a statement about the amount of contiguous RAM required to complete the entire process. By this point, all your available RAM is exhausted but you need more memory to continue and the OS is unable to make more RAM available to R.

Potential solutions to this are manifold. The obvious one is to get hold of a 64-bit machine with more RAM. I forget the details, but IIRC on 32-bit Windows any single process can only use a limited amount of RAM (2 GB?), and regardless Windows will retain a chunk of memory for itself, so the RAM available to R will be somewhat less than the 3.46 GB you have. On 64-bit Windows R will be able to use more RAM, and the maximum amount of RAM you can fit/install will be increased.

If that is not possible, then consider an alternative approach; perhaps do your simulations in batches, with the n per batch much smaller than N. That way you can draw a much smaller number of simulations, do whatever you wanted with them, collect the results, then repeat this process until you have done sufficient simulations. You don't show what N is, but I suspect it is big, so try a smaller n a number of times to give you N overall.
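
A minimal sketch of this batching idea (the batch size, the rowSums() placeholder computation, and the object names are illustrative only; substitute whatever each replication actually needs):

library(MASS)

N <- 894993                       # total number of draws needed
batch_size <- 50000               # chosen so one batch fits comfortably in RAM
mu <- rep(0, 11)
Sigma <- diag(nrow = 11)

## pre-allocate storage for the per-draw result (here, one number per draw)
results <- numeric(N)

done <- 0
while (done < N) {
  n_batch <- min(batch_size, N - done)
  sims <- mvrnorm(n = n_batch, mu = mu, Sigma = Sigma)

  ## placeholder computation; replace with whatever each replication needs
  results[(done + 1):(done + n_batch)] <- rowSums(sims)

  done <- done + n_batch
  rm(sims)
  gc()                            # release the finished batch before drawing the next one
}

Only one batch of draws is ever held in memory at a time, and the pre-allocated results vector is filled in place as each batch completes.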

answered by Gavin Simpson
  • In my example above, N is 894993. I was hoping to avoid using loops or some variant of apply but perhaps I can't in this case. – inhuretnakht Jun 06 '12 at 16:07
  • 7
    @user1426701 No, you can't. Why does everyone on [SO] lately want avoid using for loops in R? There is nothing wrong with using them and they are quick, as long as you set up storage for the result first and then fill in that object as you loop. – Gavin Simpson Jun 06 '12 at 16:09
  • In this particular example, when I reduce the number of replications so that this memory issue doesn't arise, the version without loops is nearly 5 times faster. It's not so much a matter of wanting to avoid loops altogether as to go from three nested loops to two. – inhuretnakht Jun 07 '12 at 09:29
  • @user1426701 I suspect that something is off there. A loop should be almost as quick as `lapply()`, for most things. – Gavin Simpson Jun 07 '12 at 11:41
  • Your point is well taken. I suspect it may be a matter of the overhead from repeated function calls rather than the loop itself. For example, I expect calling mvrnorm once to generate all 5000 simulation replications is much faster than calling it 5000 times to generate them individually. This is what I was trying to avoid by vectorizing my innermost loop. – inhuretnakht Jun 07 '12 at 15:33
  • @FrankDiTraglia Yes, 5000 `mvrnorm()` calls will be expensive. If you can't do 5000 draws in a single call, see if you can do two calls to `mvrnorm()` drawing 2500 each. Change 2500 to be whatever you can afford to fit into RAM and do your computations. So you need an extra loop to loop over the extra few `mvrnorm()` calls but the inner loop remains the same; you operate on a matrix of 2500 simulated values rather than the original 5000. – Gavin Simpson Jun 07 '12 at 15:44
  • @Gavin Simpson I have a 64-bit OS and 16 GB of RAM, but even so I have problems using functions: the data loads, but I cannot use it as a vector. I am getting the error `Error: cannot allocate vector of size 13668.3 Gb` – KRU May 11 '15 at 08:31
  • @KRU The issue is the same; you did something that forced R to request a ridiculous amount of RAM. I suspect you asked it for something silly, or you need to be running on a huge cluster to access that amount of RAM. – Gavin Simpson May 12 '15 at 04:19
  • @Gavin it's not as naive as you think; I have been matching 2 million records against a 5 lakh (500,000) record data set, which I have loaded successfully, but I get that error during further processing. – KRU May 12 '15 at 05:17
  • @KRU whether you can load the data or not is irrelevant; what you are attempting to do in the "further processing" is what is generating the need for the vast amount of memory. The issue is the same however. You don't have the RAM to do what you want. You need to look into alternative algorithms, such as a chunking version that can work iteratively on smaller chunks of data. – Gavin Simpson May 12 '15 at 05:21
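
To illustrate the pre-allocation point raised in the comments above, here is a small sketch; rnorm() is just a stand-in for the real per-replication work, and the object names are made up:

n_rep <- 5000

## Growing the result inside the loop forces repeated copying and is slow
res_grow <- numeric(0)
for (i in seq_len(n_rep)) {
  res_grow <- c(res_grow, mean(rnorm(11)))
}

## Pre-allocating the full result and filling it in place avoids that copying
res_fill <- numeric(n_rep)
for (i in seq_len(n_rep)) {
  res_fill[i] <- mean(rnorm(11))
}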
2

Calling gc() can help.

Saving your data as an .RData file, closing R, re-opening it, and loading the .RData file again can also help.
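
A minimal sketch of that save-and-reload workflow (the object name sims and the file name are placeholders for whatever large objects you are working with):

## in the session that is running low on memory
save(sims, file = "sims.RData")   # placeholder object and file name
q(save = "no")                    # close R

## in a fresh R session
load("sims.RData")
gc()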

See my answer here for more details: https://stackoverflow.com/a/24754706/190791

answered by Timothée HENRY
0

Does R stop no matter what value of N you use? Try small values to see whether it's the mvrnorm function itself that is the issue, or simply loop over subsets. Insert a gc() call inside the loop to free some RAM continuously.

answered by Gabriel123