
Does a memory warning affect my R analysis?

When running a large data analysis script in R I get a warning something like:

In '... ' reached total allocation of ___Mb: see help...

But my script continues without error, just the warning. With other data sets I get an error something like:

Error: cannot allocate vector of size ___Mb:

I know the error breaks my data analysis, but is there anything wrong with just getting the warning? I have not noticed anything missing in my data set, but it is very large and I have no good way to check everything. I have 18000 Mb allocated to R and cannot reasonably allocate more.
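
For reference, this is roughly how I have been checking and raising the limit (just a sketch; `memory.limit()` only applies to R on Windows, and the numbers are the ones I use on this machine):

```r
memory.limit()               # current allocation limit for R, in Mb
memory.limit(size = 18000)   # raise the limit to 18000 Mb
gc()                         # run the garbage collector and report memory in use
```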

GregS
  • What platform / OS are you using? Run `Sys.info()["machine"]` and `.Platform$OS.type`. Is it 32bit R under Windows? – Simon O'Hanlon Feb 26 '13 at 23:08
  • It's 64-bit Windows 7 Enterprise edition (x86-64, windows). The machine has 20 Gb of memory, but I get the same warnings when I allocate 19000 Mb. There is one section in my script where memory is an issue and I use the full 18000 Mb (it shows up in the Windows Task Manager as well). I need to import and combine many text files, run a few calculations, and then rearrange the data before breaking it back into smaller pieces. – GregS Feb 26 '13 at 23:35
  • 2
    You could try allocating all the memory in your system. R can't steal memory that is already in use by the OS. I'm not sure if liberal use of `gc()` during the memory intensive parts of your code might help? – Simon O'Hanlon Feb 26 '13 at 23:50
  • 1
    My main concern is to be sure that the warning is in fact just a warning, and that none of my data was affected. I have gc() inserted superstitiously after every line. One thing I don't quite understand is the .Rdata file that I save to only reaches 2.7Gb at most but while working on it in memory it takes up 18-19 Gb. For now the script seems to complete, just with many warnings. I also cannot seem to use certain functions like "by" to work on the entire data set at once which would have been nice. – GregS Feb 27 '13 at 00:03
  • 2
    R makes a lot of copies internally when it is working on data, and can make many copies of certain objects depending on what you are doing (I guess this is one of the drawbacks of an interpreted language). The [R Internals manual](http://cran.r-project.org/doc/manuals/R-ints.html) makes for some really interesting reading as does the source code (if you can read it - I find it very hard going). I think someone with more experience than me needs to guide you here - I'd worry about dishing out spurious information now. – Simon O'Hanlon Feb 27 '13 at 00:11
  • 1
    The .RData file uses a compressed format, so it will usually be a lot smaller than your data in memory. With 64-bit R on windows, you can actually set memory.limit() to be higher than your physical RAM, and then Windows will use the swap file to store the overflow. This will sometimes work well, especially where an object has been copied: the "old" copy can sit in the swap file when it's not being used. Then again, using swap will sometimes make your code run 100 times slower, depending on what you're doing. You just have to try it and see. – Alexander Hanysz Aug 03 '13 at 01:41

1 Answer


Way back in the R 2.5.1 news I found this reference to memory allocation warnings:

malloc.c has been updated to version 2.8.3. This version has a slightly different allocation strategy, and is likely to work a little better close to address space limits but may give more warnings about reaching the total allocation before successfully allocating.

Based on this note, I hypothesize (without any advanced knowledge of the inner implementation) that the warning is given when the memory allocation call in R (malloc.c) fails an attempt to allocate memory. Multiple attempts are made to allocate memory, possibly using different methods, and possibly with calls to the garbage collector. Only when malloc is fairly certain that the allocation cannot be made will it return an error.
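
As a rough sketch of the difference (the sizes below are invented and would need tuning to sit near your own limit): a near-limit allocation can emit the warning yet still succeed, and the warning can be observed without stopping the script, whereas an impossible allocation fails with the error.

```r
## Near the limit: allocation may succeed, with a warning that the
## script can log and then continue past.
x <- withCallingHandlers(
  numeric(1.5e9),   # ~12 Gb of doubles; adjust to land near your memory limit
  warning = function(w) {
    message("warning seen: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # carry on; x is still created if allocation succeeded
  }
)

## Far beyond any realistic limit: allocation fails with an error,
## which tryCatch() captures here instead of aborting the script.
e <- tryCatch(numeric(1e11), error = function(e) conditionMessage(e))
```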

Warnings do not compromise existing R objects. They just inform the user that R is nearing the limits of computer memory.

(I hope a more knowledgeable user can confirm this...)
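
If you want extra reassurance that no warning goes unnoticed (this goes beyond the NEWS entry above, so treat it as a suggestion rather than part of the explanation), you can promote warnings to errors around the memory-critical section, or review the accumulated warnings afterwards:

```r
old <- options(warn = 2)   # any warning now stops execution as an error
# ... memory-intensive steps: import, combine, rearrange ...
options(old)               # restore the previous warning behaviour

warnings()                 # alternatively, inspect the warnings after the run
```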

Blue Magister
  • 2
    It indeed appears to be the case that R calls the garbage collector when it issues the warning. You can use `gcinfo(TRUE)` to enable logging and carefully allocate memory around the limit, to observe some warnings & GC messages for successful allocations. – Jerzy Mar 31 '16 at 13:09
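
A minimal way to try what the comment describes (the allocation size is illustrative and should be tuned to sit near your own limit):

```r
gcinfo(TRUE)        # print a message every time the garbage collector runs
x <- numeric(5e8)   # ~4 Gb allocation; adjust so it lands close to the limit
gcinfo(FALSE)       # switch GC logging back off
```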