
My problem concerns simple calculations over big data sets (around 25 million rows and 10 columns, i.e. around 1 GB of data). My system is:

32-bit Windows 7 / 4 GB RAM / RStudio 0.96, R 2.15.2

I can reference my database using the bigmemory package and apply functions to it. I am also able to do this with the ff package, filehash, etc.

The problem is that while computing simple calculations (unique values, means, etc.) I get the typical error

"cannot allocate vector of size n Mb"

where n can be as small as 70–95 MB.

I know about all (I think) the solutions suggested for this so far:

increase RAM;
launch R with the command-line option --max-mem-size=XXXX (a short sketch of these memory commands follows this list);
use the memory.limit() and memory.size() commands;
use rm() and gc();
work on 64-bit;
close other programs, free memory, reboot;
use packages such as bigmemory, ff, filehash, SQL backends, etc.;
improve your data: use integers, shorts, etc.;
check the memory usage of intermediate calculations;
etc.
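For reference, a minimal sketch of those memory commands on 32-bit Windows R (`big_intermediate` is a placeholder object name):

```r
memory.limit()            # current allocation limit, in MB
memory.size()             # memory currently in use by R, in MB
memory.limit(size = 3500) # request a higher limit; 32-bit Windows caps this regardless

rm(big_intermediate)      # drop objects you no longer need (placeholder name)
gc()                      # run the garbage collector and report memory usage
```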

All of this has been tested and done (except moving to another system/machine, obviously).

But I still get those "cannot allocate vector of size n Mb" errors, where n is around 90 MB for example, with almost no memory in use by R or other programs, everything freshly rebooted. I am aware of the difference between free memory and what Windows lets R allocate, etc., but:

It makes no sense, because more than 3 GB of memory is available. I suspect the cause is something in the 32-bit Windows / R memory management, but it seems almost a joke to buy 4 GB of RAM, or to switch the whole system to 64-bit, just to allocate 70 MB.

Is there something I am missing?

Ferran Buireu
Miguel Vazq
  • I think that message indicates that `n` is the size of the single allocation that failed, not the total amount of memory the object requires. – Sacha Epskamp Nov 12 '12 at 12:00
  • Stata has the exact same problem. It's an issue with Windows, not the particular statistical program. Perhaps if you posted your code we could suggest a different way of going about it? – Ari B. Friedman Nov 12 '12 at 12:10
  • I don't know how comfortable you are with Linux, but you might be able to fix this problem by switching to Linux. – Gago-Silva Nov 12 '12 at 12:27
  • As suggested, please include the operations (the code!) that produce the memory errors for you, making the example as reproducible as possible. – BenBarnes Nov 12 '12 at 14:58
  • Hi all, thank you for the answers. There are many examples of the computations, but one would be `num_w <- length(unique(data[, "id"])); num_w` to obtain the unique ids; others would be means, etc. But I am not looking to solve detailed coding problems; I want to understand memory management in Windows and R. – Miguel Vazq Nov 12 '12 at 15:07
  • @MiguelVazq, the point of providing code is that some functions that work well with "normal" `data.frame`s have been rewritten for other object types, such as `big.matrix` or `data.table` objects, to avoid some of the memory overhead otherwise associated with the base R functions. – BenBarnes Nov 12 '12 at 16:46

2 Answers


The problem is that R is trying to allocate 90 MB of contiguous address space. Unfortunately, after many operations, memory can become too fragmented for a free block that large to exist.

If possible, try to optimize your code to use small chunks of data at a time.
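For instance, a minimal sketch of chunked processing in base R, assuming the data sit in a CSV file with an `id` column (the file name, chunk size, and column name are placeholders):

```r
# Stream the file in chunks so no single allocation approaches the 32-bit limit
con <- file("bigdata.csv", open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]  # column names from row 1
ids <- character(0)                                  # running set of unique ids
repeat {
  lines <- readLines(con, n = 1e6)                   # next chunk of raw lines
  if (length(lines) == 0) break                      # end of file
  tc <- textConnection(lines)
  chunk <- read.csv(tc, header = FALSE, col.names = header)
  close(tc)
  ids <- unique(c(ids, as.character(chunk$id)))      # fold chunk into the set
}
close(con)
length(ids)                                          # unique ids over all rows
```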

If you're trying to perform simple calculations like the ones you mentioned (e.g. means, row maxima, etc.), you might try biganalytics, which lets you run a number of operations on `big.matrix` objects.
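A minimal sketch along those lines, assuming a numeric CSV source (file names are placeholders):

```r
library(bigmemory)
library(biganalytics)

# Attach the data as a file-backed big.matrix: it lives on disk, not in RAM
x <- read.big.matrix("bigdata.csv", header = TRUE, type = "double",
                     backingfile = "bigdata.bin",
                     descriptorfile = "bigdata.desc")

colmean(x)      # per-column means without pulling the matrix into memory
colmax(x, 3)    # maximum of column 3 only
```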

Otherwise, as far as I know, short of switching to 64-bit OS and 64-bit R there's not much to do.

J4y
  • Just out of curiosity, what operation exactly are you trying to perform? – J4y Nov 12 '12 at 12:05
  • Hi, thank you for the answer. I am trying to run many operations: unique values, means, extracting values by queries, frequency distributions, etc. I am aware of biganalytics and of memory fragmentation, but after a fresh reboot, etc., not being able to allocate 100 MB is really strange; memory should not be fragmented after zero operations. I could use small chunks, but that seems like taking the "difficult way" before exhausting simpler solutions... Thank you anyway. – Miguel Vazq Nov 12 '12 at 12:50

Look at the ff package on CRAN. It "tricks" R by storing the data in flat files on disk and memory-mapping sections into RAM only as needed, instead of holding whole vectors in memory. It works rather well for importing data. You can also use the ffbase package to perform simple, efficient calculations on the ff objects.
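A minimal sketch, assuming a CSV source with an `id` column (names are placeholders) and ffbase's methods for ff vectors:

```r
library(ff)
library(ffbase)

# Import into an ffdf: each column is a memory-mapped file on disk, not a RAM vector
dat <- read.csv.ffdf(file = "bigdata.csv", header = TRUE)

mean(dat$value)           # 'value' is a placeholder numeric column; ffbase supplies mean()
length(unique(dat$id))    # unique() on an ff vector, computed chunk by chunk
```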

larrydag