
I'm using R to deal with some data that is not huge, but big enough to cause problems with the available memory. (I'm on a 32-bit system with 3 GB of RAM, and there is no possibility of using another system.)

I found that the data.table package should be a good way to do memory-efficient calculations. In particular, this post about joining tables without copying seems to help: data.table join then add columns to existing data.frame without re-copy

When doing some tests I found that, even when updating by reference, table sizes increase quite fast:

#rm(list=ls()); gc();
library(data.table)

n <- 7000000
A <- data.table(a = 1:n, z = "sometext", key = "a")
B <- data.table(a = 1:n, b = rnorm(n, 1), key = "a")

# Copying variant: the join result is materialised as a new table
#ACopy <- A[B, .(b=i.b, c=i.b, d=i.b, e=i.b, f=i.b, g=i.b, h=i.b, j=i.b, k=i.b, l=i.b, m=i.b)]

# Reference variant: join B and add the columns to A in place via :=
A[B, ':='(b=i.b, c=i.b, d=i.b, e=i.b, f=i.b, g=i.b, h=i.b, j=i.b, k=i.b, l=i.b, m=i.b)]
object.size(A)
  1. When increasing the n in the above example I get a "cannot allocate vector of size ..." error. I was surprised that this error already shows up at a table size of about 600 MB. (I know that not all of the 3 GB can be used, but 1.5 GB should be feasible.) Could anyone explain why the error shows up at a size of 600 MB already? (The workspace was cleared and no other memory-expensive applications were running; see the memory-watching sketch after this list.)

  2. ACopy does not use data.table's reference feature. Here an object size limit of ~600 MB seems reasonable to me, since some copying is done. What surprised me is a) that ACopy is smaller than A, and b) that the reference solution results in such a big object (I expected it to be much smaller because of the reference; see the address-check sketch below). As you can see, I'm new to this and would be glad if anyone could explain.
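
For question 1, this is roughly how I've been watching R's memory headroom between steps. A minimal sketch; memory.size() and memory.limit() assume a Windows build of R (they are Windows-only), otherwise gc() alone has to do:

gc(reset = TRUE)           # collect garbage and reset the "max used" statistics
# ... run the join here ...
gc()                       # "max used" columns show the peak since the reset
memory.size(max = TRUE)    # peak memory obtained from the OS, in MB (Windows only)
memory.limit()             # current memory limit, in MB (Windows only)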
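
For question 2, this is how I convinced myself that := itself does not copy the table. A minimal sketch using data.table::address(), which returns an object's memory address; the toy table is just for the check:

library(data.table)
DT <- data.table(a = 1:5, key = "a")
before <- address(DT)
DT[, b := a * 2]                 # add a column by reference
identical(before, address(DT))   # TRUE means no copy of DT was made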

Thanks, Michael

  • This help page may be relevant: `?Memory` – Frank Mar 23 '15 at 13:33
  • 1
    most likely related to [#1062](https://github.com/Rdatatable/data.table/issues/1062), which is not yet fixed. – Arun Mar 23 '15 at 20:22
  • Thanks, here's the link to the related question on Stack Overflow: http://stackoverflow.com/questions/28347305/r-why-adding-1-column-to-data-table-nearly-doubles-peak-memory-used?rq=1 – Fabian Gehring Mar 26 '15 at 14:29

0 Answers