I am running a program that generates a few columns of several million rows each, cbinds them, then prints the result. I'm trying to make the process more memory efficient, and I'm wondering whether the following copies the data or just points to it.

x<-rnorm(3,1,1)
y<-rnorm(3,2,2)
z<-rnorm(3,3,3)
M<-cbind(x,y,z)

One of the answers to "Understanding exactly when a data.table is a reference to (vs a copy of) another data.table" hints that the data is not copied, but the output of .Internal(inspect(M)) seems to disagree.

A simple memory-saving solution would be to declare M before running the fi and assign the values into M. I've heard that data.tables can hold large data sets very efficiently. Is there some way to use one in this situation?
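The preallocation idea could look roughly like the sketch below. It is only an illustration: `n` and the `rnorm` calls are stand-ins for the real row count and generating functions, and whether R fills `M` in place (rather than copying it on assignment) depends on `M` having no other references at that point.

```r
# Hypothetical stand-in for the real row count (several million in practice).
n <- 5L

# Allocate the full matrix once, up front.
M <- matrix(NA_real_, nrow = n, ncol = 3,
            dimnames = list(NULL, c("x", "y", "z")))

# Write each generated column straight into M, so no separate
# x, y and z vectors have to stay alive alongside the matrix.
M[, "x"] <- rnorm(n, 1, 1)
M[, "y"] <- rnorm(n, 2, 2)
M[, "z"] <- rnorm(n, 3, 3)

print(M)
```

Each `rnorm` result is still a temporary vector, but it can be garbage collected as soon as its column has been filled, so peak memory is roughly the matrix plus one column instead of the matrix plus all three.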

  • Please make a reproducible example. – Frank May 20 '16 at 16:48
  • Are you using data.tables? Or are these standard R vectors? matrices? data.frames? It would help if you provided some sort of [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that could be used for testing. – MrFlick May 20 '16 at 16:48
  • For what it's worth, data.table works fine if x and y are data.tables: `x = data.table(a = 1); y = data.table(b = 2); m = setDT(c(x,y))`. Try `sapply(x, address); sapply(y, address); sapply(m, address)` and you'll see the columns share addresses. I'm pretty sure it'll work fine for data.frames as well. – Frank May 20 '16 at 16:51
  • The x, y and z are vectors. As stand-ins, we can say `x<-rnorm(5,1,1); y<-rnorm(5,2,2); z<-rnorm(5,3,3)`. I don't think the actual functions are important. – Mathematastic May 20 '16 at 16:54
  • In comments, you'll want to separate commands with `;` so we can copy-paste 'em. In this case, I think your comment should be edited into the question, though. – Frank May 20 '16 at 16:55
  • My understanding is that any type of join, merge, or bind involves copying. The only actions that don't are `:=` and the family of `set` functions. – lmo May 20 '16 at 16:56
  • I don't think the original `cbind` will necessarily trigger copying; but if you subsequently modify `M` one or more copies may be required. – joran May 20 '16 at 17:02
  • @joran In the OP's example (only in the comments for some reason), x and y are atomic, so cbind will build a matrix (which can't just be done by pointers) – Frank May 20 '16 at 17:03
  • @lmo No. Lazy loading is different. – joran May 20 '16 at 17:05
  • `cbind`ing atomic vectors results in a "matrix" and will involve copying to build the larger structure. You could combine x/y/z in a list-like structure ("list", "data.frame", "data.table") to, probably, avoid copying. – alexis_laz May 20 '16 at 17:05
  • @Frank If it does copy the objects `x`, `y` and `z` then that copying is not reported by `tracemem`. – joran May 20 '16 at 17:06
  • The term I think of is "copy on modify" from one of Andrie's questions http://stackoverflow.com/q/15759117/ – Frank May 20 '16 at 17:06
  • @joran Oh. I guess the word "copy" has some subtler meaning here than I realized. I mean like `library(pryr); mem_change(z1 <- list(x,y)); mem_change(z2 <- cbind(x,y))` The former is around zero, while the latter is not because the data in x and y are "duplicated" (can I say that?) to construct z2, while in z1, they are not. – Frank May 20 '16 at 17:16

1 Answer

Here's a way to put vectors in a data.table without copying/by reference (needs R 3.1.1+ iirc).

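library(data.table)

x = 1:5
y = 5:1

# setDT() converts the list to a data.table in place, keeping its columns as they are
dt = setDT(list(x, y))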
#   V1 V2
#1:  1  5
#2:  2  4
#3:  3  3
#4:  4  2
#5:  5  1

dt[3, V1 := 10]
#   V1 V2
#1:  1  5
#2:  2  4
#3: 10  3
#4:  4  2
#5:  5  1

x
#[1]  1  2 10  4  5
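
One way to check the no-copy behaviour is to compare memory addresses, as suggested in the comments on the question. The following is a sketch, assuming data.table is installed; the `rnorm` vectors are stand-ins for the real columns, and the column names `x` and `y` are chosen here purely for illustration.

```r
library(data.table)

# Stand-in vectors; the real ones would have millions of rows.
x <- rnorm(5, 1, 1)
y <- rnorm(5, 2, 2)

# Build a data.table from a list of the vectors, and a matrix with cbind().
dt <- setDT(list(x = x, y = y))
m  <- cbind(x, y)

# If setDT() kept the column as the same vector, the addresses match.
shared_with_dt <- address(x) == address(dt$x)

# cbind() allocates a new block for the matrix, so its address differs from x's.
shared_with_matrix <- address(x) == address(m)

print(c(data.table = shared_with_dt, matrix = shared_with_matrix))
```

If the first comparison holds, later updates with `:=` also change `x`, so this approach only suits cases where the original vectors are not needed unchanged afterwards.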
eddi