I have a problem with assigning data.table columns to variables in R. My example code is below:

library(data.table)
DT <- data.table(A=c(3,5,2,6,4), B=c(6,2,7,2,1), Amount=1:5)
setkey(DT, A)
amt <- DT$Amount 
amt #3 1 5 2 4
setkey(DT, B)
amt #5 2 4 1 3

I used the "$" operator to assign the data.table's column to a variable "amt", but after I changed the order of the data.table, the order of "amt" changed as well. Can anyone tell me why this happens, and how I can avoid it? I don't want the order of "amt" to change when I change the order of DT.

Thank you very much.

Carter
  • My guess: `DT$Amount` is a pointer to the column of the data table, not a copy of the vector. If you want a copy, you can use `copy()`: `amt <- copy(DT$Amount)`. You encounter the same issue if you try to make a copy of the DT with `DT2 <- DT`; again, `copy()` is the answer. – Frank Sep 03 '15 at 01:33
  • @Frank post that as an answer; why leave an unanswered question on the data.table tag? – jangorecki Sep 03 '15 at 11:05
  • @jangorecki Okay, done. I hope that you guys who know the technical details will edit the answer as needed. I think that linked question covers everything, so maybe it's best to just keep its two answers up-to-date (last edited in January 2013). – Frank Sep 03 '15 at 12:51
  • Please have a look at the *Reference Semantics* vignette [here](https://github.com/Rdatatable/data.table/wiki/Getting-started). – Arun Sep 05 '15 at 10:04

1 Answer

To get around this, you can take a copy of the column:

amt <- copy(DT$Amount)

Assigning amt <- DT$Amount produces a "shallow copy": amt is simply a pointer to the original column, not a separate vector. Because setkey() reorders the columns of DT in place, amt is reordered along with them. The same issue comes up when you want to create a copy of a data.table, where best practice is DT2 <- copy(DT).
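As a minimal sketch of the fix, reusing the table from the question: the names `amt` and `DT2` are just illustrative, and the printed values in the comments assume the same data as in the question.

```r
library(data.table)

DT <- data.table(A = c(3, 5, 2, 6, 4), B = c(6, 2, 7, 2, 1), Amount = 1:5)
setkey(DT, A)

amt <- copy(DT$Amount)  # deep copy of the column vector
DT2 <- copy(DT)         # deep copy of the whole table, key included

setkey(DT, B)           # reorders DT's columns in place

amt        # 3 1 5 2 4  (still the order DT had when keyed by A)
DT$Amount  # 5 2 4 1 3  (follows the new key)
key(DT2)   # "A"        (DT2 is unaffected)
```

The copies cost extra memory, so this is only worth doing when you really need a snapshot that must not follow later by-reference changes to DT.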

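Note that data.tables, like data.frames (of which they are a special case), are each a vector of pointers to columns, and that assignment without copying is inherited from base R. The difference is that base R copies a vector before modifying it, whereas data.table's set* functions (setkey, setorder and so on) modify it in place, so every variable pointing at that vector sees the change. For example: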

DF <- data.frame(x=c(1,4,2))
xx <- DF$x
setorder(DF,x)
identical(xx,DF$x) # TRUE
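For contrast, here is a small sketch of the same reordering done with base R only. It assumes nothing beyond base R; `DF` and `xx` mirror the names in the example above. Because the reordered data.frame is a new object assigned back to `DF`, the previously extracted vector is left alone.

```r
DF <- data.frame(x = c(1, 4, 2))
xx <- DF$x

# Base R subsetting builds a new data.frame rather than reordering in place
DF <- DF[order(DF$x), , drop = FALSE]

xx    # 1 4 2  (unchanged)
DF$x  # 1 2 4
```

So the surprise in the question comes from combining a non-copying assignment with a function that reorders by reference, not from the assignment alone.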

The link above is strongly recommended for both technical details and advice on best practices.

Frank