Why is R data.table adding columns to another data table that I did not reference?

Question

I have an original data.table dt1 that contains column x. I want to create another data.table, dt2, that contains x and the first lag of x. When I run the following code, I get dt2 as desired, but dt1 has also been modified to match dt2, which I don't want.

library(data.table)

x <- rnorm(100, 0, 1)
dt1 <- data.table(x)

dt2 <- dt1
dt2[, lx := shift(x, 1, type = "lag")]

identical(dt1, dt2) # evaluates to TRUE

Am I missing something fundamental about how data.table works? Any help will be appreciated.

score 5 · Accepted Answer · answered May 01 '18 at 16:28

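Yes. The assignment DT2 <- DT1 does not make a copy; both names point to the same data.table, and := modifies that table by reference, so the change is visible through either name. If you'd like to retain a copy of the original, you should use copy: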

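library(data.table)
DT1 <- data.table(x = 1:100)
DT2 <- DT1                    # no copy: DT2 is another name for the same table
identical(DT1, DT2)
#> [1] TRUE
DT1[, y := x + 1]             # adds y by reference, so DT2 sees it too
identical(DT1, DT2)
#> [1] TRUE
DT2 <- copy(DT1)              # independent copy of DT1
DT2[, y := x + 2]             # changes only DT2
identical(DT1, DT2)
#> [1] FALSE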
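
Applied to the code in the question, a minimal sketch might look like the following. The names alias and dt3 and the set.seed call are illustrative additions, not part of the original post. address() is data.table's helper for showing where an object lives in memory.

```r
library(data.table)

set.seed(1)                        # assumption: added only for reproducibility
dt1 <- data.table(x = rnorm(100, 0, 1))

alias <- dt1                       # plain assignment: a second name, no copy
address(alias) == address(dt1)     # TRUE: both names refer to the same object

dt2 <- copy(dt1)                   # independent copy of dt1
address(dt2) == address(dt1)       # FALSE: dt2 lives elsewhere
dt2[, lx := shift(x, 1, type = "lag")]

names(dt1)                         # "x": dt1 is untouched
names(dt2)                         # "x" "lx"

# Alternative that never uses := on dt1: build the new table in one step
dt3 <- dt1[, .(x, lx = shift(x, 1, type = "lag"))]
```

Either route leaves dt1 with its single column. copy() is the more general fix; the one-step form avoids the intermediate copy when all you need is a derived table.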

Hugh


I spent hours getting confused and angry about this--thank you @Hugh! – txinferno May 01 '18 at 16:31
