
While testing my code, I found the following: if I assign a data.table DT to DT1 and then change DT, DT1 changes with it. So DT and DT1 seem to be linked internally. Is this intended behavior? I'm not a programming expert, but this looks wrong to me, and I couldn't reproduce it with simple R variables or a data.frame. What's happening here?

DF <- data.frame(ID=letters[1:5],
                  value=1:5)
DF1 <- DF
all.equal(DF1, DF)
[1] TRUE
DF[1, "value"] <- DF[1, "value"]*2
all.equal(DF1, DF)
[1] "Component 2: Mean relative difference: 1"

library(data.table)
data.table 1.7.1  For help type: help("data.table")
DT <- data.table(ID=letters[1:5],
                  value=1:5)
DT1 <- DT
all.equal(DT1, DT)
[1] TRUE
DT[, value:=value*2]
     ID value
[1,]  a     2
[2,]  b     4
[3,]  c     6
[4,]  d     8
[5,]  e    10
all.equal(DT1, DT)
[1] TRUE
David Arenburg
Christoph_J

1 Answer


Yes, this is intended. The data.table documentation for copy() covers it (see ?data.table::copy):

No value is returned. The data.table is modified by reference. If you require a copy, take a copy first (using DT2=copy(DT)). copy() may also sometimes be useful before := is used to subassign to a column by reference.
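As a minimal sketch of what that passage describes (reusing the DT and DT1 names from the question; DT2 is just an illustrative name), plain assignment with <- only creates a second name for the same data.table, whereas copy() creates an independent object that a later := does not touch:

```r
library(data.table)

DT <- data.table(ID = letters[1:5], value = 1:5)

DT1 <- DT        # plain assignment: DT1 refers to the same object as DT
DT2 <- copy(DT)  # explicit copy: DT2 is an independent object

DT[, value := value * 2L]  # modifies DT in place, by reference

identical(DT1$value, DT$value)  # TRUE: DT1 changed along with DT
identical(DT2$value, 1:5)       # TRUE: the copy kept the original values
```

So if DT1 should keep the old values, take the copy before using := on DT.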

Ramnath