I'm having trouble understanding why updating a column in one data.table also updates the same column in a different data.table.

Please consider the following reproducible code.

library(data.table)
set.seed(1)  # fix the seed so the sampled columns are reproducible
dt <- data.table(a = rep(letters[1:4], 5),
                 b = rep(letters[5:8], 5),
                 c = rep(letters[3:6], 5),
                 x = sample(1:100, 20),
                 y = sample(1:100, 20),
                 z = sample(1:100, 20))

Suppose I assign dt to dt.1:

dt.1 <- dt

Next, suppose I update by reference a column in dt.1:

dt.1[, x := x^2]

Column x in dt.1 is indeed squared, but column x in dt is squared as well, as if dt[, x := x^2] had also been run in the background.

Why does this happen and how can I prevent this type of updating/dependency from happening?

Thanks

Uwe
  • Use `dt.1 <- copy(dt)` to create a physically separate copy. – Uwe Aug 08 '18 at 22:00
  • I got tricked by this a few months ago, couldn't find an answer on SO, and posted my question only to have it marked as a dup as well. I think the "already answered questions" about this need better titles, especially since this behavior is not consistent with data.frames and is especially troublesome and confusing to new users of data.table who (like me) had a hard time knowing what the problem was called in order to search for it. – DanY Aug 08 '18 at 22:36
  • Thanks everyone for the help. Like Dan, I did a lot of googling before writing this post as I could not figure out how to "reference" this issue. But anyways, thanks for all the help! – plausibly_exogenous Aug 08 '18 at 23:37
  • @DanY and @plausibly_exogenous I agree with the titles of the linked dupe questions being somewhat "convoluted". However, there is the [official `data.table` Vignette on reference semantics](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reference-semantics.html) that does an excellent job at explaining most of these issues. – Maurits Evers Aug 08 '18 at 23:53

1 Answer

Because of data.table's reference semantics, doing

dt.1 <- dt

means that dt.1 and dt refer to the same object in memory, so modifying one by reference with := modifies the other. In other words, dt.1 is not an independent copy of dt; it is a second name for the same data.table.

You can perform a deep copy by doing

dt.1 <- copy(dt)

This creates a second, physically separate copy of dt in memory. Modifications to dt.1 will then not affect the original dt.
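As a minimal sketch of the difference, the following compares plain assignment with copy() on a small table. The names `alias` and `deep` and the toy data are made up for illustration; address() is the data.table helper that reports where an object lives in memory.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the table from the question)
dt <- data.table(a = letters[1:4], x = c(1, 2, 3, 4))

alias <- dt        # plain assignment: a second name for the same object
deep  <- copy(dt)  # copy(): a physically separate data.table

# Compare memory addresses before modifying anything
same_as_alias <- address(alias) == address(dt)
same_as_deep  <- address(deep) == address(dt)

# Update by reference through the alias only
alias[, x := x^2]

print(same_as_alias)  # TRUE: alias and dt share one object
print(same_as_deep)   # FALSE: deep lives elsewhere in memory
print(dt$x)           # squared, because dt and alias are the same object
print(deep$x)         # unchanged, because deep is an independent copy
```

If the addresses match, any := on one name is visible through the other, so it is worth calling copy() whenever the original table has to stay untouched.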

Maurits Evers