
I've noticed some behaviour in data.table that seems inconsistent to me when assigning one table to another. I have to admit I never quite understood the difference between "=" and copy(), so maybe we can shed some light here. If you use "=" or "<-" instead of copy() below, changing the copied data.table also changes the original data.table.

Please execute the following commands and you will see what I mean:

library(data.table)
example(data.table)

DT
   x y  v
1: a 1 42
2: a 3 42
3: a 6 42
4: b 1  4
5: b 3  5
6: b 6  6
7: c 1  7
8: c 3  8
9: c 6  9

DT2 = DT

Now I'll change the v column of DT2:

DT2[ ,v:=3L]
   x y  v
1: a 1  3
2: a 3  3
3: a 6  3
4: b 1  3
5: b 3  3
6: b 6  3
7: c 1  3
8: c 3  3
9: c 6  3

But look what happened to DT:

DT
   x y  v
1: a 1  3
2: a 3  3
3: a 6  3
4: b 1  3
5: b 3  3
6: b 6  3
7: c 1  3
8: c 3  3
9: c 6  3

It changed as well. So changing DT2 changed the original DT. This does not happen if I use copy():

example(data.table)  # reset DT
DT3 <- copy(DT)
DT3[, v:= 3L]
   x y  v
1: a 1  3
2: a 3  3
3: a 6  3
4: b 1  3
5: b 3  3
6: b 6  3
7: c 1  3
8: c 3  3
9: c 6  3

DT
   x y  v
1: a 1 42
2: a 3 42
3: a 6 42
4: b 1  4
5: b 3  5
6: b 6  6
7: c 1  7
8: c 3  8
9: c 6  9

Is this behaviour expected?

Florian Oswald

1 Answer


Yes. This is expected behaviour, and well documented.

Since data.table uses references to the original object to achieve modify-in-place, it is very fast.

For this reason, if you really want to copy the data, you need to use copy(DT).


From the documentation for ?copy:

The data.table is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., setkey(DT,a)[J("foo")]. If you require a copy, take a copy first (using DT2=copy(DT)). copy() may also sometimes be useful before := is used to subassign to a column by reference. See ?copy.
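
A minimal sketch of the difference between plain assignment and copy(). It uses a small made-up table (an assumption for illustration, not the one from example(data.table)) and data.table's address() helper, which reports where an object lives in memory:

```r
library(data.table)

# Small illustrative table (hypothetical data, not from the question)
DT <- data.table(x = c("a", "b", "c"), v = 1:3)

DT2 <- DT        # plain assignment: a second name for the same object
DT3 <- copy(DT)  # explicit copy: a separate object

same_as_DT2 <- address(DT) == address(DT2)  # TRUE: both names share one object
same_as_DT3 <- address(DT) == address(DT3)  # FALSE: copy() allocated a new one

DT2[, v := 0L]   # := modifies in place, so the change is visible through DT
DT3[, v := 99L]  # only the copy changes

DT$v   # 0 0 0
DT3$v  # 99 99 99
```

Because `:=` never triggers R's usual copy-on-change, the only way to protect DT here is to take the copy before modifying.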

See also this question: Understanding exactly when a data.table is a reference to vs a copy of another

Andrie
  • Thanks @Andrie. I understand assignment by reference, and why to avoid copying in the first place. It just seems strange to me that `=` creates a link between the copy and the original, as if they were the same object (when that's not the case in R otherwise). – Florian Oswald Jun 25 '12 at 15:51
  • @FlorianOswald I agree - this can be a bit of a trap if one is not careful. – Andrie Jun 25 '12 at 15:53
  • +10 if I could Andrie. @Florian Imagine a 20GB+ table in memory. We absolutely do not want to copy it, even once. But if you really want to, you can. It doesn't break compatibility with other packages, because it's only `:=` and the `set*` functions that assign by reference. It's one of the reasons we introduced a new operator (`:=`), rather than make `<-` work differently. – Matt Dowle Jun 25 '12 at 15:53
  • @MatthewDowle See, I paid attention when you talked at last week's London R user group meeting :-) – Andrie Jun 25 '12 at 15:54
  • @Florian Yes it is the case in R otherwise. `=` doesn't copy. It creates a link between the objects in base, too. It's `<-` and `=` inside a function that copy-on-change. If you use `=` inside a function on `data.table` it would be copied, too. – Matt Dowle Jun 25 '12 at 15:56
  • @Florian Btw, congrats for asking the 200th question tagged `data.table`! – Matt Dowle Jun 25 '12 at 15:57
  • @MatthewDowle Good stuff! That is 200 answered questions for you? Congrats to you, rather! By the way, I don't want to keep nagging, but I don't understand what you said: `=` doesn't copy? If I do `a=3`, `b=a`, `b=4`, I still get `a=3`. Is that completely missing the point? Does it have to do with data.table being a function (thus: an environment)? – Florian Oswald Jun 25 '12 at 16:08
  • @Florian After the `b=a` do `.Internal(inspect(a))` and `.Internal(inspect(b))`. See the same hex address? `b` is a mere pointer to `a` at that point. Then the `b=4` copies-on-change. I added a link to a more detailed answer into Andrie's answer, hope that helps. – Matt Dowle Jun 25 '12 at 16:22
  • @Florian Thanks to the R gods. The _illusion_ of pass-by-copy is one of the many great things about R. Lazy evaluation and lexical scope are also key to why we like R. – Matt Dowle Jun 25 '12 at 16:36
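
The `.Internal(inspect())` check suggested in the comments can be run as a short base R script. This is a rough sketch: `addr()` is a hypothetical helper written for this example (not part of base R or data.table) that pulls the hex address out of the text `.Internal(inspect())` prints, and the script modifies one element of `b` rather than reassigning it, so that the copy-on-change step is visible.

```r
# Hypothetical helper: extract the leading "@<hex address>" token from the
# text that .Internal(inspect(x)) prints for an object.
addr <- function(x) {
  out <- utils::capture.output(.Internal(inspect(x)))
  sub(" .*", "", out[1])
}

a <- c(1, 2, 3)
b <- a                               # no copy yet: b refers to the same vector

same_before <- addr(a) == addr(b)    # TRUE

b[2] <- 10                           # modifying b makes R copy the vector first

same_after <- addr(a) == addr(b)     # FALSE: b now has its own memory
a                                    # still 1 2 3
b                                    # 1 10 3
```

In base R the copy happens at the moment of modification, which is why `a` looks untouched; data.table's `:=` skips that copy, which is where the behaviour in the question comes from.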