During my workflow I often make a copy of my main data.frame/data.table, do some of the work on one copy and some on the other, and then join them again later on. However, I often find that these copies are still connected to each other, so that edits made to one are also made to the other. Unfortunately I am not able to replicate it, but copy-pasting from my console it looks something like this:

# 'used3' is a copy of 'used' with some alterations to it
> c("nLocs","nDays") %in% names(used)
[1] FALSE FALSE
> used3[, nDays :=uniqueN(yDay),c("ID","Year","Season")]
> used3[, nLocs :=.N,c("ID","Year","Season")]
> c("nLocs","nDays") %in% names(used)
[1] TRUE TRUE

So alterations made to the copy are also made to the original. Is this a bug? Am I giving them names that are too similar... or what?

R version: 3.3, data.table version: 1.9.6

I have also experienced this in older versions of both R and data.table.

ego_
  • Look at `?data.table::copy` – jbaums Aug 09 '16 at 07:49
  • Thanks! That was one of those "How did I miss that?!". Should you add it as an answer so I can accept it? – ego_ Aug 09 '16 at 07:56
  • See also [this](http://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another) and the linked questions. Also [this](http://stackoverflow.com/questions/15913417/why-does-data-table-update-namesdt-by-reference-even-if-i-assign-to-another-v/15913648#15913648) – David Arenburg Aug 09 '16 at 08:24

1 Answer

You shouldn't see this behaviour with data.frame, but you will see it for data.table objects.

`?data.table::copy` explains that data.table avoids making copies wherever possible. As a result, `B <- A` only creates a second name for the same object, so modifying either one by reference with `:=` or one of the `set*` functions changes both:

library(data.table)
A <- data.table(x=1:10)
B <- A
A[, y:=10:1]

B
##      x  y
##  1:  1 10
##  2:  2  9
##  3:  3  8
##  4:  4  7
##  5:  5  6
##  6:  6  5
##  7:  7  4
##  8:  8  3
##  9:  9  2
## 10: 10  1

A and B are still identical (i.e. column `y` was added to both).
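
The same applies to the `set*` functions, not only `:=`. As a minimal sketch (the column names and values here are made up for illustration), renaming a column with `setnames()` or overwriting a cell with `set()` also shows up under both names:

```r
library(data.table)

A <- data.table(x = 1:3)
B <- A                      # B is a second name for the same object, not a copy

setnames(A, "x", "z")       # rename a column by reference
names(B)                    # the rename is visible through B as well

set(A, i = 1L, j = "z", value = 99L)  # overwrite one cell by reference
B$z[1]                      # the new value is visible through B as well
```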

The bottom line is that to make an independent copy of a data.table, use `copy()`:

A <- data.table(x=1:10)
B <- copy(A)
A[, y:=10:1]

B
##      x
##  1:  1
##  2:  2
##  3:  3
##  4:  4
##  5:  5
##  6:  6
##  7:  7
##  8:  8
##  9:  9
## 10: 10
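
Applied to the situation in the question, this might look roughly like the sketch below. The contents of `used` are invented here as a stand-in, since the real data is not shown; only the column names `ID`, `Year`, `Season` and `yDay` come from the question.

```r
library(data.table)

# Hypothetical stand-in for the question's `used` table
used <- data.table(
  ID     = c("a", "a", "a", "b", "b"),
  Year   = c(2015L, 2015L, 2015L, 2015L, 2015L),
  Season = c("summer", "summer", "summer", "winter", "winter"),
  yDay   = c(180L, 180L, 181L, 20L, 21L)
)

used3 <- copy(used)  # independent copy instead of used3 <- used

# Add the per-group summaries to the copy only
used3[, nDays := uniqueN(yDay), by = c("ID", "Year", "Season")]
used3[, nLocs := .N, by = c("ID", "Year", "Season")]

c("nLocs", "nDays") %in% names(used)   # the original is untouched
c("nLocs", "nDays") %in% names(used3)  # the copy has both new columns
```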

Note that adding a column with the `$` operator (`A$y <- ...`) does result in a copy being made, so `B` is unaffected:

A <- data.table(x=1:10)
B <- A
A$y <- 10:1

B
##      x
##  1:  1
##  2:  2
##  3:  3
##  4:  4
##  5:  5
##  6:  6
##  7:  7
##  8:  8
##  9:  9
## 10: 10
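
If you are unsure whether two names refer to the same data.table, one way to check is to compare their memory addresses with data.table's `address()` function. This is only a small sketch; the object `C` is introduced here purely for illustration:

```r
library(data.table)

A <- data.table(x = 1:10)
B <- A          # second name for the same object
C <- copy(A)    # independent copy

address(A) == address(B)  # same object, so the addresses match
address(A) == address(C)  # copy() allocated a new object

A[, y := 10:1]
"y" %in% names(B)  # B sees the new column
"y" %in% names(C)  # C does not
```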
jbaums