0

I can't imagine that this hasn't been explained somewhere else already, but unfortunately I could not find it. In any case, I think it's useful for many users to better understand data.table

I'm having two data.tables which I want to join using values form the second table overwriting those in the first. Here's a reproducible example:

This is the first table:

tableA = data.table(A = as.character(c("a", "b", "c", "d")), 
                B = as.character(c("A", "B", "C", "D")), 
                C = as.numeric(1:4), 
                D = as.numeric(5:8), 
                E = rep( "*", 4))

setkey(tableA, A, B)

> tableA
   A B C D E
1: a A 1 5 *
2: b B 2 6 *
3: c C 3 7 *
4: d D 4 8 *

tableB = data.table(A = as.character(c("a", "b", "c")), 
                B = as.character(c("A", "B", "C")), 
                C = as.numeric(1:3), 
                D = as.numeric(c(5,6,10)), 
                E = rep( "*", 3))

setkey(tableB, A, B)

> tableB
   A B C  D E
1: a A 1  5 *
2: b B 2  6 *
3: c C 3 10 *

tableA[tableB, names(tableB) := tableB, on = c("A", "B)]

> tableA
   A B C  D E
1: a A 1  5 *
2: b B 2  6 *
3: c C 3 10 *
4: d D 4  8 *

Which is exactly what I want. However, what if I need to keep the original table? Well, I thought it would be easy to assign it to another object and then apply the same procedure:

tableA = data.table(A = as.character(c("a", "b", "c", "d")), 
                B = as.character(c("A", "B", "C", "D")), 
                C = as.numeric(1:4), 
                D = as.numeric(5:8), 
                E = rep( "*", 4))

setkey(tableA, A, B)

tableB = data.table(A = as.character(c("a", "b", "c")), 
                B = as.character(c("A", "B", "C")), 
                C = as.numeric(1:3), 
                D = as.numeric(c(5,6,10)), 
                E = rep( "*", 3))
setkey(tableB, A, B)

tableX = tableA
tableX[tableB, names(tableB) := tableB, on = c("A", "B")]

However, once I have Joined tableX and tableB, tableA has changed as well and is equivalent to tableX without having included it in the join.

> tableA
   A B C  D E
1: a A 1  5 *
2: b B 2  6 *
3: c C 3 10 *
4: d D 4  8 *
> tableX
   A B C  D E
1: a A 1  5 *
2: b B 2  6 *
3: c C 3 10 *
4: d D 4  8 *

Can somebody explain why this is happening? I've been using data.table for a while now but it seems I still haven't gotten my head around it sufficiently.

bernhard
  • 16
  • 3
  • 3
    use `copy` to copy your data.table. see [here](http://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another) – Cath Dec 08 '15 at 15:57
  • This is standard behavior for object oriented languages, though unusual for R. basically, tableX is just a reference pointing to the same object as tableA, not a copy of that object. – Matthew Dec 08 '15 at 15:59
  • Thanks, CathG and Matthew. I checked out the copy function which works and made me understand more about how data.table works. I never experienced that kind of behavior using data frames. – bernhard Dec 08 '15 at 16:30

0 Answers0