I can't imagine that this hasn't been explained somewhere else already, but unfortunately I could not find it. In any case, I think it's useful for many users to better understand data.table
I'm having two data.tables which I want to join using values form the second table overwriting those in the first. Here's a reproducible example:
This is the first table:
tableA = data.table(A = as.character(c("a", "b", "c", "d")),
B = as.character(c("A", "B", "C", "D")),
C = as.numeric(1:4),
D = as.numeric(5:8),
E = rep( "*", 4))
setkey(tableA, A, B)
> tableA
A B C D E
1: a A 1 5 *
2: b B 2 6 *
3: c C 3 7 *
4: d D 4 8 *
tableB = data.table(A = as.character(c("a", "b", "c")),
B = as.character(c("A", "B", "C")),
C = as.numeric(1:3),
D = as.numeric(c(5,6,10)),
E = rep( "*", 3))
setkey(tableB, A, B)
> tableB
A B C D E
1: a A 1 5 *
2: b B 2 6 *
3: c C 3 10 *
tableA[tableB, names(tableB) := tableB, on = c("A", "B)]
> tableA
A B C D E
1: a A 1 5 *
2: b B 2 6 *
3: c C 3 10 *
4: d D 4 8 *
Which is exactly what I want. However, what if I need to keep the original table? Well, I thought it would be easy to assign it to another object and then apply the same procedure:
tableA = data.table(A = as.character(c("a", "b", "c", "d")),
B = as.character(c("A", "B", "C", "D")),
C = as.numeric(1:4),
D = as.numeric(5:8),
E = rep( "*", 4))
setkey(tableA, A, B)
tableB = data.table(A = as.character(c("a", "b", "c")),
B = as.character(c("A", "B", "C")),
C = as.numeric(1:3),
D = as.numeric(c(5,6,10)),
E = rep( "*", 3))
setkey(tableB, A, B)
tableX = tableA
tableX[tableB, names(tableB) := tableB, on = c("A", "B")]
However, once I have Joined tableX
and tableB
, tableA
has changed as well and is equivalent to tableX
without having included it in the join.
> tableA
A B C D E
1: a A 1 5 *
2: b B 2 6 *
3: c C 3 10 *
4: d D 4 8 *
> tableX
A B C D E
1: a A 1 5 *
2: b B 2 6 *
3: c C 3 10 *
4: d D 4 8 *
Can somebody explain why this is happening? I've been using data.table for a while now but it seems I still haven't gotten my head around it sufficiently.