Based on this previous post, I built leftOuterJoin, a function that updates a data.table X according to another data.table Y. The function is defined as follows:
leftOuterJoin <- function(X, Y, onCol) {
.colsY <- names(Y)
X[Y, (.colsY) := mget(paste0("i.", .colsY)), on = onCol]
}
The function works as intended 99% of the time, e.g.:
X <- data.table(id = 1:5, L = letters[1:5])
id L
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
id L N
1: 3 <NA> 10
2: 4 g NA
3: 5 h 12
leftOuterJoin(X, Y, "id")
X
id L N
1: 1 a NA
2: 2 b NA
3: 3 <NA> 10
4: 4 g NA
5: 5 h 12
However, for some reason unknown to me, it stops working with some data tables (I have no reproducible example at hand). There is no error, but the data table is not updated. When I step through the function with debug(), everything seems to work: X is updated inside the function, but the actual data.table is not. If I run the same statement outside the function, it works. Maybe it is related to the scope of the function? I am really struggling with this problem.
Spec: R v3.5.1 and data.table v1.11.4.
EDIT
Based on the comments I figured out that the problem is related to the data.table pointer. You can reproduce the problem with this code:
> save(X, file = "X.RData")
> load("X.RData")
> leftOuterJoin(X, Y, "id")
> X
id L
1: 1 a
2: 2 b
3: 3 <NA>
4: 4 g
5: 5 h
Notice that X is updated, but not the way we want: the existing column L is modified, but the new column N is not added. However, if we call setDT() first, it works properly:
> load("X.RData")
> setDT(X)
> leftOuterJoin(X, Y, "id")
> X
id L N
1: 1 a NA
2: 2 b NA
3: 3 <NA> 10
4: 4 g NA
5: 5 h 12
Is there a way to set up leftOuterJoin() such that it is not necessary to run setDT() every time some data is loaded?
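
For illustration, here is a self-contained sketch of one direction I am considering. It rests on an assumption I have not fully verified: that when the pointer is missing, `:=` works on a reallocated copy bound to the name X inside the function, so the caller never sees the new column. If that holds, returning the table and reassigning it at the call site should be enough. The name leftOuterJoin2 and the temporary file are hypothetical and only serve this example.

```r
library(data.table)

# Hypothetical variant of leftOuterJoin(): same update-join, but the
# (possibly reallocated) local table is returned to the caller.
leftOuterJoin2 <- function(X, Y, onCol) {
  .colsY <- names(Y)
  X[Y, (.colsY) := mget(paste0("i.", .colsY)), on = onCol]
  invisible(X)
}

X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

# Round-trip X through disk to reproduce the lost-pointer situation.
f <- tempfile(fileext = ".RData")
save(X, file = f)
load(f)

# Reassign the result instead of relying on update by reference alone.
X <- leftOuterJoin2(X, Y, "id")
print(names(X))
```

The trade-off is that every call site has to reassign the result, so the function can no longer be used purely for its side effect.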