Update a data.table based on another data table

Question

I want to update the columns in an old data.table based on a new data.table only when the value is not NA.

DT_old = data.table(x=rep(c("a","b","c")), y=c(1,3,6), v=1:3, l=c(1,1,1))
DT_old
   x y v l
1: a 1 1 1
2: b 3 2 1
3: c 6 3 1
DT_new = data.table(x=rep(c("b","c",'d')), y=c(9,6,10), v=c(2,NA,10), z=c(9,9,9))
DT_new
   x  y  v z
1: b  9  2 9
2: c  6  NA 9
3: d 10 10 9

I want the output to be

   x  y  v z
1: b  9  2 9
2: c  6  3 9
3: d 10 10 9
4: a 1 1 NA

Currently I am merging the two data.table and going through each column and replacing the NA in the new data.table

DT_merged <- merge(DT_new, DT_old, all=TRUE, by='x')
DT_merged
   x y.x v.x  z y.y v.y  l
1: a  NA  NA NA   1   1  1
2: b   9   2  9   3   2  1
3: c   6  NA  9   6   3  1
4: d  10  10  9  NA  NA NA
DT_merged[is.na(y.x), y.x := y.y]
DT_merged[is.na(v.x), v.x := v.y]
DT_merged = DT_merged[, list(y=y.x, v=v.x, z=z)

Is there a better way to do the above?

score 6 · Accepted Answer · edited May 23 '17 at 10:29

Here's how I would approach this. First, I will expand DT_new according to the unique values combination of x columns of both tables using binary join

res <- setkey(DT_new, x)[unique(c(x, DT_old$x))]
res
#    x  y  v  z
# 1: b  9  2  9
# 2: c  6 NA  9
# 3: d 10 10  9
# 4: a NA NA NA

Then, I will updated the two columns in res by reference using another binary join

setkey(res, x)[DT_old, `:=`(y = i.y, v = i.v)]
res
#    x  y  v  z
# 1: a  1  1 NA
# 2: b  3  2  9
# 3: c  6  3  9
# 4: d 10 10  9

Following the comments section, it seems that you are trying to join each column by its own condition. There is no simple way of doing such thing in R or any language AFAIK. Thus, your own solution could be a good option by itself.

Though, here are some other alternatives, mainly taken from a similar question I myself asked not long ago

Using two ifelse statments

setkey(res, x)[DT_old, `:=`(y = ifelse(is.na(y), i.y, y), 
                            v = ifelse(is.na(v), i.v, v))]

Two separate conditional joins

setkey(res, x) ; setkey(DT_old, x) ## old data set needs to be keyed too now
res[is.na(y), y := DT_old[.SD, y]]
res[is.na(v), v := DT_old[.SD, v]]

Both will give you what you need.

P.S.

If you don't want warnings, you need to define the corresponding column classes correctly, e.g. v column in DT_new should be defined as v= c(2L, NA_integer_, 10L)

Many thanks. Can you briefly explain how does `:=`(y = i.y, v = i.v) work? — imsc, Jul 23 '15 at 23:00
It means: assign a certain column from the old data to the new only on the rows you were able to join on. The `i.` distinguish between columms with the same name in both data sets. — David Arenburg, Jul 23 '15 at 23:06
But this update all the values from DT_old. My expected output for the second row is `b 9 2 9`. — imsc, Jul 23 '15 at 23:36

Update a data.table based on another data table

1 Answers1