R: data.table: := in reverse case

Question

Suppose I have two datasets that I want to left-join,

i <- data.table(id=1:3, k=7:9, l=7:9, m=7:9, n=7:9)

and

x <- data.table(id=c(1,2), x=c(10,20))

To left-join, keeping all lines in i, I execute

x[i, .(id=i.id, k=i.k, l=i.l, m=i.m, n=i.n, x=x.x), on=.(id=id)]

but I wonder whether there is an easier and more efficient way that makes it unnecessary to spell out all the columns from i.

For example, in the reverse case (that is, when I want to keep all columns from i), I could use the := operator, as in x[i, k:=i.k, on=.(id=id)]. My understanding is that this makes things also more efficient because columns do not need to be copied. Is there something comparable for this case?

Does the last part of [my answer here](https://stackoverflow.com/a/37994369/2204410) help? — Jaap, Dec 05 '18 at 11:58
If it means that there is no such thing as a reverse-:=, it helps clarifying (but sadly means I need to find other ways to make it more efficient). Is this so? — bumblebee, Dec 05 '18 at 12:01
Why not `i[x, on = .(id), x := i.x][]`? If I understand your question correctly, then that is what you are looking for. — Jaap, Dec 05 '18 at 12:05
This works (and saves 1/4 in time) but confuses me. My understanding was that the inner DF is the left one in a left-join. The one of which all rows are kept. Now I see that it also works the other way around, but only in `i[x, on = .(id), x := i.x]` , whereas `i[x, on = .(id)]` appears to perform an inner join. Is there any place where this is explained? — bumblebee, Dec 05 '18 at 12:16
[Does this explanation help?](https://stackoverflow.com/a/34600831/2204410) — Jaap, Dec 05 '18 at 12:20
Very helpful to understand the efficiency gain, but leaves me wondering why `i[x, on = .(id)]` does an inner-join, whereas `i[x, on = .(id), x := i.x]` does a left-join. — bumblebee, Dec 05 '18 at 12:33
`i[x, on = .(id), x := i.x]` updates `i` by reference and look in `x` which rows match and thus need to be updated; whereas `i[x, on = .(id)]` also uses `x` but only give back the rows that match (because there isn't a need to update something) — Jaap, Dec 05 '18 at 12:49

score 3 · Answer 1 · answered Dec 05 '18 at 12:03

3

you can use data-tables setcolorder() after the join..

setcolorder( x[i, on = "id"], c( names(i), "x" ) )

#    id k l m n  x
# 1:  1 7 7 7 7 10
# 2:  2 8 8 8 8 20
# 3:  3 9 9 9 9 NA

answered Dec 05 '18 at 12:03

Wimpel

26,031
1
20
37

Thanks, but this was mostly about the efficiency. – bumblebee Dec 05 '18 at 13:02
2

@bumblebee `setcolorder` changes the order of the data.table 'by reference' (same as `:=` does), so it is pretty darn efficient :) – Wimpel Dec 05 '18 at 13:09

score 1 · Answer 2 · answered Dec 05 '18 at 12:05

1

What's wrong with merge?

  y <- merge(i, x, all.x = TRUE, by = "id")

answered Dec 05 '18 at 12:05

Jonny Phelps

2,687
1
11
20

Nothing wrong but my point was about the efficiency of `:='. Thanks, anyway! – bumblebee Dec 05 '18 at 13:01

R: data.table: := in reverse case

2 Answers2