0

Suppose I have two datasets that I want to left-join,

i <- data.table(id=1:3, k=7:9, l=7:9, m=7:9, n=7:9)

and

x <- data.table(id=c(1,2), x=c(10,20))

To left-join, keeping all lines in i, I execute

x[i, .(id=i.id, k=i.k, l=i.l, m=i.m, n=i.n, x=x.x), on=.(id=id)]

but I wonder whether there is an easier and more efficient way that makes it unnecessary to spell out all the columns from i.

For example, in the reverse case (that is, when I want to keep all columns from i), I could use the := operator, as in x[i, k:=i.k, on=.(id=id)]. My understanding is that this makes things also more efficient because columns do not need to be copied. Is there something comparable for this case?

bumblebee
  • 1,116
  • 8
  • 20
  • 1
    Does the last part of [my answer here](https://stackoverflow.com/a/37994369/2204410) help? – Jaap Dec 05 '18 at 11:58
  • If it means that there is no such thing as a reverse-:=, it helps clarifying (but sadly means I need to find other ways to make it more efficient). Is this so? – bumblebee Dec 05 '18 at 12:01
  • Why not `i[x, on = .(id), x := i.x][]`? If I understand your question correctly, then that is what you are looking for. – Jaap Dec 05 '18 at 12:05
  • This works (and saves 1/4 in time) but confuses me. My understanding was that the inner DF is the left one in a left-join. The one of which all rows are kept. Now I see that it also works the other way around, but only in `i[x, on = .(id), x := i.x]` , whereas `i[x, on = .(id)]` appears to perform an inner join. Is there any place where this is explained? – bumblebee Dec 05 '18 at 12:16
  • 1
    [Does this explanation help?](https://stackoverflow.com/a/34600831/2204410) – Jaap Dec 05 '18 at 12:20
  • Very helpful to understand the efficiency gain, but leaves me wondering why `i[x, on = .(id)]` does an inner-join, whereas `i[x, on = .(id), x := i.x]` does a left-join. – bumblebee Dec 05 '18 at 12:33
  • 1
    `i[x, on = .(id), x := i.x]` updates `i` by reference and look in `x` which rows match and thus need to be updated; whereas `i[x, on = .(id)]` also uses `x` but only give back the rows that match (because there isn't a need to update something) – Jaap Dec 05 '18 at 12:49
  • This helps to know, thanks! – bumblebee Dec 05 '18 at 13:01

2 Answers2

3

you can use data-tables setcolorder() after the join..

setcolorder( x[i, on = "id"], c( names(i), "x" ) )

#    id k l m n  x
# 1:  1 7 7 7 7 10
# 2:  2 8 8 8 8 20
# 3:  3 9 9 9 9 NA
Wimpel
  • 26,031
  • 1
  • 20
  • 37
1

What's wrong with merge?

  y <- merge(i, x, all.x = TRUE, by = "id")
Jonny Phelps
  • 2,687
  • 1
  • 11
  • 20