3

If I run below codes

dt<-data.table(col1=c(0,1,2),col2=c("a","b","c"),col3=c("aa","ab","cc"))
setkey(dt,col1)
dt1<-data.table(a=c(1,2))
dt[dt1]

I got below results

   col1 col2 col3
1:    1    b   ab
2:    2    c   cc

However, the expected result is

   a col1 col2 col3
1:      0    a   aa
2: 1    1    b   ab
3: 2    2    c   cc

How to get the expected result?

NewGuyComesIn
  • 189
  • 1
  • 2
  • 14
  • I don't think this is a duplicate, I am reading the post as: "merge data.table on key(s) and keep the columns from both tables?" – zx8754 May 10 '16 at 20:19
  • 1
    It might be related to this FR: https://github.com/Rdatatable/data.table/issues/1615 Sounds like they're working on it – Frank May 10 '16 at 20:21

2 Answers2

5

If you're ok modifying dt:

dt[dt1, a := i.a][]
#   col1 col2 col3  a
#1:    0    a   aa NA
#2:    1    b   ab  1
#3:    2    c   cc  2

If you're not, copy() it first.

eddi
  • 49,088
  • 6
  • 104
  • 155
  • Thanks. This is the answer I am looking for. – NewGuyComesIn May 10 '16 at 20:22
  • @eddi Would you mind adding a little bit of text to explain what "i.a" is doing? I do not see this syntax in `help(":=")`. – lmo May 10 '16 at 20:34
  • @lmo you can find it in `?data.table`: *"Advanced: When i is a data.table, the columns of i can be referred to in j by using the prefix i., e.g., X[Y, .(val, i.val)]. Here val refers to X's column and i.val Y's"* – eddi May 10 '16 at 20:37
  • Thanks. I'll take a closer look there. – lmo May 10 '16 at 20:40
1

Your expected results are not consistent to base R merge.
You expect to get LEFT OUTER JOIN while R base is designed to perform RIGHT OUTER JOIN.
Solution is just to swap tables.

library(data.table)
dt<-data.table(col1=c(0,1,2),col2=c("a","b","c"),col3=c("aa","ab","cc"))
dt1<-data.table(a=c(1,2))
dt1[dt, .(a = x.a, col1, col2, col3), on = c("a"="col1")]
#    a col1 col2 col3
#1: NA    0    a   aa
#2:  1    1    b   ab
#3:  2    2    c   cc

Additionally, you have to explicitly ask for x.a column, also due to R base consistency on that matter.
The x.col notation in j was recently implemented in data.table 1.9.7 version. You can install it by

install.packages("data.table", type = "source",
                 repos = "https://Rdatatable.github.io/data.table")

But if you don't care on updating your dt object then it is more efficient to use @eddi answer which adds column by reference.

jangorecki
  • 16,384
  • 4
  • 79
  • 160