6

I am trying to join two data.tables in R base don multiple setkeys and which have repeated entries. As an example

>DT1
ID  state Month Day Year
1   IL    Jan   3   2013 
1   IL    Jan   3   2014
1   IL    Jan   3   2014
1   IL    Jan   10  2014
1   IL    Jan   11  2013
1   IL    Jan   30  2013
1   IL    Jan   30  2013
1   IL    Feb   2   2013
1   IL    Feb   2   2014
1   IL    Feb   3   2013
1   IL    Feb   3   2014

>DT2
state Month   Day   Year  Tavg
  IL    Jan    1    2013    13
  IL    Jan    2    2013    19
  IL    Jan    3    2013    22
  IL    Jan    4    2013    23
  IL    Jan    5    2013    26
  IL    Jan    6    2013    24
  IL    Jan    7    2013    27
  IL    Jan    8    2013    32
  IL    Jan    9    2013    36
  ...   ...    ..   ...      ... 
  ...   ...    ..   ...      ... 
  IL    Dec 31  2013    33

I would like to add the "Tavg" values of DT2 to the corresponding dates in DT1 For example, all entries in DT1 that are on Jan 3 2013 need to have Tavg 13 in an additional column.

I tried the following setkey(DT1, state, Month, Day, Year) and same for DT2 followed by a Join operation DT1[DT2, nomatch=0, allow.cartesian=TRUE But it didn't work

Arun
  • 116,683
  • 26
  • 284
  • 387
Gabriel
  • 71
  • 1
  • 1
  • 3
  • 2
    What do you mean by 'it didn't work'? – Arun Mar 16 '15 at 16:54
  • 1
    I'm guessing you want `DT1[DT2, Tavg := i.Tavg, allow = TRUE]`. See [this post](http://stackoverflow.com/questions/19553005/data-table-join-then-add-columns-to-existing-data-frame-without-re-copy/19553281#19553281) for more details. – eddi Mar 16 '15 at 17:18
  • (To: Arun) Although some of the entries in DT1 were duplicates, the result of the join was almost 30 or more folds the original duplicated entires (of DT1) and mismatching Tavg... – Gabriel Mar 16 '15 at 19:26
  • there is a slight hiccup though: – Gabriel Mar 16 '15 at 20:03
  • results are below ID state Month Day Year Tavg 1 IL Feb 1 2013 7 1 IL Feb 2 2013 13 1 IL Feb 2 2013 13 2 IL Feb 1 2013 6 <---- Should have been 7 2 IL Feb 1 2013 6 <----- Should have been 7 – Gabriel Mar 16 '15 at 20:05
  • ID state Month Day Year Tavg 1 IL Feb 1 2013 7 1 IL Feb 2 2013 13 1 IL Feb 2 2013 13 1 IL Feb 3 2013 15 2 IL Feb 1 2013 6 ## should have been 7 2 IL Feb 1 2013 6 ## should have been 7 2 IL Feb 2 2013 14 ## should have been 13 In the answer, below I provided the result of running the code, which was a bit strange.. – Gabriel Mar 16 '15 at 20:56
  • I would just try `merge(DT1, DT2, by = intersect(names(DT1), names(DT2)))` – Qi Yin Feb 15 '21 at 15:27

1 Answers1

9

Just helped a friend with this (he couldn't find a good Stack Overflow answer) so I figured this question needed a more complete "toy" answer.

Here's a couple of simple data tables with one mismatched key:

dt1 <- data.table(a = LETTERS[1:5],b=letters[1:5],c=1:5)
dt2 <- data.table(c = LETTERS[c(1:4,6)],b=letters[1:5],a=6:10)

And here's several multiple key merge options:

merge(dt1,dt2,by.x=c("a","b"),by.y=c("c","b")) #Inner Join
merge(dt1,dt2,by.x=c("a","b"),by.y=c("c","b"),all=T) #Outer Join

setkey(dt1,a,b)
setkey(dt2,c,b)

dt2[dt1] #Left Join (if dt1 is the "left" table)
dt1[dt2] #Right Join (if dt1 is the "left" table)
D. Woods
  • 3,004
  • 3
  • 29
  • 37