0

I have the following problem and I really don't know how to solve it.

I am trying two join two data.tables - the first one containing more than 8000 rows and the second one with more than 10000 rows. For reasons of simplicity take the first DT to be like:

DT1 <- data.table(ticker=LETTERS[rep(seq(1,4), each=3)], 
          date=seq(as.Date("2015-03-08"), by= "day", length.out = 3),
          PX_LAST = c(10,12,14,8,4,6,7,5,7,6,3,5), ticker2=rep(c("GLEN LN", "JPEE IA", "BEE GR", "ISA GG"), each=3))

and the second DT as:

DT2 <- data.table(ticker=rep(c("GLEN LN", "JPEE IA", "BEE GR", "ISA GG"), each=5), 
          date=seq(as.Date("2015-03-07"), by= "day", length.out = 5),
          Z_SPRD=c(66,76,68,70,69,112,117,125,122,114,22,25,20,21,24,323,270,290,333,299), 
          BID=c(6.2,7.2,8.5,7.9,6.9,11.2,11.7,12.5,12.2,11.4,2.5,2.5,2.4,2.1,2.4,32.1,27.4,29.3,33.8,29.2))

What I want know is to join on the keys "date" and "ticker2 = ticker".

The final output should look like:

DT3 <- data.table(ticker=LETTERS[rep(seq(1,4), each=3)], 
          date=seq(as.Date("2015-03-08"), by= "day", length.out = 3),
          PX_LAST = c(10,12,14,8,4,6,7,5,7,6,3,5), ticker2=rep(c("GLEN LN", "JPEE IA", "BEE GR", "ISA GG"), each=3),
          Z_SPRD=c(76,68,70,117,125,122,25,20,21,270,290,333), 
          BID=c(7.2,8.5,7.9,11.7,12.5,12.2,2.5,2.4,2.1,27.4,29.3,33.8))

    ticker       date PX_LAST ticker2 Z_SPRD  BID
 1:      A 2015-03-08      10 GLEN LN     76  7.2
 2:      A 2015-03-09      12 GLEN LN     68  8.5
 3:      A 2015-03-10      14 GLEN LN     70  7.9
 4:      B 2015-03-08       8 JPEE IA    117 11.7
 5:      B 2015-03-09       4 JPEE IA    125 12.5
 6:      B 2015-03-10       6 JPEE IA    122 12.2
 7:      C 2015-03-08       7  BEE GR     25  2.5
 8:      C 2015-03-09       5  BEE GR     20  2.4
 9:      C 2015-03-10       7  BEE GR     21  2.1
10:      D 2015-03-08       6  ISA GG    270 27.4
11:      D 2015-03-09       3  ISA GG    290 29.3
12:      D 2015-03-10       5  ISA GG    333 33.8

Any help is much appreciated. Perhaps the solution is quite simple but not for me.

Thanks!

Daniel_D
  • 698
  • 1
  • 7
  • 13
  • `merge(DT1, DT2, by.x = c("date", "ticker2"), by.y = c("date", "ticker"))`? – talat Mar 08 '16 at 17:09
  • `DT1[DT2, on = c("date", "ticker2" = "ticker")]` or `DT2[DT1, on = c("date", "ticker" = "ticker2")]`. Where the last one is giving the output of `DT3` – Jaap Mar 08 '16 at 17:10
  • @docendodiscimus The syntax as I used was introduced in 1.9.7. See [point 6 under new features](https://github.com/Rdatatable/data.table). In 1.9.6 I guess it should be `DT2[DT1, on = c("date"="date", "ticker" = "ticker2")]` – Jaap Mar 08 '16 at 17:23
  • As an alternative: `setkey(DT1, date, ticker2)`; `setkey(DT2, date, ticker)`; `DT2[DT1]` – Jaap Mar 08 '16 at 17:28
  • 1
    Same-ish question: http://stackoverflow.com/q/29865512/1191259 – Frank Mar 08 '16 at 17:43
  • Sorry I did not see that there was a similar question. Anyway `DT2[DT1, on = c("date"="date", "ticker" = "ticker2")]` worked out. Sometimes the solution is simple as that. Thank you! – Daniel_D Mar 09 '16 at 08:46

0 Answers0