1

I have a long list of data.tables. I want to merge (inner join) them all in a single data.table.

With dplyr and purrr I can do:

dt1 <- data.table(cbind(letters[1:10], 1:10))
dt2 <- data.table(cbind(letters[3:12], 3:12))
dt3 <- data.table(cbind(letters[5:15], 5:15))

dtl <- list(dt1, dt2, dt3)

library(dplyr)
library(purrr)
merged <- dtl %>% reduce(inner_join, by='V1')

How can I do this with data.table?

francoiskroll
  • 1,026
  • 3
  • 13
  • 24
  • 2
    FYI, there's an [open issue](https://github.com/Rdatatable/data.table/pull/4370) with proposals for `mergelist` and related functions. The current approach is to use `Reduce(...)`. – A5C1D2H2I1M1N2O1R2T1 Dec 10 '20 at 18:28

1 Answers1

5

We can use join on

library(data.table)
na.omit(Reduce(function(x, y) x[y, on = .(V1)], dtl))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you. From what I can test the data.table solution is not significantly faster than my current dplyr + purrr, which is a million times nicer to read... – francoiskroll Dec 11 '20 at 11:59
  • @francoiskroll okay, thank you for the comment. I thought the question was about how to do this in data.table – akrun Dec 11 '20 at 16:39
  • Oh yeah it totally was sorry! I guess I'm just disappointed there isn't a big difference as with everything data.table :) – francoiskroll Dec 11 '20 at 16:49
  • @francoiskroll its okay. In most cases, data.table is efficient. I was not aware that there would be a bigger difference in this case – akrun Dec 11 '20 at 16:50