3

I have two data.tables in R which have an identical set of columns. They contain some rows with overlapping keys, and each table contains some unique rows. I would like to sum the overlapping rows, and retain all of the unique rows from each table.

data.table.a
Key Total
  1     2
  3     1
  4     3
  5     1

data.table.b
Key Total
  2     5
  3     1
  5     2

result
Key Total
  1     2
  2     5
  3     2
  4     3
  5     3

How can I achieve this? data.table.a[data.table.b] performs only a left join and data.table.a + data.table.b will not work as I have differing numbers of rows.

Edit: Added reproducible code:

data.table.a <- data.table(Key=c(1,3,4,5),Total=c(2,1,3,1))
data.table.b <- data.table(Key=c(2,3,5),Total=c(5,1,2))
result <- data.table(Key=c(1,2,3,4,5),Total=c(2,5,2,3,3))
Matt Dowle
  • 58,872
  • 22
  • 166
  • 224
Ina
  • 4,400
  • 6
  • 30
  • 44
  • Duplicate Question: http://stackoverflow.com/questions/5769320/how-to-merge-two-data-frames-on-common-columns-in-r-with-sum-of-others – Subs May 16 '12 at 13:11
  • 3
    @Subs The OP looks for a solution using the specific R package `data.table` and this is not really answered in the link you provide. To be fair though, the question is not really clear on that. For someone not aware of the `data.table` package, it just seems the OP wants to merge to data.frames named data.tables. Things that happen without a minimal reproducible example... – Christoph_J May 16 '12 at 13:15
  • @subs That question just needs `all.x`. This question needs `all`. So not strictly a duplicate? And that question was about `data.frame` where this is about `data.table`, where there might be a different way. – Matt Dowle May 16 '12 at 13:18
  • Correct, I want `all` rather than just `all.x`, and I also want summing. I'm happy to accept a `data.frame` answer if there's no `data.table` answer, but I imagine there is something I just can't find which would allow me to do it in `data.table`. I've added some code to clarify things either way. – Ina May 16 '12 at 13:21

1 Answers1

4

One way is :

> rbind(data.table.a,data.table.b)[,list(Total=sum(Total)),keyby=Key]
     Key Total
[1,]   1     2
[2,]   2     5
[3,]   3     2
[4,]   4     3
[5,]   5     3
Matt Dowle
  • 58,872
  • 22
  • 166
  • 224
  • 2
    @Ina No problem. I've raised [FR#2010](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2010&group_id=240&atid=978) to make `DT1+DT2` do this using their keys (nice idea btw), and [Bug#2011](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2011&group_id=240&atid=975) with `merge(all=TRUE)`. – Matt Dowle May 16 '12 at 14:17