How to merge and sum two data frames

Question

Here is my issue:

df1 <- data.frame(x = 1:5, y = 2:6, z = 3:7)
rownames(df1) <- LETTERS[1:5]
df1
  x y z
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
E 5 6 7

df2 <- data.frame(x = 1:5, y = 2:6, z = 3:7)
rownames(df2) <- LETTERS[3:7]
df2
  x y z
C 1 2 3
D 2 3 4
E 3 4 5
F 4 5 6
G 5 6 7

what I wanted is:

where duplicated rows were added up by same variable.

If the row names were represented as a different attribute in the datasets would that work? — Sevyns, Jul 19 '16 at 21:07

Jaap · Answer 1 · 2019-09-12T08:01:24.077

14

A solution with base R:

# create a new variable from the rownames
df1$rn <- rownames(df1)
df2$rn <- rownames(df2)

# bind the two dataframes together by row and aggregate
res <- aggregate(cbind(x,y,z) ~ rn, rbind(df1,df2), sum)
# or (thx to @alistaire for reminding me):
res <- aggregate(. ~ rn, rbind(df1,df2), sum)

# assign the rownames again
rownames(res) <- res$rn

# get rid of the 'rn' column
res <- res[, -1]

which gives:

edited Sep 12 '19 at 08:01

answered Jul 19 '16 at 21:16

Jaap

81,064
34
182
193

1

Curious, I've never used `cbind` in a formula context. `.` makes more sense to me than `cbind(x,y,z)` (I sort of expected it to be `+`), but the latter could actually be quite useful... – alistaire Jul 19 '16 at 21:22
1

@alistaire `cbind` is especially useful when you don't want to apply the aggregating function to all non-grouping columns (though I have to admit that I forgot about the `.` option ;-) ) – Jaap Jul 19 '16 at 21:28

alistaire · Accepted Answer · 2016-07-19T21:32:10.597

With dplyr,

library(dplyr)

# add rownames as a column in each data.frame and bind rows
bind_rows(df1 %>% add_rownames(), 
          df2 %>% add_rownames()) %>% 
    # evaluate following calls for each value in the rowname column
    group_by(rowname) %>% 
    # add all non-grouping variables
    summarise_all(sum)

## # A tibble: 7 x 4
##   rowname     x     y     z
##     <chr> <int> <int> <int>
## 1       A     1     2     3
## 2       B     2     3     4
## 3       C     4     6     8
## 4       D     6     8    10
## 5       E     8    10    12
## 6       F     4     5     6
## 7       G     5     6     7

score 4 · Answer 3 · answered Nov 20 '18 at 05:01

4

could also vectorize the operation turning the dfs to matrices:

result_df <- as.data.frame(as.matrix(df1) + as.matrix(df2))

answered Nov 20 '18 at 05:01

MarKo9

97
4

score 2 · Answer 4 · answered Jul 19 '16 at 22:50

This might need some teaking to get the rownames logic working on a longer example:

dfr <-rbind(df1,df2)
do.call(rbind, lapply( split(dfr, sapply(rownames(dfr),substr,1,1)), colSums))
  x  y  z
A 1  2  3
B 2  3  4
C 4  6  8
D 6  8 10
E 8 10 12
F 4  5  6
G 5  6  7

If the rownames could all be assumed to be alpha characters a gsub solution should be easy.

score 1 · Answer 5 · edited Sep 12 '19 at 08:02

An alternative is to melt the data and cast it. At first we set the row names to the last column of both data frames thanks to @Jaap

df1$rn <- rownames(df1)
df2$rn <- rownames(df2)

Then we melt the data based on the name

melt(list(df1, df2), id.vars = "rn")

Then we use dcast with mget function which is used to retrieve multiple variables at once.

mydf<- dcast(melt(mget(ls(pattern = "df\\d+")), id.vars = "rn"), 
      rn ~ variable, value.var = "value", fun.aggregate = sum)

rownames(mydf) <- mydf$rn

# get rid of the 'rn' column
mydf <- mydf[, -1]

> mydf
#  x  y  z
#A 1  2  3
#B 2  3  4
#C 4  6  8
#D 6  8 10
#E 8 10 12
#F 4  5  6
#G 5  6  7

How to merge and sum two data frames

5 Answers5

Linked

Related