15

Here is my issue:

df1 <- data.frame(x = 1:5, y = 2:6, z = 3:7)
rownames(df1) <- LETTERS[1:5]
df1
  x y z
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
E 5 6 7

df2 <- data.frame(x = 1:5, y = 2:6, z = 3:7)
rownames(df2) <- LETTERS[3:7]
df2
  x y z
C 1 2 3
D 2 3 4
E 3 4 5
F 4 5 6
G 5 6 7

what I wanted is:

  x y z
A 1 2 3
B 2 3 4
C 4 6 8
D 6 8 10
E 8 10 12
F 4 5 6
G 5 6 7

where duplicated rows were added up by same variable.

alistaire
  • 42,459
  • 4
  • 77
  • 117
David Z
  • 6,641
  • 11
  • 50
  • 101
  • If the row names were represented as a different attribute in the datasets would that work? – Sevyns Jul 19 '16 at 21:07

5 Answers5

14

A solution with base R:

# create a new variable from the rownames
df1$rn <- rownames(df1)
df2$rn <- rownames(df2)

# bind the two dataframes together by row and aggregate
res <- aggregate(cbind(x,y,z) ~ rn, rbind(df1,df2), sum)
# or (thx to @alistaire for reminding me):
res <- aggregate(. ~ rn, rbind(df1,df2), sum)

# assign the rownames again
rownames(res) <- res$rn

# get rid of the 'rn' column
res <- res[, -1]

which gives:

> res
  x  y  z
A 1  2  3
B 2  3  4
C 4  6  8
D 6  8 10
E 8 10 12
F 4  5  6
G 5  6  7
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 1
    Curious, I've never used `cbind` in a formula context. `.` makes more sense to me than `cbind(x,y,z)` (I sort of expected it to be `+`), but the latter could actually be quite useful... – alistaire Jul 19 '16 at 21:22
  • 1
    @alistaire `cbind` is especially useful when you don't want to apply the aggregating function to all non-grouping columns (though I have to admit that I forgot about the `.` option ;-) ) – Jaap Jul 19 '16 at 21:28
11

With dplyr,

library(dplyr)

# add rownames as a column in each data.frame and bind rows
bind_rows(df1 %>% add_rownames(), 
          df2 %>% add_rownames()) %>% 
    # evaluate following calls for each value in the rowname column
    group_by(rowname) %>% 
    # add all non-grouping variables
    summarise_all(sum)

## # A tibble: 7 x 4
##   rowname     x     y     z
##     <chr> <int> <int> <int>
## 1       A     1     2     3
## 2       B     2     3     4
## 3       C     4     6     8
## 4       D     6     8    10
## 5       E     8    10    12
## 6       F     4     5     6
## 7       G     5     6     7
alistaire
  • 42,459
  • 4
  • 77
  • 117
4

could also vectorize the operation turning the dfs to matrices:

result_df <- as.data.frame(as.matrix(df1) + as.matrix(df2))
MarKo9
  • 97
  • 4
2

This might need some teaking to get the rownames logic working on a longer example:

dfr <-rbind(df1,df2)
do.call(rbind, lapply( split(dfr, sapply(rownames(dfr),substr,1,1)), colSums))
  x  y  z
A 1  2  3
B 2  3  4
C 4  6  8
D 6  8 10
E 8 10 12
F 4  5  6
G 5  6  7

If the rownames could all be assumed to be alpha characters a gsub solution should be easy.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
1

An alternative is to melt the data and cast it. At first we set the row names to the last column of both data frames thanks to @Jaap

df1$rn <- rownames(df1)
df2$rn <- rownames(df2)

Then we melt the data based on the name

melt(list(df1, df2), id.vars = "rn")

Then we use dcast with mget function which is used to retrieve multiple variables at once.

mydf<- dcast(melt(mget(ls(pattern = "df\\d+")), id.vars = "rn"), 
      rn ~ variable, value.var = "value", fun.aggregate = sum)

rownames(mydf) <- mydf$rn

# get rid of the 'rn' column
mydf <- mydf[, -1]

> mydf
#  x  y  z
#A 1  2  3
#B 2  3  4
#C 4  6  8
#D 6  8 10
#E 8 10 12
#F 4  5  6
#G 5  6  7
Jaap
  • 81,064
  • 34
  • 182
  • 193