How to combine two data frames using dplyr or other packages?

Question

I have two data frames:

df1 = data.frame(index=c(0,3,4),n1=c(1,2,3))  
df1
#  index n1
# 1    0  1
# 2    3  2
# 3    4  3

df2 = data.frame(index=c(1,2,3),n2=c(4,5,6))  
df2
#   index n2
# 1     1  4
# 2     2  5
# 3     3  6

I want to join these to:

  index n
1     0 1
2     1 4
3     2 5
4     3 8  (index 3 in two df, so add 2 and 6 in each df)
5     4 3
6     5 0  (index 5 not exists in either df, so set 0)
7     6 0  (index 6 not exists in either df, so set 0)

The given data frames are just part of large dataset. Can I do it using dplyr or other packages in R?

akrun · Accepted Answer · 2014-10-24T06:31:59.493

Using data.table (would be efficient for bigger datasets). I am not changing the column names, as the rbindlist uses the name of the first dataset ie. in this case n from the second column (Don't know if it is a feature or bug). Once you join the datasets by rbindlist, group it by column index i.e. (by=index) and do the sum of n column (list(n=sum(n)) )

 library(data.table)
 rbindlist(list(data.frame(index=0:6,n=0), df1,df2))[,list(n=sum(n)), by=index]
     index n
 #1:     0 1
 #2:     1 4
 #3:     2 5
 #4:     3 8
 #5:     4 3
 #6:     5 0
 #7:     6 0

Or using dplyr. Here, the column names of all the datasets should be the same. So, I am changing it before binding the datasets using rbind_list. If the names are different, there will be multiple columns for each name. After joining the datasets, group it by index and then use summarize and do the sum of column n.

 library(dplyr)
 nm1 <- c("index", "n") 
 colnames(df1) <- colnames(df2) <- nm1 
 rbind_list(df1,df2, data.frame(index=0:6, n=0)) %>% 
                                          group_by(index) %>%
                                          summarise(n=sum(n))

MrFlick · Answer 2 · 2014-10-23T18:40:04.157

0

This is something you could do with the base functions aggregate and rbind

df1 = data.frame(index=c(0,3,4),n=c(1,2,3))  
df2 = data.frame(index=c(1,2,3),n=c(4,5,6))  
aggregate(n~index, rbind(df1, df2, data.frame(index=0:6, n=0)), sum)

which returns

edited Oct 23 '14 at 18:40

answered Oct 23 '14 at 03:19

MrFlick

195,160
17
277
295

score 0 · Answer 3 · answered Apr 27 '17 at 08:05

0

How about

names(df1) <- c("index", "n")                # set colnames of df1 to target
df3 <- rbind(df1,setNames(df2, names(df1)))  # set colnnames of df2 and join
df <- df3 %>% dplyr::arrange(index)          # sort by index

Cheers.

answered Apr 27 '17 at 08:05

Boern

7,233
5
55
86

How to combine two data frames using dplyr or other packages?

3 Answers3

Linked