13

I have three independent data.frames. The three data.frames have the same number of columns and the same number of rows. Additionally They have the same column names. I' m trying to merge the three data.frames according to column names. I'm using the following code wrote to merge two data.frames and return the number of matches.

Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, by=n)))

The problem is that while in this example there are two data.frames, in my case I have 3 data.frames. How can I modify the code to merge three data.frames instead of two? I tried to modify the string in this way simply adding the third data.frame but it does not work:

  Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, DF3,  by=n)))

It returns the following error:

 Error in fix.by(by.x, x) :  'by' must specify column(s) as numbers, names or logical

Ex:

DF1

 G1  G2  G3
  a   b   f
  b   c   a
  c   d   b

DF2

 G1  G2  G3
  A   b   f
  b   c   a
  h   M   b

DF3

 G1  G2  G3
  a   b   f
  b   l   a
  j   M   v

The data.frames have around 250 rows and 50 cols.

Thomas
  • 43,637
  • 12
  • 109
  • 140
NewUsr_stat
  • 2,351
  • 5
  • 28
  • 38

3 Answers3

12

You can use the Reduce function to merge multiple data frames:

df_list <- list(DF1, DF2, DF3)
Reduce(function(x, y) merge(x, y, all=TRUE), df_list, accumulate=FALSE)

Or merge_recurse from the reshape package:

library(reshape)
data <- merge_recurse(df_list)

See also the R Wiki: Merge data frames

rcs
  • 67,191
  • 22
  • 172
  • 153
  • Hi, rcs, thanks a lot! Is it possible to have the count of matches between all the data.frames? Finally I would like to know how many items column by column, the three data.frames shere. – NewUsr_stat Mar 08 '13 at 11:16
  • merge_recourse is no longer in the new reshape2 package and this function has a bug https://stackoverflow.com/questions/23121654/r-why-is-merge-recurse-failing. So the first option is recommended. – jl-blancopastor Dec 17 '18 at 09:40
10

After researching this very same question for a couple hours today, I came up with this simple but elegant solution using a combination of 'dplyr' pipes and the base R 'merge()' function.

MergedDF <- merge(DF1, DF2) %>%
              merge(DF3)

As you mention in your post, this assumes that the column names are the same and that there's the same number of rows in each data frame you are merging. This will also automatically eliminate any duplicate columns (i.e., identifiers) that were used in the merging process.

Paul Sochacki
  • 433
  • 6
  • 8
1

Just in case anyone wants to merge multiple data frames with the same column name but unequal row numbers, this article was helpful: https://medium.com/coinmonks/merging-multiple-dataframes-in-r-72629c4632a3

Basically, you use the do.call and rbind functions:

Merged <- do.call("rbind", list(df1, df2, df3, df4))