
I have a list containing data frames as its elements in R.

Example:

df1 <- data.frame("names"=c("John","Sam","Dave"),"age"=c(21,22,25))
df2 <- data.frame("names"=c("John","Sam"),"score"=c(22,25))
df3 <- data.frame("names"=c("John","Sam","Dave"),"country"=c("US","SA","NZ"))
mylist <- list(df1,df2,df3)

Is it possible to merge all the elements of mylist together without using a loop?

My desired output for this example is:

  names age score country
1  John  21    22      US
2   Sam  22    25      SA

The list in this example has only three elements; however, I am looking for a solution that can handle an arbitrary number of elements.

Asked by user2109248 (edited by coip)

4 Answers


You can use Reduce for a one-line solution:

Reduce(merge,mylist)

  names age score country
1  John  21    22      US
2   Sam  22    25      SA
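As a sketch of what the one-liner does: Reduce() applies merge() cumulatively from the left, so for a three-element list it makes exactly the same calls as the nested merge in the next answer. The example rebuilds the question's data; the names folded, nested and folded_by are only illustrative.

```r
# The question's example data
df1 <- data.frame(names = c("John", "Sam", "Dave"), age = c(21, 22, 25))
df2 <- data.frame(names = c("John", "Sam"), score = c(22, 25))
df3 <- data.frame(names = c("John", "Sam", "Dave"), country = c("US", "SA", "NZ"))
mylist <- list(df1, df2, df3)

# Reduce() folds merge() over the list from left to right:
# Reduce(merge, list(a, b, c)) evaluates merge(merge(a, b), c)
folded <- Reduce(merge, mylist)
nested <- merge(merge(df1, df2), df3)
identical(folded, nested)  # TRUE

# merge() joins on all shared column names ("names" here); naming the key
# explicitly protects against accidental joins on other shared columns
folded_by <- Reduce(function(x, y) merge(x, y, by = "names"), mylist)
```

The same call works unchanged for a list of any length, which is what the question asks for.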
Answered by agstudy

Quick and dirty example:

merge(merge(df1, df2),df3)

EDIT: A very similar question is here: Simultaneously merge multiple data.frames in a list

Solution:

merged.data.frame <- Reduce(function(...) merge(..., all = FALSE), mylist)

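Disclaimer: all I changed from @Charles's answer was to use merge(..., all = FALSE) rather than all = TRUE. With all = FALSE each merge is an inner join, so only names present in every data frame are kept (Dave is dropped because he is missing from df2), which gives your desired output. Since all = FALSE is merge's default, this is equivalent to Reduce(merge, mylist).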
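A short comparison of the two settings on the question's data; inner and outer are illustrative names. all = FALSE keeps only names found in every data frame, while all = TRUE keeps every name and fills the gaps with NA.

```r
# The question's example data
df1 <- data.frame(names = c("John", "Sam", "Dave"), age = c(21, 22, 25))
df2 <- data.frame(names = c("John", "Sam"), score = c(22, 25))
df3 <- data.frame(names = c("John", "Sam", "Dave"), country = c("US", "SA", "NZ"))
mylist <- list(df1, df2, df3)

# Inner join at every step: Dave is dropped because df2 has no row for him
inner <- Reduce(function(x, y) merge(x, y, all = FALSE), mylist)

# Full outer join at every step: Dave is kept, with NA for his missing score
outer <- Reduce(function(x, y) merge(x, y, all = TRUE), mylist)
```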

Answered by alexwhan
  • Thanks @alexwhan. I should have been more specific. I need a solution for a list with an arbitrary number of elements. My input list may have a different number of elements each time instead of the three in this example. – user2109248 Feb 26 '13 at 00:07
  • Yes, that's what I wondered – alexwhan Feb 26 '13 at 00:11

Just to show it could be done another way...

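mymerge <- function(mylist) {
  # Assumes every data frame has its key in a column called "names"
  # and exactly one value column in position 2.
  # Name each list element after its value column (age, score, country).
  names(mylist) <- sapply(mylist, function(x) names(x)[2])
  # All keys across the list, sorted; as.character() works whether or not
  # the names column is a factor (levels() returns NULL for character columns)
  ns <- sort(unique(unlist(lapply(mylist, function(x) as.character(x$names)))))
  # match() aligns each value column to ns; keys missing from a data frame become NA
  as.data.frame(c(list(names = ns), lapply(mylist, function(x) {
    x[match(ns, x$names), 2]
  })))
}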

> mymerge(mylist)
  names age score country
1  Dave  25    NA      NZ
2  John  21    22      US
3   Sam  22    25      SA

One could easily adapt it to drop rows with missing values, or simply remove them afterwards with complete.cases.
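For example, complete.cases() is TRUE only for rows with no NA in any column, so filtering on it turns the three-row result above into the desired two-row output. To keep the sketch self-contained, the data frame is typed in by hand from the printed mymerge(mylist) output rather than produced by calling mymerge; merged, keep and result are illustrative names.

```r
# A data frame with the same contents as the mymerge(mylist) output shown above
merged <- data.frame(names = c("Dave", "John", "Sam"),
                     age = c(25, 21, 22),
                     score = c(NA, 22, 25),
                     country = c("NZ", "US", "SA"),
                     stringsAsFactors = FALSE)

# TRUE for rows without any NA; Dave's row has score = NA
keep <- complete.cases(merged)  # FALSE TRUE TRUE
result <- merged[keep, ]
rownames(result) <- NULL        # renumber the remaining rows as 1, 2
```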

To show that it's faster, we'll make up a bigger data set: 100 variables and 25 names.

set.seed(5)
vs <- paste0("V", 1:100)
mylist <- lapply(vs, function(v) {
  x <- data.frame(names=LETTERS[1:25], round(runif(25, 0,100)))
  names(x)[2] <- v
  x
})

> library(microbenchmark)
> myf <- mymerge  # the timings below label mymerge as myf
> microbenchmark(Reduce(merge, mylist), myf(mylist))
Unit: milliseconds
                   expr       min        lq    median        uq       max
1           myf(mylist)  12.81371  13.19746  13.36571  14.40093  33.90468
2 Reduce(merge, mylist) 199.23714 206.28608 207.30247 208.44939 226.05980
Answered by Aaron

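Have you tried the smartbind function from the gtools package? Note that it stacks data frames by rows, filling missing columns with NA, rather than joining them on a key column, so on its own it does not produce the merged output asked for above.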

http://rss.acs.unt.edu/Rdoc/library/gtools/html/smartbind.html

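library(gtools)
df1 <- data.frame(A = 1:10, B = LETTERS[1:10], C = rnorm(10))
df2 <- data.frame(A = 11:20, D = rnorm(10), E = letters[1:10])
df3 <- df1

mylist <- list(df1, df2, df3)
# smartbind() takes the data frames as separate arguments, so spread the list with do.call()
out <- do.call(smartbind, mylist)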
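To show why row-binding differs from merging, here is a base-R sketch of row-binding with NA fill applied to the question's data. rbind_fill_sketch is a hypothetical helper written for illustration only; it is not gtools::smartbind and does not reproduce its options.

```r
# Hypothetical helper (NOT gtools::smartbind): stack data frames by rows,
# padding each with all-NA columns so every piece has the same columns
rbind_fill_sketch <- function(dfs) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA  # add missing column
    d[all_cols]  # same column order in every piece
  })
  do.call(rbind, padded)
}

# The question's data; stringsAsFactors = FALSE keeps text columns as character
df1 <- data.frame(names = c("John", "Sam", "Dave"), age = c(21, 22, 25),
                  stringsAsFactors = FALSE)
df2 <- data.frame(names = c("John", "Sam"), score = c(22, 25),
                  stringsAsFactors = FALSE)
df3 <- data.frame(names = c("John", "Sam", "Dave"), country = c("US", "SA", "NZ"),
                  stringsAsFactors = FALSE)

stacked <- rbind_fill_sketch(list(df1, df2, df3))
dim(stacked)  # 8 4: one row per input row, so John appears three times
```

A merge, by contrast, produces one row per name.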
Answered by alap (edited by coip)