
I have a list of lists, where each inner list contains a set of dataframes. I want to merge the dataframes within each inner list, but I don't want to merge everything across the whole outer list. For example:

 list<- list(list(df1_2010,df2_2010,df3_2010), list(df1_2011,df2_2011,df3_2011), list(df1_2012,df2_2012,df3_2012))

I want to merge all the 2010 dataframes together by, let's say, a column id. Likewise, I want to merge the 2011 dataframes together by a similar id column, and all the 2012 dataframes together by another similar id column.

I want to output a list of merged dataframes by year:

   list(df2010, df2011, df2012)

Here's a schematic of how I want to use the Reduce function:

   f<-function(b) merge(...,by="ID",all.x=T)
   list<- Reduce(f, list)

But I think this will merge all three lists together instead of each list separately. Let me know your suggestions.

  • Why not `lapply` a `merge`-based function to the `list` object? – Thomas Jul 28 '14 at 20:17
  • Can that be done without any knowledge of the number of elements in the list? I was curious whether this could be generalized; I don't want to hardcode any numbers. Do you have an example of how lapply can be used? I know Reduce can do a lot of merges recursively, merging df1 to df2 to df3, to get one big dataframe. If you can provide an example of lapply or a link, that would be extremely useful. – Barry Barrios Jul 28 '14 at 20:21

2 Answers


Here's a simple reproducible example that I think maps onto your structure:

n <- 5
set.seed(n)
l <- list( list( data.frame(ID = 1:5, a = rnorm(n)),
                 data.frame(ID = 1:5, b = rnorm(n)),
                 data.frame(ID = 1:5, c = rnorm(n)),
                 data.frame(ID = 1:5, d = rnorm(n)) ),
           list( data.frame(ID = 1:5, a = rnorm(n)),
                 data.frame(ID = 1:5, b = rnorm(n)),
                 data.frame(ID = 1:5, c = rnorm(n)),
                 data.frame(ID = 1:5, d = rnorm(n)) ),
           list( data.frame(ID = 1:5, a = rnorm(n)),
                 data.frame(ID = 1:5, b = rnorm(n)),
                 data.frame(ID = 1:5, c = rnorm(n)),
                 data.frame(ID = 1:5, d = rnorm(n)) ))

You can write an lapply-based function that applies Reduce to each element of the outer list, so each inner list is merged separately, however many dataframes it holds:

# For each inner list, merge its data frames left to right on "ID".
out <- lapply(l, function(x) {
  Reduce(function(a, b) merge(a, b, by = "ID", all.x = TRUE), x)
})

And you should get a list of merged dataframes:

str(out)
List of 3
 $ :'data.frame':       5 obs. of  5 variables:
  ..$ ID: int [1:5] 1 2 3 4 5
  ..$ a : num [1:5] -0.8409 1.3844 -1.2555 0.0701 1.7114
  ..$ b : num [1:5] -0.603 -0.472 -0.635 -0.286 0.138
  ..$ c : num [1:5] 1.228 -0.802 -1.08 -0.158 -1.072
  ..$ d : num [1:5] -0.139 -0.597 -2.184 0.241 -0.259
 $ :'data.frame':       5 obs. of  5 variables:
  ..$ ID: int [1:5] 1 2 3 4 5
  ..$ a : num [1:5] 0.901 0.942 1.468 0.707 0.819
  ..$ b : num [1:5] -0.293 1.419 1.499 -0.657 -0.853
  ..$ c : num [1:5] 0.316 1.11 2.215 1.217 1.479
  ..$ d : num [1:5] 0.952 -1.01 -2 -1.762 -0.143
 $ :'data.frame':       5 obs. of  5 variables:
  ..$ ID: int [1:5] 1 2 3 4 5
  ..$ a : num [1:5] 1.5501 -0.8024 -0.0746 1.8957 -0.4566
  ..$ b : num [1:5] 0.5622 -0.887 -0.4602 -0.7243 -0.0692
  ..$ c : num [1:5] 1.463 0.188 1.022 -0.592 -0.112
  ..$ d : num [1:5] -0.925 0.7533 -0.1126 -0.0641 0.2333
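
The question also mentions that each year might be merged on a different id column, and that the result should be organized by year. Here is a minimal sketch of one way to do that in base R, pairing each inner list with its own key via `Map`. The names `years`, `keys`, and `merge_group`, and the toy data, are made up for illustration and are not from the original post:

```r
# Hypothetical data: two years, each using its own key column name.
years <- list(
  y2010 = list(data.frame(ID = 1:3, a = c(10, 20, 30)),
               data.frame(ID = 1:3, b = c("x", "y", "z")),
               data.frame(ID = c(1L, 3L), c = c(TRUE, FALSE))),
  y2011 = list(data.frame(key = 1:2, a = c(1.5, 2.5)),
               data.frame(key = 1:2, b = c("p", "q")))
)

# Hypothetical lookup: which column to merge on for each year.
keys <- c(y2010 = "ID", y2011 = "key")

# Merge one group of data frames left to right on the given key.
merge_group <- function(dfs, key) {
  Reduce(function(x, y) merge(x, y, by = key, all.x = TRUE), dfs)
}

# Map walks the groups and their keys in parallel and keeps the names of `years`.
merged <- Map(merge_group, years, keys[names(years)])

names(merged)
str(merged$y2010)
```

Because `years` is a named list, `merged` carries the same names, so a single year can be pulled out with `merged$y2010` instead of by position.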
Thomas

Another way to perform the recursive merge is to use join_all from the plyr package:

library(plyr)
out1 <- lapply(l, join_all, by="ID") #using the example dataset of @Thomas
identical(out, out1)
# [1] TRUE
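
Note that both approaches here behave like a left join: `all.x = TRUE` keeps the rows of the first dataframe in each group, and `join_all` defaults to `type = "left"`. In the example data every dataframe has the same IDs, so that makes no difference. If your dataframes cover different IDs, a rough sketch of the contrast in base R looks like this (the `group` data is made up for illustration):

```r
# Hypothetical group where the data frames cover different IDs.
group <- list(data.frame(ID = 1:2, a = c(0.1, 0.2)),
              data.frame(ID = 2:3, b = c(0.3, 0.4)))

# Left join: only IDs present in the first data frame survive.
left <- Reduce(function(x, y) merge(x, y, by = "ID", all.x = TRUE), group)

# Full outer join: IDs from every data frame are kept, with NA where missing.
full <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), group)

left$ID
full$ID
```

The same swap can be dropped into the `lapply` call above; with plyr, passing `type = "full"` to `join_all` should be the equivalent choice.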
akrun