
I would like to merge two lists of data frames according to a common id variable. Consider the following example:

set.seed(1)
mylist1 <- data.frame(id = sample(paste0("id", sample(1:5, 10, TRUE))),
                      var1 = sample(letters[1:26], 10, TRUE), stringsAsFactors = FALSE)
mylist1 <- split(mylist1, mylist1$id)  # one data frame per id
set.seed(2)
mylist2 <- data.frame(id = sample(paste0("id", sample(1:5, 10, TRUE))),
                      var2 = sample(LETTERS[1:26], 10, TRUE), stringsAsFactors = FALSE)
mylist2 <- split(mylist2, mylist2$id)  # one data frame per id

mylist1
# $id1
# id     var1
# id1    d
# 
# $id2
# id     var1
# id2    f
# id2    g
# id2    w
# etc.

mylist2
# $id1
# id     var2
# id1    V
# id1    D
# id1    J
# 
# $id3
# id     var2
# id3    K
# id3    J
# id3    Z
# etc.

The resulting list of data frames should look like this:

# $id1
# id  var1 var2
# id1 d    V
# id1 d    D
# id1 d    J

# $id2
# id  var1 var2
# id2 f    NA
# id2 g    NA
# id2 w    NA
# etc.

Do you know how I could do this?

goclem
  • possible duplicate of [Simultaneously merge multiple data.frames in a list](http://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list) – jeremycg Aug 13 '15 at 12:52
  • In this case you could merge the data.frames and then split the resulting one into a list – Andriy T. Aug 13 '15 at 12:53
  • Your input data and the one you showed is not correct – akrun Aug 13 '15 at 12:55
  • Try `Map(merge, mylist1, mylist2,MoreArgs=list(by='id', all=TRUE))` – akrun Aug 13 '15 at 12:55
  • In the example, the lengths of mylist1 and mylist2 are different ie. 5 vs. 4 – akrun Aug 13 '15 at 13:02
  • @jeremycg. This is not the same question as the output should be a list of dataframes and not a data frame. akrun, I corrected the code, sorry for the typo – goclem Aug 13 '15 at 13:03
  • @Clement In the examples, I find that some ids found in mylist2 are not present in mylist1. How do you want to deal with those cases? – akrun Aug 13 '15 at 13:05
  • @akrun, this is the case of `id2` which is not present in `mylist2`. In this case, the resulting dataframe (in the list) should take `NA` values for `id2`. i.e. example for an illustration – goclem Aug 13 '15 at 13:08
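
The merge-then-split idea from the comments can be sketched as follows. This is only an illustration under stated assumptions: `l1` and `l2` are small hypothetical lists standing in for `mylist1` and `mylist2`, and every data frame is assumed to carry its own `id` column.

```r
# Hypothetical toy lists of data frames, used only for illustration
l1 <- list(
  id1 = data.frame(id = "id1", var1 = "d", stringsAsFactors = FALSE),
  id2 = data.frame(id = "id2", var1 = c("f", "g"), stringsAsFactors = FALSE)
)
l2 <- list(
  id1 = data.frame(id = "id1", var2 = c("V", "D"), stringsAsFactors = FALSE)
)

# Stack each list back into a single data frame
df1 <- do.call(rbind, l1)
df2 <- do.call(rbind, l2)

# Full outer join on 'id', then split the result back into a list by 'id'
merged <- merge(df1, df2, by = "id", all = TRUE)
res <- split(merged, merged$id)
```

Because the join is done once on the whole data, ids that appear in only one list simply get `NA` in the other list's column, and no placeholder data frames are needed.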

1 Answer


We can use Map to do this. From the example dataset, it is clear that only some list elements are common to both (based on the names of the list elements).

The first step is to get the unique names across both lists using union. We then subset the first and second lists with those names ('nm1') to get 'lst1' and 'lst2'. Where a name is missing from a list, that position holds a NULL element.

nm1 <- union(names(mylist1), names(mylist2))
lst1 <- mylist1[nm1]
lst2 <- mylist2[nm1]
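
To see what this subsetting does when a name is absent, here is a minimal sketch with two small hypothetical lists (`a` and `b` are illustrative and not part of the question's data). Besides selecting NULL, a name the list lacks also comes back as an `NA` name, so restoring the names with `setNames` may be worthwhile, since `Map` takes its output names from its first list argument.

```r
# Hypothetical toy lists, used only for illustration
a <- list(id1 = 1, id2 = 2)
b <- list(id1 = 10, id3 = 30)

nm <- union(names(a), names(b))   # all names found in either list

b_all <- b[nm]                    # names absent from 'b' select NULL
is_gap <- sapply(b_all, is.null)  # flags the positions that need a placeholder
names(b_all)                      # the absent name shows up as NA

b_all <- setNames(b_all, nm)      # restore the full set of names
```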

Next, we replace the NULL elements in each list with a placeholder 'data.frame'. We can do this with if/else inside lapply.

lst1 <- lapply(lst1, function(x) if(is.null(x)) 
                         data.frame(id=NA, var1=NA) else x)
lst2 <- lapply(lst2, function(x) if(is.null(x))
                        data.frame(id=NA, var2=NA) else x)

After that, we can merge the two lists using Map, which merges the corresponding elements of the lists. Instead of using an anonymous function, we can pass the extra arguments for merge through MoreArgs.

Map(merge, lst1, lst2,MoreArgs=list(by='id', all=TRUE))
#$id1
#   id var1 var2
#1 id1    d    V
#2 id1    d    D
#3 id1    d    J

#$id2
#    id var1 var2
#1  id2    f   NA
#2  id2    g   NA
#3  id2    w   NA
#4 <NA> <NA>   NA

#$id3
#   id var1 var2
#1 id3    y    K
#2 id3    y    J
#3 id3    y    Z

#$id4
#   id var1 var2
#1 id4    a    D
#2 id4    i    D

#$id5
#   id var1 var2
#1 id5    q    R
#2 id5    q    M
#3 id5    q    D
#4 id5    k    R
#5 id5    k    M
#6 id5    k    D
#7 id5    j    R
#8 id5    j    M
#9 id5    j    D
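
The `$id2` element in the output above carries an extra all-`NA` row, which comes from the one-row `NA` placeholder surviving `merge(all=TRUE)`. One possible way to avoid it, sketched here on small hypothetical data, is to use a zero-row placeholder instead. The helper `fill_missing` and the lists `l1` and `l2` are assumptions made for illustration and are not part of the original answer.

```r
# Hypothetical toy data, used only for illustration
l1 <- list(
  id1 = data.frame(id = "id1", var1 = "d", stringsAsFactors = FALSE),
  id2 = data.frame(id = "id2", var1 = c("f", "g"), stringsAsFactors = FALSE)
)
l2 <- list(
  id1 = data.frame(id = "id1", var2 = c("V", "D"), stringsAsFactors = FALSE)
)

nm <- union(names(l1), names(l2))

# Hypothetical helper: align a list to 'nm' and fill the gaps with a
# zero-row data frame rather than a single row of NAs
fill_missing <- function(lst, nm, value_col) {
  lst <- setNames(lst[nm], nm)
  empty <- setNames(
    data.frame(character(0), character(0), stringsAsFactors = FALSE),
    c("id", value_col)
  )
  lapply(lst, function(x) if (is.null(x)) empty else x)
}

res <- Map(merge,
           fill_missing(l1, nm, "var1"),
           fill_missing(l2, nm, "var2"),
           MoreArgs = list(by = "id", all = TRUE))
res$id2   # rows of id2 with var2 set to NA
```

With a zero-row placeholder there is nothing on the empty side to join, so the unmatched rows from the other list are kept and padded with `NA`.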
akrun