Merging data frames stored in lists

Question

I have two lists. Every component in the lists is a data frame. The two lists are symmetric: both contain data frames for the years 2006-2012, just on different themes. I would like to merge the data frames 'horizontally' (that is, the 2006 data frame in the first list with the 2006 data frame in the second list, and so on), obtaining a third list of data frames. I tried to figure out how to do that with lapply, but there must be something I didn't understand about that function.

Could you please help?

Thank you.

Hard to understand what you want without an example. Recommend starting here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Brandon Bertelsen, Nov 05 '13 at 17:18
Maybe you need `do.call(cbind, List_of_data_frames)`, but without a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) it's hard to figure out what you really need. — Jilber Urbina, Nov 05 '13 at 17:22
If you'd like to figure out what you don't understand it would be good to put up the code that didn't work for you. — John, Nov 05 '13 at 17:42
Sorry guys, I was so confused that I didn't manage to upload a piece of code that made sense to me. I had the feeling that this operation was possible but I couldn't really figure out how. I am a beginner and still need to figure out how to use lists and the functions that apply to them. Next time I'll try harder — Riccardo, Nov 06 '13 at 15:15

TheComeOnMan · Accepted Answer · 2016-01-18T03:10:49.550

3

Something like l3 in this code, you mean?

# Two parallel lists of data frames that share the key column A
DT1 = data.frame(A=1:3,B=letters[1:3])
DT2 = data.frame(A=4:5,B=letters[4:5])
l1 = list(DT1,DT2)
DT1 = data.frame(A=1:3,C=letters[7:9])
DT2 = data.frame(A=4:5,C=letters[11:12])
l2 = list(DT1,DT2)

# Preallocate the result, then merge the i-th data frames of both lists on A
l3 <- vector(mode = "list", length = length(l1))
for (i in seq_along(l1))
{
  l3[[i]] <- merge(l2[[i]], l1[[i]], by = "A")
}
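
Since the question asks about lapply, the loop above can probably also be written with lapply() over the list positions. This is only a sketch under the same assumption as the loop (both lists have the same length and order); the sample data is re-created here so the snippet runs on its own.

```r
# Sample data mirroring the answer: two parallel lists of data frames
l1 <- list(data.frame(A = 1:3, B = letters[1:3]),
           data.frame(A = 4:5, B = letters[4:5]))
l2 <- list(data.frame(A = 1:3, C = letters[7:9]),
           data.frame(A = 4:5, C = letters[11:12]))

# Iterate over positions; lapply() collects the results,
# so no preallocation of l3 is needed
l3 <- lapply(seq_along(l1), function(i) merge(l2[[i]], l1[[i]], by = "A"))

l3[[1]]
#   A C B
# 1 1 g a
# 2 2 h b
# 3 3 i c
```

seq_along(l1) is used instead of 1:length(l1) so that an empty list yields zero iterations rather than the indices 1 and 0.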

edited Jan 18 '16 at 03:10

answered Nov 05 '13 at 17:23

TheComeOnMan


Yes, exactly. I also had a for loop in mind, but I thought it was possible (and perhaps more efficient) to do the job with lapply as well. People always suggest that when dealing with lists. But thanks! I will follow your advice. – Riccardo Nov 05 '13 at 17:29
And @Riccardo, if this does answer your question then please consider clicking on the check mark next to the answer to mark it as accepted. – TheComeOnMan Nov 05 '13 at 17:43

In general `apply` and similar functions are much more efficient than `for` loops. In this case, it probably doesn't matter. See pg 46 of [this issue of R News](http://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf) for more details and tips on how to make `for` loops as efficient as possible. – Christopher Louden Nov 05 '13 at 17:48
@Christopher thank you very much for specifying this. – Riccardo Nov 06 '13 at 15:07
This doesn't run... `Error in vector(mode = "list", length = length(l)) : object 'l' not found` – hedgedandlevered Jan 14 '16 at 17:42
Thanks, @hedgedandlevered. Fixed now. – TheComeOnMan Jan 18 '16 at 03:11

Chase · Answer 2 · 2013-11-06T20:17:08.993

Perhaps something like this is what you're after?

df1 <- data.frame(year = 2006, x = 1:3)
df2 <- data.frame(year = 2007, x = 4:6)
df3 <- data.frame(year = 2006, x = 7:9)
df4 <- data.frame(year = 2007, x = 10:12)

l1 <- list(x2006 = df1, x2007 = df2)
l2 <- list(x2006 = df3, x2007 = df4)

lapply(names(l1), function(x) cbind(l1[[x]], l2[[x]]))
####
[[1]]
  year x year x
1 2006 1 2006 7
2 2006 2 2006 8
3 2006 3 2006 9

[[2]]
  year x year  x
1 2007 4 2007 10
2 2007 5 2007 11
3 2007 6 2007 12

There may be other functions that would be more appropriate than cbind() such as merge(), but this should get you on the right path. This obviously assumes that you have named your lists and those names are consistent between l1 and l2.

EDITED TO ADD SOME MORE CONTEXT

There are a few key assumptions that make this work. Those assumptions are:

Your list objects have names
The names in each list are consistent between lists

So, what are the names I'm referring to? If you look at the code above where I define l1, you'll see x2006 = df1 and x2007 = df2. I'm defining two objects in that list, df1 and df2, with the two names x2006 and x2007.

You can check the names of the list by asking for the names():

names(l1)
####
[1] "x2006" "x2007"

The other key assumption is that you can index objects in a list by their name, using the [[ function. For example:

l1[["x2006"]]
####
  year x
1 2006 1
2 2006 2
3 2006 3

So what we're doing with the lapply function is that we're iterating over the names of l1, defining an anonymous function, and then using the [[ function to index the two list objects l1 and l2. We're currently using cbind as the function, but you can replace cbind with almost any other function.
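
One side effect worth noting: because lapply() is iterating over an unnamed character vector, the result comes back as an unnamed list ([[1]], [[2]]). If you want the year names carried over, one possible sketch is to reattach them with setNames(). The variable name combined is just an illustration, and the data is re-created so the snippet is self-contained.

```r
df1 <- data.frame(year = 2006, x = 1:3)
df2 <- data.frame(year = 2007, x = 4:6)
df3 <- data.frame(year = 2006, x = 7:9)
df4 <- data.frame(year = 2007, x = 10:12)
l1 <- list(x2006 = df1, x2007 = df2)
l2 <- list(x2006 = df3, x2007 = df4)

# Look up each data frame by name in both lists, bind the columns,
# then put the names of l1 back on the resulting list
combined <- setNames(
  lapply(names(l1), function(nm) cbind(l1[[nm]], l2[[nm]])),
  names(l1)
)

names(combined)
# [1] "x2006" "x2007"
```

After this, combined[["x2006"]] or combined$x2006 retrieves the 2006 result directly.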

As I mentioned above, this assumes that the names are the same between the two or more list objects. For example, this does not work:

#change the names of the l2 list
names(l2) <- c("foo", "bar")
lapply(names(l1), function(x) cbind(l1[[x]], l2[[x]]))
####
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 3, 0
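
If you cannot be sure the names line up, a defensive sketch is to compare the names first and only combine the elements that exist in both lists. This is one possible approach, not the only one; shared, missing_in_l2 and result are illustrative names, and the deliberately mismatched list below is made up for the example.

```r
l1 <- list(x2006 = data.frame(year = 2006, x = 1:3),
           x2007 = data.frame(year = 2007, x = 4:6))
# l2 has one mismatched name ("foo") on purpose
l2 <- list(foo   = data.frame(year = 2006, x = 7:9),
           x2007 = data.frame(year = 2007, x = 10:12))

# Names present in both lists, and names of l1 with no partner in l2
shared <- intersect(names(l1), names(l2))
missing_in_l2 <- setdiff(names(l1), names(l2))
if (length(missing_in_l2) > 0) {
  message("No match in l2 for: ", paste(missing_in_l2, collapse = ", "))
}

# Combine only the pairs that can actually be matched by name
result <- setNames(
  lapply(shared, function(nm) cbind(l1[[nm]], l2[[nm]])),
  shared
)
```

Here only x2007 is combined, and the message reports that x2006 had no counterpart. If a mismatch should be treated as an error instead, stopifnot(setequal(names(l1), names(l2))) before the lapply() call is a stricter alternative.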

The names however do not have to be in the same order. That's where the benefit of the [[ function comes in. To wit:

#Fix names on l2 again
names(l2) <- c("x2006", "x2007")
l2reverse <- list(x2007 = df4, x2006 = df3)

all.equal(
  lapply(names(l1), function(x) cbind(l1[[x]], l2[[x]])),  
  lapply(names(l1), function(x) cbind(l1[[x]], l2reverse[[x]]))
)
####
[1] TRUE

that's exactly what I was looking for. Could you explain a bit more how you used lapply? In a very good programming manual I found a similar example to yours but didn't manage to understand it. — Riccardo, Nov 06 '13 at 15:17
@Riccardo - added some more context and explanation for you. — Chase, Nov 06 '13 at 20:17
Really kind of you to take the time to leave a clear explanation for me and other users. It is thanks to people like you that forums become powerful learning tools. I wish I had reputation above 15 to leave you a +1. Thank you. — Riccardo, Nov 13 '13 at 08:52

A5C1D2H2I1M1N2O1R2T1 · Answer 3 · 2013-11-05T18:23:26.263

mapply might be of use here too.

Here's a third interpretation of what you might be asking for:

Some sample data:

DT1 <- data.frame(A=1:3, B=letters[1:3])
DT2 <- data.frame(A=4:5, C=letters[4:5])
l1 <- list(DT1,DT2)
DT1 <- data.frame(A=1:3, B=letters[7:9])
DT2 <- data.frame(A=4:5, C=letters[11:12])
l2 <- list(DT1, DT2)

merge with mapply:

mapply(FUN=function(x, y) merge(x, y, by="A"), 
       l1, l2, SIMPLIFY=FALSE)
# [[1]]
#   A B.x B.y
# 1 1   a   g
# 2 2   b   h
# 3 3   c   i
# 
# [[2]]
#   A C.x C.y
# 1 4   d   k
# 2 5   e   l
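
As a side note, base R's Map() is a thin wrapper around mapply() with SIMPLIFY=FALSE, so the same merge can be sketched a little more compactly. The variable name merged is illustrative, and the sample data is repeated so the snippet runs on its own.

```r
l1 <- list(data.frame(A = 1:3, B = letters[1:3]),
           data.frame(A = 4:5, C = letters[4:5]))
l2 <- list(data.frame(A = 1:3, B = letters[7:9]),
           data.frame(A = 4:5, C = letters[11:12]))

# Map(f, ...) applies f to corresponding elements of the lists
# and always returns a list (no simplification)
merged <- Map(function(x, y) merge(x, y, by = "A"), l1, l2)

names(merged[[1]])
# [1] "A"   "B.x" "B.y"
```

Like mapply, Map() pairs elements by position, so it assumes both lists are in the same order.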

For reference....

Here's @Chase's interpretation of your question done with mapply (using the named l1 and l2 defined in that answer):

mapply(cbind, l1, l2, SIMPLIFY=FALSE)
# $x2006
#   year x year x
# 1 2006 1 2006 7
# 2 2006 2 2006 8
# 3 2006 3 2006 9
# 
# $x2007
#   year x year  x
# 1 2007 4 2007 10
# 2 2007 5 2007 11
# 3 2007 6 2007 12

Here's @Codoremifa's interpretation of your question done with mapply (using the l1 and l2 defined in that answer):

mapply(FUN=function(x, y) merge(x, y), 
       l1, l2, SIMPLIFY=FALSE)
# [[1]]
#   A B C
# 1 1 a g
# 2 2 b h
# 3 3 c i
# 
# [[2]]
#   A B C
# 1 4 d k
# 2 5 e l

What would be more helpful is if you post some sample data and your expected output so that there is less guessing about what you're trying to do :-)

Wow @Ananda Mahto, great example! The first example with mapply is the one I was looking for. I will keep it as a valuable reference for the next time I work with lists. Many thanks. — Riccardo, Nov 06 '13 at 15:33
