Merging a data frame from a list of data frames

Question

I have a list of data frame that looks like this:

ls[[1]]
[[1]]

 month year   oracle
    1 2004 356.0000
    2 2004 390.0000
    3 2004 394.4286
    4 2004 391.8571 
 ls[[2]]
 [[2]]
 month year microsoft
    1 2004  339.0000
    2 2004  357.7143
    3 2004  347.1429
    4 2004  333.2857

How do I create a single data frame that looks like this:

 month year   oracle   microsoft
    1 2004 356.0000    339.0000
    2 2004 390.0000    357.7143
    3 2004 394.4286    347.1429
    4 2004 391.8571    333.2857

akrun · Accepted Answer · 2015-10-20T07:23:11.103

9

We could also use Reduce

Reduce(function(...) merge(..., by = c('month', 'year')), lst)

Using @Jaap's example, if the values are not the same, use all=TRUE option from merge.

Reduce(function(...) merge(..., by = c('month', 'year'), all=TRUE), ls)
#     month year   oracle microsoft   google
#1     1 2004 356.0000        NA       NA
#2     2 2004 390.0000  339.0000       NA
#3     3 2004 394.4286  357.7143 390.0000
#4     4 2004 391.8571  347.1429 391.8571
#5     5 2004       NA  333.2857 357.7143
#6     6 2004       NA        NA 333.2857

edited Oct 20 '15 at 07:23

answered Oct 20 '15 at 06:00

akrun

874,273
37
540
662

1

You can turn it into a recursive left-join with `all.x=TRUE`. Note that the order of the list now becomes important. – jiggunjer Sep 03 '19 at 11:42
does the `(...)` need to be filled with a variable name like `x` or should be be left as `(...)` ? – thentangler Sep 20 '22 at 16:43
1

@thentangler the 3 dots, is a variadic argument, it can take any number of arguments. Here, `merge` takes only two, so we can change that to `function(x, y) merg(x, y, by = c('month', 'year'), all = TRUE)` as well – akrun Sep 20 '22 at 16:49
1

Thank you for that explanation. Gives me a clearer understanding now! – thentangler Sep 20 '22 at 16:51

Jaap · Answer 2 · 2015-10-20T09:11:18.730

Using the Reduce/merge code from @akrun's answer will work great if the values for the month and year columns are the same for each dataframe. However, when they are not the same (example data at the end of this answer)

Reduce(function(...) merge(..., by = c('month', 'year')), ls)

will return only the rows which are common in each dataframe:

  month year   oracle microsoft   google
1     3 2004 394.4286  357.7143 390.0000
2     4 2004 391.8571  347.1429 391.8571

In that case, you can either use all=TRUE (as shown by @akrun) or use full_join from the dplyr package as an alternative when you want to include all rows/observations:

library(dplyr)
Reduce(function(...) full_join(..., by = c('month', 'year')), ls) 
# or just:
Reduce(full_join, ls)

this will result in:

  month year   oracle microsoft   google
1     1 2004 356.0000        NA       NA
2     2 2004 390.0000  339.0000       NA
3     3 2004 394.4286  357.7143 390.0000
4     4 2004 391.8571  347.1429 391.8571
5     5 2004       NA  333.2857 357.7143
6     6 2004       NA        NA 333.2857

Used data:

ls <- list(structure(list(month = 1:4, year = c(2004L, 2004L, 2004L, 2004L), oracle = c(356, 390, 394.4286, 391.8571)), .Names = c("month", "year", "oracle"), class = "data.frame", row.names = c(NA, -4L)), 
           structure(list(month = 2:5, year = c(2004L, 2004L, 2004L, 2004L), microsoft = c(339, 357.7143, 347.1429, 333.2857)), .Names = c("month", "year", "microsoft"), class = "data.frame", row.names = c(NA,-4L)),
           structure(list(month = 3:6, year = c(2004L, 2004L, 2004L, 2004L), google = c(390, 391.8571, 357.7143, 333.2857)), .Names = c("month", "year", "google"), class = "data.frame", row.names = c(NA,-4L)))

score 0 · Answer 3 · answered Oct 20 '15 at 06:43

0

You can also do do.call() as follows...

do.call(merge, ls)

answered Oct 20 '15 at 06:43

Gaurav

1,597
2
14
31

This would not work if there are more than 2 list elements – akrun Oct 20 '15 at 06:48
2

I meant suppose you have 3 list element, it gives error. – akrun Oct 20 '15 at 06:51
Yeah point taken... will work for above case though – Gaurav Oct 20 '15 at 06:53

Merging a data frame from a list of data frames

3 Answers3

Linked