7

I have a list of data frame that looks like this:

ls[[1]]
[[1]]

 month year   oracle
    1 2004 356.0000
    2 2004 390.0000
    3 2004 394.4286
    4 2004 391.8571 
 ls[[2]]
 [[2]]
 month year microsoft
    1 2004  339.0000
    2 2004  357.7143
    3 2004  347.1429
    4 2004  333.2857

How do I create a single data frame that looks like this:

 month year   oracle   microsoft
    1 2004 356.0000    339.0000
    2 2004 390.0000    357.7143
    3 2004 394.4286    347.1429
    4 2004 391.8571    333.2857
josliber
  • 43,891
  • 12
  • 98
  • 133
anonymous
  • 405
  • 8
  • 22

3 Answers3

9

We could also use Reduce

Reduce(function(...) merge(..., by = c('month', 'year')), lst)

Using @Jaap's example, if the values are not the same, use all=TRUE option from merge.

Reduce(function(...) merge(..., by = c('month', 'year'), all=TRUE), ls)
#     month year   oracle microsoft   google
#1     1 2004 356.0000        NA       NA
#2     2 2004 390.0000  339.0000       NA
#3     3 2004 394.4286  357.7143 390.0000
#4     4 2004 391.8571  347.1429 391.8571
#5     5 2004       NA  333.2857 357.7143
#6     6 2004       NA        NA 333.2857
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    You can turn it into a recursive left-join with `all.x=TRUE`. Note that the order of the list now becomes important. – jiggunjer Sep 03 '19 at 11:42
  • does the `(...)` need to be filled with a variable name like `x` or should be be left as `(...)` ? – thentangler Sep 20 '22 at 16:43
  • 1
    @thentangler the 3 dots, is a variadic argument, it can take any number of arguments. Here, `merge` takes only two, so we can change that to `function(x, y) merg(x, y, by = c('month', 'year'), all = TRUE)` as well – akrun Sep 20 '22 at 16:49
  • 1
    Thank you for that explanation. Gives me a clearer understanding now! – thentangler Sep 20 '22 at 16:51
5

Using the Reduce/merge code from @akrun's answer will work great if the values for the month and year columns are the same for each dataframe. However, when they are not the same (example data at the end of this answer)

Reduce(function(...) merge(..., by = c('month', 'year')), ls)

will return only the rows which are common in each dataframe:

  month year   oracle microsoft   google
1     3 2004 394.4286  357.7143 390.0000
2     4 2004 391.8571  347.1429 391.8571

In that case, you can either use all=TRUE (as shown by @akrun) or use full_join from the dplyr package as an alternative when you want to include all rows/observations:

library(dplyr)
Reduce(function(...) full_join(..., by = c('month', 'year')), ls) 
# or just:
Reduce(full_join, ls)

this will result in:

  month year   oracle microsoft   google
1     1 2004 356.0000        NA       NA
2     2 2004 390.0000  339.0000       NA
3     3 2004 394.4286  357.7143 390.0000
4     4 2004 391.8571  347.1429 391.8571
5     5 2004       NA  333.2857 357.7143
6     6 2004       NA        NA 333.2857

Used data:

ls <- list(structure(list(month = 1:4, year = c(2004L, 2004L, 2004L, 2004L), oracle = c(356, 390, 394.4286, 391.8571)), .Names = c("month", "year", "oracle"), class = "data.frame", row.names = c(NA, -4L)), 
           structure(list(month = 2:5, year = c(2004L, 2004L, 2004L, 2004L), microsoft = c(339, 357.7143, 347.1429, 333.2857)), .Names = c("month", "year", "microsoft"), class = "data.frame", row.names = c(NA,-4L)),
           structure(list(month = 3:6, year = c(2004L, 2004L, 2004L, 2004L), google = c(390, 391.8571, 357.7143, 333.2857)), .Names = c("month", "year", "google"), class = "data.frame", row.names = c(NA,-4L)))
Jaap
  • 81,064
  • 34
  • 182
  • 193
0

You can also do do.call() as follows...

do.call(merge, ls)

Gaurav
  • 1,597
  • 2
  • 14
  • 31