1

There is a classic approach how to simultaneously merge multiple data.frames in a list.

The output, however, is somewhat disordered.

Example

> L
[[1]]
  a b c  d e
1 5 2 4 10 1

[[2]]
  a b c d e
1 6 7 4 6 1

[[3]]
  a b c d
1 7 3 5 5

[[4]]
  a b c d
1 5 2 6 5

[[5]]
  a b c d
1 4 4 2 8

The rows of the output of Reduce(.) are ordered by 5, 1, 4, 2, 3, which could imply that the reduction works somehow from the outside to the inside.

> Reduce(function(...) merge(..., all=TRUE), L)
> Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L)  # same
  a b c  d  e
1 4 4 2  8 NA
2 5 2 4 10  1
3 5 2 6  5 NA
4 6 7 4  6  1
5 7 3 5  5 NA

Anyway, is there a way to slightly change the code to get an ordered output like that below?

#   a b c  d  e
# 1 5 2 4 10  1
# 2 6 7 4  6  1
# 3 7 3 5  5 NA
# 4 5 2 6  5 NA
# 5 4 4 2  8 NA

Data

L <- list(structure(list(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L), class = "data.frame", row.names = c(NA, 
-1L)), structure(list(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L), class = "data.frame", row.names = c(NA, 
-1L)), structure(list(a = 7L, b = 3L, c = 5L, d = 5L), class = "data.frame", row.names = c(NA, 
-1L)), structure(list(a = 5L, b = 2L, c = 6L, d = 5L), class = "data.frame", row.names = c(NA, 
-1L)), structure(list(a = 4L, b = 4L, c = 2L, d = 8L), class = "data.frame", row.names = c(NA, 
-1L)))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • I'm not sure if this is what you intend, but this will drop duplicates: `Reduce(merge, list(data.frame(a = 1), data.frame(a = 1)))` – alistaire Feb 09 '19 at 16:50

2 Answers2

1

Here, I use bind_rows from the dplyr package instead of merge.

L <- list(structure(list(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L), class = "data.frame", row.names = c(NA, 
          -1L)), structure(list(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L), class = "data.frame", row.names = c(NA, 
          -1L)), structure(list(a = 7L, b = 3L, c = 5L, d = 5L), class = "data.frame", row.names = c(NA, 
          -1L)), structure(list(a = 5L, b = 2L, c = 6L, d = 5L), class = "data.frame", row.names = c(NA, 
          -1L)), structure(list(a = 4L, b = 4L, c = 2L, d = 8L), class = "data.frame", row.names = c(NA, 
          -1L)))

library(dplyr)

Reduce(bind_rows, L) 
#>   a b c  d  e
#> 1 5 2 4 10  1
#> 2 6 7 4  6  1
#> 3 7 3 5  5 NA
#> 4 5 2 6  5 NA
#> 5 4 4 2  8 NA

Created on 2019-02-09 by the reprex package (v0.2.1.9000)

Dan
  • 11,370
  • 4
  • 43
  • 68
1

This happens due to sort of merge:

sort - logical. Should the result be sorted on the by columns?

So, instead you may use

Reduce(function(...) merge(..., all = TRUE, sort = FALSE), L)
#   a b c  d  e
# 1 5 2 4 10  1
# 2 6 7 4  6  1
# 3 7 3 5  5 NA
# 4 5 2 6  5 NA
# 5 4 4 2  8 NA
Julius Vainora
  • 47,421
  • 9
  • 90
  • 102