How to merge multiple data.frames with Reduce and get an ordered output?

Question

There is a classic approach how to simultaneously merge multiple data.frames in a list.

The output, however, is somewhat disordered.

Example

> L
[[1]]
  a b c  d e
1 5 2 4 10 1

[[2]]
  a b c d e
1 6 7 4 6 1

[[3]]
  a b c d
1 7 3 5 5

[[4]]
  a b c d
1 5 2 6 5

[[5]]
  a b c d
1 4 4 2 8

The rows of the output of Reduce(.) are ordered by 5, 1, 4, 2, 3, which could imply that the reduction works somehow from the outside to the inside.

> Reduce(function(...) merge(..., all=TRUE), L)
> Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L)  # same
  a b c  d  e
1 4 4 2  8 NA
2 5 2 4 10  1
3 5 2 6  5 NA
4 6 7 4  6  1
5 7 3 5  5 NA

Anyway, is there a way to slightly change the code to get an ordered output like that below?

#   a b c  d  e
# 1 5 2 4 10  1
# 2 6 7 4  6  1
# 3 7 3 5  5 NA
# 4 5 2 6  5 NA
# 5 4 4 2  8 NA

Data

L <- list(structure(list(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L), class = "data.frame", row.names = c(NA, 
-1L)), structure(list(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L), class = "data.frame", row.names = c(NA, 
-1L)), structure(list(a = 7L, b = 3L, c = 5L, d = 5L), class = "data.frame", row.names = c(NA, 
-1L)), structure(list(a = 5L, b = 2L, c = 6L, d = 5L), class = "data.frame", row.names = c(NA, 
-1L)), structure(list(a = 4L, b = 4L, c = 2L, d = 8L), class = "data.frame", row.names = c(NA, 
-1L)))

I'm not sure if this is what you intend, but this will drop duplicates: `Reduce(merge, list(data.frame(a = 1), data.frame(a = 1)))` — alistaire, Feb 09 '19 at 16:50

score 1 · Answer 1 · answered Feb 09 '19 at 15:57

Here, I use bind_rows from the dplyr package instead of merge.

L <- list(structure(list(a = 5L, b = 2L, c = 4L, d = 10L, e = 1L), class = "data.frame", row.names = c(NA, 
          -1L)), structure(list(a = 6L, b = 7L, c = 4L, d = 6L, e = 1L), class = "data.frame", row.names = c(NA, 
          -1L)), structure(list(a = 7L, b = 3L, c = 5L, d = 5L), class = "data.frame", row.names = c(NA, 
          -1L)), structure(list(a = 5L, b = 2L, c = 6L, d = 5L), class = "data.frame", row.names = c(NA, 
          -1L)), structure(list(a = 4L, b = 4L, c = 2L, d = 8L), class = "data.frame", row.names = c(NA, 
          -1L)))

library(dplyr)

Reduce(bind_rows, L) 
#>   a b c  d  e
#> 1 5 2 4 10  1
#> 2 6 7 4  6  1
#> 3 7 3 5  5 NA
#> 4 5 2 6  5 NA
#> 5 4 4 2  8 NA

^{Created on 2019-02-09 by the reprex package (v0.2.1.9000)}

You don't need the `Reduce` because `bind_rows` can take a list, so you can just write `bind_rows(L)` — alistaire, Feb 09 '19 at 16:46

score 1 · Accepted Answer · answered Feb 09 '19 at 15:58

This happens due to sort of merge:

sort - logical. Should the result be sorted on the by columns?

So, instead you may use

Reduce(function(...) merge(..., all = TRUE, sort = FALSE), L)
#   a b c  d  e
# 1 5 2 4 10  1
# 2 6 7 4  6  1
# 3 7 3 5  5 NA
# 4 5 2 6  5 NA
# 5 4 4 2  8 NA

How to merge multiple data.frames with Reduce and get an ordered output?

2 Answers2