0

This is based on an example on Simultaneously merge multiple data.frames in a list . I want to merge multiple data frames with a single column in common -- can do that with the example provided on the link. But it seems that using the parameter sort=TRUE has no effect. Here is the short, reproducible code:

x <- data.frame(i = c("m","b","c"), j = 1:3)
y <- data.frame(i = c("n","c","d"), k = 4:6)
z <- data.frame(i = c("o","d","a"), l = 7:9)

Merging all them without sorting:

Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all = TRUE),
       list(x,y,z))

Gives me

  i  j  k  l
1 b  2 NA NA
2 c  3  5 NA
3 m  1 NA NA
4 d NA  6  8
5 n NA  4 NA
6 a NA NA  9
7 o NA NA  7

Merging all them with sorting:

Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all = TRUE,sort = TRUE),
       list(x,y,z))

Gives me the same result as above. I know I can sort the dataframe after the merging (I guess it could even be probably more efficient) but why sort = TRUE has no effect?

I'm running R 3.4.3 under RStudio on a Mac. Thanks Rafael

Rafael Santos
  • 463
  • 5
  • 16

3 Answers3

2

The default merge() function call assumes sort=TRUE, so your first code was equivalently: merge(dtf1, dtf2, by = "i", all = TRUE, sort=TRUE).

The R documentation says when sort=TRUE, the results are sorted on the by columns, so in your case, it is sorting between m, b and c in the first dataframe (resulting in b, c, m in that order), and then it sorts n, c, d (resulting in c, d, n in that order), and then it sorts o, d, a (resulting in a, d, o in that order).

When you concatenate (c()) or combine them together through rbind(), you get a list that results in "b, c, m, c, d, n, a, o".

onlyphantom
  • 8,606
  • 4
  • 44
  • 58
0

The reason the 2 function calls display the same behaviour is because sort = T is the default action for the merge function.

The reason the sort order is not as you require is due to the embedding of the merge function within the Reduce function.

The sort is correct for each individual join but not for the overall result table that is created by the Reduce step.

Have you tried dplyr? Why use reduce and merge?

You could achieve the same with this:

if(!irequire(dplyr)){install.packages('dplyr')}; require(dplyr)
x %>%
left_join(y, by = 'i') %>%
left_join(z, by = 'i') %>%
arrange(...whatever order you wish said the Djinn...)
clancy
  • 182
  • 1
  • 3
  • 10
0

D'oh!

Column i was a factor and not a character.

Reading this: How to convert a factor to integer\numeric without loss of information? helped me convert the key column to a sortable value and solved the problem.

Thanks for the comments and answers.

Rafael Santos
  • 463
  • 5
  • 16