The undocumented merge_recurse() function from Hadley's reshape package is widely recommended on this site for merging multiple dataframes into one. Can anyone help me figure out why it is failing for me?

df1 <- data.frame(id=1:10, y=runif(10))
df2 <- data.frame(id=1:10, y=runif(10))
df3 <- data.frame(id=1:10, y=runif(10))

merge(df1, 
      merge(df2, df3, all = TRUE, sort = FALSE, by="id"),
      all = TRUE, sort = FALSE, by="id")

merge_recurse(list(df1,df2,df3), by="id")

From my reading of the merge_recurse code, the two statements above should produce the same output, but they don't. merge_recurse() seems to do some weird combination of merging into columns and rows, whereas the explicit merge() statement does what I intended.

> merge(df1, 
+       merge(df2, df3, all = TRUE, sort = FALSE, by="id"),
+       all = TRUE, sort = FALSE, by="id")
   id         y        y.x       y.y
1   1 0.3442246 0.40752170 0.7543310
2   2 0.6855180 0.90333706 0.9078623
3   3 0.5824061 0.94068441 0.3569613
4   4 0.8609505 0.03080645 0.5408886
5   5 0.6165643 0.19211396 0.3239516
6   6 0.7091000 0.83652412 0.9922271
7   7 0.4040763 0.07829698 0.3626811
8   8 0.6638416 0.92631462 0.9887723
9   9 0.0425038 0.95156785 0.2350344
10 10 0.9128549 0.65482298 0.1854737

> merge_recurse(list(df1,df2,df3), by="id")
   id       y.x        y.y
1   1 0.3442246 0.40752170
2   1 0.3442246 0.75433099
3   2 0.6855180 0.90333706
4   2 0.6855180 0.90786227
5   3 0.5824061 0.35696133
6   3 0.5824061 0.94068441
7   4 0.8609505 0.54088859
8   4 0.8609505 0.03080645
9   5 0.6165643 0.19211396
10  5 0.6165643 0.32395157
11  6 0.7091000 0.83652412
12  6 0.7091000 0.99222707
13  7 0.4040763 0.36268111
 [ reached getOption("max.print") -- omitted 7 rows ]

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reshape_0.8.4 plyr_1.8.1   

loaded via a namespace (and not attached):
[1] Rcpp_0.11.1    compiler_3.0.3 tools_3.0.3  
Arun
Chris Warth
  • Searching for [tag:R] + `merge_recurse` [only gives me 21 results](http://stackoverflow.com/search?q=%5Br%5D+merge_recurse). Not exactly "widely recommended" :-) – A5C1D2H2I1M1N2O1R2T1 Apr 17 '14 at 19:28
  • ...and every one of those recommendations is flawed. merge_recurse() and merge_all() have simply been non-functional for the last 5 or so years _unless_ you did not specify any additional arguments to the merge function. As soon as you make use of those additional args, well, it just doesn't work, period. – Chris Warth Apr 18 '14 at 17:20
  • I should add that there are a number of cases where people used merge_recurse with additional arguments. Those cases either did not need the additional arguments, so didn't notice that they were non-functional, or didn't check the result to see it would not actually work. – Chris Warth Apr 18 '14 at 19:51

1 Answer

I found out why this is happening - it is a bug in reshape::merge_recurse().

> merge_recurse
function (dfs, ...) 
{
    if (length(dfs) == 2) {
        merge(dfs[[1]], dfs[[2]], all = TRUE, sort = FALSE, ...)
    }
    else {
        merge(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE, 
            ...)
    }
}
<environment: namespace:reshape>

Note that '...' is missing from the call to Recall(), so arguments such as by are never passed down to the inner merges. This should read:

merge_recurse <- function (dfs, ...) 
{
    if (length(dfs) == 2) {
        merge(dfs[[1]], dfs[[2]], all = TRUE, sort = FALSE, ...)
    }
    else {
        merge(dfs[[1]], Recall(dfs[-1], ...), all = TRUE, sort = FALSE, 
            ...)
    }
}
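
To see why the missing '...' produces the odd 20-row result, here is a minimal sketch in base R only (no reshape needed). The name merge_recurse_fixed and the seed are my own choices for illustration. Without by="id", the inner merge() falls back to joining on every shared column (both id and y), and with all = TRUE it stacks the non-matching rows instead of lining them up side by side.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible
df1 <- data.frame(id = 1:10, y = runif(10))
df2 <- data.frame(id = 1:10, y = runif(10))
df3 <- data.frame(id = 1:10, y = runif(10))

# What the buggy recursion effectively does for the inner step:
# no `by`, so merge() joins on all common columns (id and y).
inner_buggy <- merge(df2, df3, all = TRUE, sort = FALSE)
dim(inner_buggy)  # rows are stacked, not joined

# Hypothetical name for the corrected function; the body is the fix
# shown above, with `...` forwarded through Recall().
merge_recurse_fixed <- function(dfs, ...) {
  if (length(dfs) == 2) {
    merge(dfs[[1]], dfs[[2]], all = TRUE, sort = FALSE, ...)
  } else {
    merge(dfs[[1]], Recall(dfs[-1], ...), all = TRUE, sort = FALSE, ...)
  }
}

# The explicit nested merge from the question.
expected <- merge(df1,
                  merge(df2, df3, all = TRUE, sort = FALSE, by = "id"),
                  all = TRUE, sort = FALSE, by = "id")

fixed <- merge_recurse_fixed(list(df1, df2, df3), by = "id")
identical(expected, fixed)
```

With the '...' forwarded, the recursive call performs exactly the same two merge() calls as the explicit version, so the two results should agree.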

How could this have escaped notice for so long?

Chris Warth
  • ah, but I already checked out the code there. That repository holds the new reshape2 package, and merge_recurse() is not even there any more. I don't know where the code for the original reshape package lives. – Chris Warth Apr 16 '14 at 23:56
  • Oh yeah, you can also find the code for the "reshape" package as one of the branches at the reshape Github pages. [Here's `merge_recurse`](https://github.com/hadley/reshape/blob/reshape0.8/R/utils.r#L43)... – A5C1D2H2I1M1N2O1R2T1 Apr 17 '14 at 19:48
  • Thx Ananda, I emailed Hadley w/ the fix, but since this function does not have any equivalent in the new reshape2 package, it looks like it won't be applied. – Chris Warth Apr 18 '14 at 17:18
  • @ChrisWarth, Do you use "data.table"? If the data are keyed, there's a pretty convenient and efficient `Reduce(function(x, y) x[y], list(dt1, dt2, dt3))` approach that can be taken. Even nicer when you roll it into a custom function :-) – A5C1D2H2I1M1N2O1R2T1 Apr 18 '14 at 17:35
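
For readers without data.table, the same Reduce() idea from the comment above can be sketched with base merge() instead of the keyed x[y] join. This is a swapped-in technique, not the data.table approach itself. The helper name merge_all_by and the column names a, b, c are hypothetical; distinct value columns are assumed so that merge() does not have to invent suffixes.

```r
# Hypothetical helper: fold merge() over a list of data frames,
# forwarding extra arguments (e.g. by = "id") to every merge() call.
merge_all_by <- function(dfs, ...) {
  Reduce(function(x, y) merge(x, y, all = TRUE, sort = FALSE, ...), dfs)
}

set.seed(1)  # arbitrary seed for reproducibility
d1 <- data.frame(id = 1:10, a = runif(10))
d2 <- data.frame(id = 1:10, b = runif(10))
d3 <- data.frame(id = 1:10, c = runif(10))

merged <- merge_all_by(list(d1, d2, d3), by = "id")
dim(merged)
```

Because Reduce() folds from the left, each step merges the accumulated result with the next data frame, and every step sees the same by argument.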