The undocumented merge_recurse() function from Hadley's reshape package is widely recommended on this site for merging multiple data frames into one. Can anyone help me figure out why it does not do what I expect in the example below?
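library(reshape)  # provides merge_recurse()

df1 <- data.frame(id = 1:10, y = runif(10))
df2 <- data.frame(id = 1:10, y = runif(10))
df3 <- data.frame(id = 1:10, y = runif(10))

# Explicit nested merge: df2 with df3 first, then df1 with the result
merge(df1,
      merge(df2, df3, all = TRUE, sort = FALSE, by = "id"),
      all = TRUE, sort = FALSE, by = "id")

# The same merge via reshape's helper
merge_recurse(list(df1, df2, df3), by = "id")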
From my reading of the merge_recurse() source, the two statements above should produce the same output, but they don't. merge_recurse() seems to combine merging into columns with stacking rows: it returns 20 rows (each id twice) with only y.x and y.y columns, whereas the explicit merge() call returns what I intended, one row per id with all three y columns.
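A minimal sketch of the recursion described above, assuming the helper merges the first data frame with the merged result of the remaining ones and passes its extra arguments (here by = "id") down to every merge() call, including the recursive one. merge_recurse_sketch() is a hypothetical name written for illustration, not reshape's source. The same right-to-left fold is also written with base R's Reduce(right = TRUE) for comparison.

```r
# Hypothetical recursive multi-way merge, for illustration only (not
# reshape's source). Assumption: `...` (e.g. by = "id") is forwarded
# both to merge() and to the recursive call.
merge_recurse_sketch <- function(dfs, ...) {
  if (length(dfs) == 2) {
    merge(dfs[[1]], dfs[[2]], all = TRUE, sort = FALSE, ...)
  } else {
    # Merge the first frame with the merged result of the rest,
    # passing `...` down so the inner merges also use by = "id"
    merge(dfs[[1]], merge_recurse_sketch(dfs[-1], ...),
          all = TRUE, sort = FALSE, ...)
  }
}

# Toy data shaped like the question's (seeded for reproducibility)
set.seed(1)
df1 <- data.frame(id = 1:10, y = runif(10))
df2 <- data.frame(id = 1:10, y = runif(10))
df3 <- data.frame(id = 1:10, y = runif(10))

# The explicit nested merge from the question
nested <- merge(df1,
                merge(df2, df3, all = TRUE, sort = FALSE, by = "id"),
                all = TRUE, sort = FALSE, by = "id")

recursive <- merge_recurse_sketch(list(df1, df2, df3), by = "id")

# The same right fold with base R: right = TRUE evaluates
# f(df1, f(df2, df3)), matching the nested call
folded <- Reduce(function(x, y) merge(x, y, all = TRUE, sort = FALSE, by = "id"),
                 list(df1, df2, df3), right = TRUE)

print(identical(recursive, nested))
print(identical(folded, nested))
```

Without right = TRUE, Reduce() folds from the left and merges df1 with df2 first; the rows are the same, but the columns then come out as id, y.x, y.y, y.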
> merge(df1,
+ merge(df2, df3, all = TRUE, sort = FALSE, by="id"),
+ all = TRUE, sort = FALSE, by="id")
   id         y        y.x       y.y
1   1 0.3442246 0.40752170 0.7543310
2   2 0.6855180 0.90333706 0.9078623
3   3 0.5824061 0.94068441 0.3569613
4   4 0.8609505 0.03080645 0.5408886
5   5 0.6165643 0.19211396 0.3239516
6   6 0.7091000 0.83652412 0.9922271
7   7 0.4040763 0.07829698 0.3626811
8   8 0.6638416 0.92631462 0.9887723
9   9 0.0425038 0.95156785 0.2350344
10 10 0.9128549 0.65482298 0.1854737
> merge_recurse(list(df1,df2,df3), by="id")
   id       y.x        y.y
1   1 0.3442246 0.40752170
2   1 0.3442246 0.75433099
3   2 0.6855180 0.90333706
4   2 0.6855180 0.90786227
5   3 0.5824061 0.35696133
6   3 0.5824061 0.94068441
7   4 0.8609505 0.54088859
8   4 0.8609505 0.03080645
9   5 0.6165643 0.19211396
10  5 0.6165643 0.32395157
11  6 0.7091000 0.83652412
12  6 0.7091000 0.99222707
13  7 0.4040763 0.36268111
[ reached getOption("max.print") -- omitted 7 rows ]
> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape_0.8.4 plyr_1.8.1
loaded via a namespace (and not attached):
[1] Rcpp_0.11.1 compiler_3.0.3 tools_3.0.3
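The shape of the merge_recurse() result above (20 rows, each id twice, only y.x and y.y) matches what plain merge() produces when the inner merge runs without by = "id". The sketch below reproduces that shape with seeded toy data; it is a hypothesis about where the duplicated rows come from, not a confirmed reading of reshape's code.

```r
# Toy data shaped like the question's (seeded for reproducibility)
set.seed(1)
df1 <- data.frame(id = 1:10, y = runif(10))
df2 <- data.frame(id = 1:10, y = runif(10))
df3 <- data.frame(id = 1:10, y = runif(10))

# Inner merge WITHOUT by = "id": merge() then defaults to
# by = intersect(names(x), names(y)), i.e. c("id", "y") here
inner <- merge(df2, df3, all = TRUE, sort = FALSE)

# The random y values do not coincide, so no rows match and
# all = TRUE keeps all 10 + 10 rows, with each id appearing twice
print(nrow(inner))

# Merging df1 onto that result by "id" pairs each df1 row with both
# rows of its id: 20 rows, and the clashing y columns become y.x / y.y
outer <- merge(df1, inner, all = TRUE, sort = FALSE, by = "id")
print(names(outer))
print(nrow(outer))
```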