
I have a list of data.frame objects that I would like to row-append to one another, i.e. merge(..., all=TRUE). However, merge seems to drop the row names, which I need to keep intact. Any ideas? Example:

x <- data.frame(a=1:2, b=2:3, c=3:4, d=4:5, row.names=c("row_1", "another_row1"))
y <- data.frame(a=c(10,20), b=c(20,30), c=c(30,40), row.names=c("row_2", "another_row2"))
> merge(x, y, all=TRUE, sort=FALSE)
     a  b  c  d
  1  1  2  3  4
  2  2  3  4  5
  3 10 20 30 NA
  4 20 30 40 NA
Alex
  • Maybe `z <- merge(x, y, all=T, sort=F); rownames(z) <- c(rownames(x), rownames(y))` – Arnaud A Feb 10 '13 at 15:55
    If I understand you right, you want to `rbind` data frames of different numbers of columns together. [This question](http://stackoverflow.com/questions/3402371/rbind-different-number-of-columns) might be helpful to you, in particular, `rbind.fill` from the `plyr` package. – Blue Magister Feb 10 '13 at 15:56
  • @Arun [Ananda Mahto's answer](http://stackoverflow.com/a/14799551/697568) takes care of that. – Blue Magister Feb 10 '13 at 17:11

2 Answers


Since you are not actually merging but just rbind-ing, something like this may work. It uses rbind.fill from the "plyr" package. To use it, pass a list of the data.frames you want to rbind.

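RBIND <- function(datalist) {
  # rbind.fill() pads missing columns with NA but discards row names
  temp <- plyr::rbind.fill(datalist)
  # restore the original row names, in list order
  rownames(temp) <- unlist(lapply(datalist, row.names))
  temp
}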
RBIND(list(x, y))
#               a  b  c  d
# row_1         1  2  3  4
# another_row1  2  3  4  5
# row_2        10 20 30 NA
# another_row2 20 30 40 NA
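
If installing plyr is not an option, the same idea can be sketched in base R: pad each data frame with the columns it lacks, then call `rbind`, which keeps unique row names. The helper name `rbind_keep_names` is hypothetical and not part of any package; this is a minimal sketch for the example data, not a drop-in replacement for `rbind.fill`.

```r
x <- data.frame(a = 1:2, b = 2:3, c = 3:4, d = 4:5,
                row.names = c("row_1", "another_row1"))
y <- data.frame(a = c(10, 20), b = c(20, 30), c = c(30, 40),
                row.names = c("row_2", "another_row2"))

# Hypothetical helper: row-bind data frames with differing columns,
# filling absent columns with NA and keeping the original row names.
rbind_keep_names <- function(datalist) {
  # union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(datalist, names)))
  padded <- lapply(datalist, function(d) {
    # add each missing column as NA
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    # put the columns in a common order
    d[all_cols]
  })
  # rbind() on data frames keeps row names as long as they are unique
  do.call(rbind, padded)
}

out <- rbind_keep_names(list(x, y))
```

This assumes the row names are unique across all inputs; if they are not, `rbind` will alter them to make them unique.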
A5C1D2H2I1M1N2O1R2T1

One way is to include "row.names" in the by argument of merge, so that the row names come back as an additional column.

> merge(x, y, by=c("row.names", "a","b","c"), all=TRUE, sort=FALSE)

#      Row.names  a  b  c  d
# 1        row_1  1  2  3  4
# 2 another_row1  2  3  4  5
# 3        row_2 10 20 30 NA
# 4 another_row2 20 30 40 NA
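
If the row names are needed as actual row names rather than as a column, the `Row.names` column can be promoted back afterwards. A minimal sketch using the same `x` and `y`; the variable name `m` is only illustrative:

```r
x <- data.frame(a = 1:2, b = 2:3, c = 3:4, d = 4:5,
                row.names = c("row_1", "another_row1"))
y <- data.frame(a = c(10, 20), b = c(20, 30), c = c(30, 40),
                row.names = c("row_2", "another_row2"))

# Merge on the row names plus the shared columns;
# the row names come back in a column called "Row.names".
m <- merge(x, y, by = c("row.names", "a", "b", "c"), all = TRUE, sort = FALSE)

# Promote that column back to row names, then drop it.
rownames(m) <- as.character(m$Row.names)
m$Row.names <- NULL
```

This only works when the resulting row names are unique, which holds for the example data.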

Edit: Looking at the merge method with getS3method('merge', 'data.frame'), the row.names are clearly set to NULL (the code is pretty long, so I won't paste it here).

# Commenting out
# lines 63 and 64
row.names(x) <- NULL
row.names(y) <- NULL

# and 
# Line 141 (thanks Ananda for pointing out)
attr(res, "row.names") <- .set_row_names(nrow(res))

and saving the result as a new function, say MERGE, works as the OP intends for this example. This is just an experiment.

Arun
  • +1. I always forget about being able to merge on `"row.names"` – A5C1D2H2I1M1N2O1R2T1 Feb 10 '13 at 16:00
  • Regarding your edit, I also had to remove line 141 (`attr(res, "row.names") <- .set_row_names(nrow(res))`). I've put up a gist [here](https://gist.github.com/mrdwab/4750113) which can be loaded and run with `library(devtools); source_gist(4750113); MERGE(x, y, all = TRUE)`, at least in part validating your experimentation. – A5C1D2H2I1M1N2O1R2T1 Feb 10 '13 at 16:53
  • Let's imagine you have a third df, `z <- data.frame(a = c(11, 21), b = c(22, 32), d = c(33, 43), row.names = c("row_3", "another_row3"))`. How can we get regular `merge` to work (perhaps with `Reduce`, or even manually)? `MERGE` works as expected with `Reduce(function(x, y) MERGE(x, y, all = TRUE, sort = FALSE), list(x, y, z))` (more or less--the column order changes), and `RBIND(list(x, y, z))` also does the trick. But I can't figure out an unadulterated base `merge` solution here. Any ideas? – A5C1D2H2I1M1N2O1R2T1 Feb 10 '13 at 18:44
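
For the three-data-frame case raised in the last comment, one possible approach with unmodified base `merge` is to carry the row names along as an ordinary column, let `Reduce` merge on all shared columns, and restore the row names at the end. This is only a sketch: the column name `rn` is an arbitrary placeholder and assumes no input already has a column with that name. As noted in the comment for `MERGE`, the column order of the result may differ from the inputs.

```r
x <- data.frame(a = 1:2, b = 2:3, c = 3:4, d = 4:5,
                row.names = c("row_1", "another_row1"))
y <- data.frame(a = c(10, 20), b = c(20, 30), c = c(30, 40),
                row.names = c("row_2", "another_row2"))
z <- data.frame(a = c(11, 21), b = c(22, 32), d = c(33, 43),
                row.names = c("row_3", "another_row3"))

# Copy the row names into a regular column ("rn" is a placeholder name)
# so that merge() treats them as part of the key.
dfs <- lapply(list(x, y, z), function(d) {
  data.frame(rn = rownames(d), d, stringsAsFactors = FALSE)
})

# By default merge() joins on all columns the two inputs share,
# which always includes "rn" here.
merged <- Reduce(function(u, v) merge(u, v, all = TRUE, sort = FALSE), dfs)

# Restore the row names and drop the helper column.
rownames(merged) <- merged$rn
merged$rn <- NULL
```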