1

I've a a list with a number of data set that corresponds to the object generated below:

data("AirPassengers"); data("mtcars")
lstDta <- list(dtaCars1 = mtcars, dtaCars2 = mtcars,
               dtaCars3 = mtcars, dtaOtherStuff = AirPassengers)

I would like to merge the cars data sets on row.names

Results

The results should correspond to the data frame:

res <- merge(
    x = merge(x = lstDta$dtaCars1, y = lstDta$dtaCars2, by = "row.names"),
    y = lstDta$dtaCars3, by.y = "row.names",
    by.x = "Row.names")

where the columns are joined using row.names (ideally, I would drop the Row.names variable but this doesn't bother me):

> dim(res)
[1] 32 34 

Problem

I want to achieve the same results making use of Reduce, in particular I am interested in:

  • Merging the data frames on the row.names
  • Filtering the list. For example, I want to merge the cars data only and ignore the other data set

Additional requirements

Very useful answer suggests defining a function outside reduce, on the lines of the code:

merge.all <- function(x, y) {
    merge(x, y, all=TRUE, by="Sample")
}

output <- Reduce(merge.all, DataList)

I would like to avoid defining the function outside the Reduce syntax.

Attempt

As shown in the attempt below, I would like to cram everything inside the Reduce:

dtaMrgd <- Reduce(f = function(x,y) {merge(x,y, by = "row.names")},
              lapply(lstDta[grepl("Cars", names(lstDta)) == TRUE]))

so the Reduce does two things:

  1. Filters the passed list using string according to matching names
  2. Uses the filtered object to apply merge function with the desired characteristics

Needles to say, the code above fails.


Notes

I'm specifically interested in a solution that would be of format res <- Reduce( ... ). I'm not interested in creating some additional objects/functions outside the Reduce().

Community
  • 1
  • 1
Konrad
  • 17,740
  • 16
  • 106
  • 167

1 Answers1

3

I think this is one way to achieve what you want:

res2 <- Reduce(function(x, y) {
  data.frame(merge(x, y, by = 0), row.names = row.names(x))[,-1]
}, lstDta[grep("Cars", names(lstDta))])

dim(res2)
#[1] 32 33

names(res2)
#[1] "mpg.x"  "cyl.x"  "disp.x" "hp.x"   "drat.x" "wt.x"   "qsec.x" "vs.x"   "am.x"   "gear.x" "carb.x"
#[12] "mpg.y"  "cyl.y"  "disp.y" "hp.y"   "drat.y" "wt.y"   "qsec.y" "vs.y"   "am.y"   "gear.y" "carb.y"
#[23] "mpg"    "cyl"    "disp"   "hp"     "drat"   "wt"     "qsec"   "vs"     "am"     "gear"   "carb"

  • To filter the input list, I'm using lstDta[grep("Cars", names(lstDta))]
  • I drop the first column of each merged result (Row.names) with [,-1]
  • You can use by = 0 as a synonym for row.names to avoid any problems trying to merge on Row.names and row.names
  • Without the explicit data.frame(..., row.names = row.names(x)), merge will drop the original mtcars row names and replace them with the default 1:nrow(x). This would be problematic for ensuing merge calls.

From the helpfile ?merge,

Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input.

nrussell
  • 18,382
  • 4
  • 47
  • 60
  • It appears to be working like a charm, could you please explain why you did: `data.frame(merge(x, y, by = 0)`? – Konrad Jan 22 '16 at 20:09