4

I have several data frames, a, b, c, and d, each with the same column names. I want to find the element-wise mean and median across those data frames. In other words, I want to construct new mean and median data frames that are the same size as a, b, etc.

I could use a couple of for loops, but I bet there is a slick way of doing this with R's built-in functions that would be faster.

tkerwin

3 Answers

9

Following Josh Ulrich's answer, how about

library(abind)
apply(abind(a,b,c,d,along=3),c(1,2),median)

? (For the mean, using rowMeans on the appropriate slice will still be faster than applying mean. I think there is a rowMedians function in the Biobase (Bioconductor) package if you really need speed.)
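A minimal base-R sketch of the same idea, for readers without abind installed. It assumes the data frames are all numeric and share the same dimensions, stacks them into a 3-D array with array() as a stand-in for abind(..., along = 3), and uses rowMeans(..., dims = 2) for the mean. The toy data and the names make_df, dfs, arr, mean_df, and median_df are illustrative, not part of the answer.

```r
# Toy inputs (assumed): four numeric data frames with identical shape and names
set.seed(1)
make_df <- function() data.frame(x = rnorm(10), y = runif(10))
a <- make_df(); b <- make_df(); c <- make_df(); d <- make_df()
dfs <- list(a, b, c, d)

# Stack into a rows x cols x frames array (base-R stand-in for abind(..., along = 3))
arr <- array(unlist(lapply(dfs, as.matrix)),
             dim = c(nrow(a), ncol(a), length(dfs)),
             dimnames = list(NULL, names(a), NULL))

# Element-wise median across frames: apply over the first two margins
median_df <- as.data.frame(apply(arr, c(1, 2), median))

# Element-wise mean: rowMeans with dims = 2 averages over the third dimension
mean_df <- as.data.frame(rowMeans(arr, dims = 2))
```

Both results have the same number of rows and columns as a, and carry over its column names through the array's dimnames.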

Ben Bolker
2

I'm not sure JD's answer gives you exactly what you want, since the resulting object wouldn't be the same dimensions as a, b, etc.

Putting your data.frames into a list is a good start, though. Then you can subset each column into a new list, cbind that list into a matrix, and use apply over its rows.

a <- data.frame(rnorm(10), runif(10))
b <- data.frame(rnorm(10), runif(10))
c <- data.frame(rnorm(10), runif(10))
d <- data.frame(rnorm(10), runif(10))
myList <- list(a,b,c,d)
sapply(seq_len(ncol(a)), function(j) {  # median
  apply(do.call(cbind, lapply(myList, `[`, , j)), 1, median)
})
sapply(seq_len(ncol(a)), function(j) {  # mean
  apply(do.call(cbind, lapply(myList, `[`, , j)), 1, mean)
})
sapply(seq_len(ncol(a)), function(j) {  # faster mean
  rowMeans(do.call(cbind, lapply(myList, `[`, , j)))
})
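If only the mean is needed and the result should stay a data frame with the original column names, one alternative to the column-by-column approach above is to add the data frames element-wise with Reduce and divide by their count. This is a minimal sketch, assuming all-numeric data frames of identical shape. The column names x and y and the name mean_df are illustrative.

```r
# Toy inputs (assumed): four numeric data frames with the same shape and names
set.seed(42)
a <- data.frame(x = rnorm(10), y = runif(10))
b <- data.frame(x = rnorm(10), y = runif(10))
c <- data.frame(x = rnorm(10), y = runif(10))
d <- data.frame(x = rnorm(10), y = runif(10))
myList <- list(a, b, c, d)

# Element-wise sum of the data frames, then divide by how many there are
mean_df <- Reduce(`+`, myList) / length(myList)
```

The same trick does not carry over to the median, which has no running-sum form, so the apply-based version is still needed there.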
Joshua Ulrich
1

You could put your data frames into a list of data frames, then use lapply(myList, mean, ...).

JD Long
  • If you mean `lapply(c(a, b), mean)`, then that's not right. That gives me the mean of each column individually, rather than across data frames. – tkerwin Dec 21 '10 at 19:03
  • ohhhhh... I didn't realize you wanted them all combined. – JD Long Dec 21 '10 at 19:16
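
To illustrate the point in the first comment: c(a, b) concatenates the columns of both data frames into one flat list, so lapply(..., mean) returns one number per column rather than one value per cell. A small sketch with made-up data frames (the values and the names cols, col_means, and cell_means are illustrative):

```r
a <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
b <- data.frame(x = c(3, 4, 5), y = c(30, 40, 50))

# c() on data frames yields a plain list of four columns: x, y, x, y
cols <- c(a, b)
length(cols)                 # 4

# One mean per column, not a mean across data frames
col_means <- lapply(cols, mean)
unlist(col_means)            # 2 20 4 40 (named x, y, x, y)

# What the question asks for instead: a 3 x 2 result, averaged cell by cell
cell_means <- (a + b) / 2
```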