Another approach opens up several additional statistical options. If you convert the list of 40x20 data.frames into a 40x20x1000 array, you can apply a function across each of the 40x20 "tubes" drilling into the 3rd dimension.
Using a sample of three 2x4 matrices:
set.seed(42)
lst <- lapply(1:3, function(ign) matrix(sample(8), nrow=2))
lst
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 8 2 3 4
# [2,] 7 5 6 1
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 6 3 7 8
# [2,] 5 4 1 2
# [[3]]
# [,1] [,2] [,3] [,4]
# [1,] 8 3 4 2
# [2,] 1 6 7 5
Using the abind library, we can bind arbitrarily along the third dimension. (This is where you would begin, given that your data.frames are already captured in a list. abind works equally well with identically-sized data.frames as it does with matrices.)
library(abind)
ary <- abind(lst, along = 3)
dim(ary)
# [1] 2 4 3
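To illustrate the point about data.frames: converting the matrices above to data.frames first gives the same array. (A small sketch; `lstdf` and `ary2` are illustrative names, not part of the answer above.)

```r
# same binding, but starting from identically-sized data.frames
# ('lst' is the list of matrices defined above)
lstdf <- lapply(lst, as.data.frame)
ary2 <- abind(lstdf, along = 3)
dim(ary2)
# [1] 2 4 3
```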
And now we can run arbitrary functions along each "tube" (versus the "row" or "column" that apply is most commonly used for). For example, given that the [1,1] values in the three layers are 8, 6, and 8, we would expect the following statistics:
mean(c(8,6,8))
# [1] 7.333333
sd(c(8,6,8))
# [1] 1.154701
Now, using apply:
apply(ary, 1:2, mean)
# [,1] [,2] [,3] [,4]
# [1,] 7.333333 2.666667 4.666667 4.666667
# [2,] 4.333333 5.000000 4.666667 2.666667
apply(ary, 1:2, sd)
# [,1] [,2] [,3] [,4]
# [1,] 1.154701 0.5773503 2.081666 3.055050
# [2,] 3.055050 1.0000000 3.214550 2.081666
This opens up more statistical aggregation of your 1000 identically-sized data.frames, assuming that the index within each layer is meaningfully comparable. You might be able to devise a working model to determine the median or another percentile with Reduce, but it's quite easy to do (say) apply(ary, 1:2, quantile, 0.9) for the 90th percentile.
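The median follows the same one-liner pattern. A quick sketch on the small ary from above (values checked against the three layers listed earlier):

```r
# per-cell median across the third dimension of 'ary' (built above)
apply(ary, 1:2, median)
#      [,1] [,2] [,3] [,4]
# [1,]    8    3    4    4
# [2,]    5    5    6    2
```

Any function that reduces a vector to a scalar can be dropped in the same way, including your own anonymous functions.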