Using as input the matrix m derived from the built-in 11 by 8 data.frame anscombe:
# create test matrix m
m <- as.matrix(anscombe)
1) apply/tapply Try this:
t(apply(m, 1, tapply, gl(4, 1, ncol(m)), sum))
giving:
          1     2     3     4
 [1,] 18.04 19.14 17.46 14.58
 [2,] 14.95 16.14 14.77 13.76
 [3,] 20.58 21.74 25.74 15.71
 [4,] 17.81 17.77 16.11 16.84
 [5,] 19.33 20.26 18.81 16.47
 [6,] 23.96 22.10 22.84 15.04
 [7,] 13.24 12.13 12.08 13.25
 [8,]  8.26  7.10  9.39 31.50
 [9,] 22.84 21.13 20.15 13.56
[10,] 11.82 14.26 13.42 15.91
[11,] 10.68  9.74 10.73 14.89
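To see why this sums x1 with y1, x2 with y2 and so on, it may help to look at the grouping factor on its own. The following is a small illustrative sketch; the variable name g is introduced here and is not part of the solution above.

```r
# gl(4, 1, n) cycles through the levels 1, 2, 3, 4 until n values are produced,
# so column j of m lands in group ((j - 1) %% 4) + 1
m <- as.matrix(anscombe)
g <- gl(4, 1, ncol(m))
g

# column names that are summed together in each output column
split(colnames(m), g)
```

Each output column is therefore the sum of every fourth input column, starting at columns 1, 2, 3 and 4 respectively.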
2) tapply Or this, which gives the same result:
do.call(cbind, tapply(1:ncol(m), gl(4, 1, ncol(m)), function(ix) rowSums(m[, ix])))
3) tapply - 2 Or this, which gives the same values but without column names:
matrix(tapply(m, gl(4 * nrow(m), 1, length(m)), sum), nrow(m))
4) apply/array or this which additionally requires that there be the same number of input columns summed into each of the output columns:
apply(array(m, c(nrow(m), 4, ncol(m) / 4)), 1:2, sum)
Note that this is just apply(array(m, c(11, 4, 2)), 1:2, sum) in the case of m.
5) for This alternative is based on a for loop:
res <- 0
for(i in seq(1, ncol(m), 4)) res <- res + m[, seq(i, length = 4)]
res
It would be possible to speed this up even more by setting res to m[, 1:4] and then starting i at 4+1 but the code gets a bit uglier so we will not bother.
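For reference, the variant just described might look roughly like the sketch below. The name res2 and the guard for a matrix with exactly 4 columns are additions made here for illustration; without the guard, seq(5, 4, by = 4) would raise an error.

```r
m <- as.matrix(anscombe)

# start from the first block of 4 columns instead of from 0
res2 <- m[, 1:4]

# add the remaining blocks, if any (guard needed when ncol(m) == 4)
if (ncol(m) > 4) {
  for (i in seq(4 + 1, ncol(m), by = 4)) {
    res2 <- res2 + m[, seq(i, length.out = 4)]
  }
}
res2
```

This saves one matrix addition per call, which is likely to matter only when there are few column blocks.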
6) Reduce
matrix(Reduce("+", split(m, gl(ncol(m) / 4, nrow(m) * 4))), nrow(m))
7) rowsum
t(rowsum(t(m), gl(4, 1, ncol(m))))
Note: Of the solutions tested below
- (6), (5) and (4) are the fastest, in that order (i.e. (6) is the fastest). These three also require that the number of columns of m be an exact multiple of 4. (2) is the fastest of the solutions that do not require this, followed by (3), (7) and (1), where (1) is the slowest.
- (7) is the shortest, (1) is the next shortest and (4) is the third shortest.
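Before comparing speed, it may be worth confirming that the solutions agree. The sketch below is an addition, not part of the original answer: the names g, results and same are made up here, and dimnames are dropped with unname() because some solutions label the columns and others do not.

```r
m <- as.matrix(anscombe)
g <- gl(4, 1, ncol(m))

# solution (5), computed separately because it is a loop
res <- 0
for (i in seq(1, ncol(m), 4)) res <- res + m[, seq(i, length.out = 4)]

results <- list(
  one   = t(apply(m, 1, tapply, g, sum)),
  two   = do.call(cbind, tapply(1:ncol(m), g, function(ix) rowSums(m[, ix]))),
  three = matrix(tapply(m, gl(4 * nrow(m), 1, length(m)), sum), nrow(m)),
  four  = apply(array(m, c(nrow(m), 4, ncol(m) / 4)), 1:2, sum),
  five  = res,
  six   = matrix(Reduce("+", split(m, gl(ncol(m) / 4, nrow(m) * 4))), nrow(m)),
  seven = t(rowsum(t(m), g))
)

# compare every result with (7), ignoring row and column names
same <- sapply(results, function(r)
  isTRUE(all.equal(unname(r), unname(results$seven))))
same
```

If any entry of same were FALSE, the corresponding solution would differ from (7) in its values rather than just in its labels.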
Here is the benchmark:
library(rbenchmark)
benchmark(
  one   = t(apply(m, 1, tapply, gl(4, 1, ncol(m)), sum)),
  two   = do.call(cbind,
    tapply(1:ncol(m), gl(4, 1, ncol(m)), function(ix) rowSums(m[, ix]))),
  three = matrix(tapply(m, gl(4 * nrow(m), 1, length(m)), sum), nrow(m)),
  four  = apply(array(m, c(nrow(m), 4, ncol(m) / 4)), 1:2, sum),
  five  = {
    res <- 0
    for(i in seq(1, ncol(m), 4)) res <- res + m[, seq(i, length = 4)]
    res
  },
  six   = matrix(Reduce("+", split(m, gl(ncol(m) / 4, nrow(m) * 4))), nrow(m)),
  seven = t(rowsum(t(m), gl(4, 1, ncol(m)))),
  order = "relative", replications = 1000)[1:4]
giving:
   test replications elapsed relative
6   six         1000    0.12    1.000
5  five         1000    0.18    1.500
4  four         1000    0.30    2.500
2   two         1000    0.31    2.583
3 three         1000    0.39    3.250
7 seven         1000    0.58    4.833
1   one         1000    2.27   18.917