
Here is a 10 x 12 matrix:

mat <- matrix(runif(120, 0, 1), 10)

I am trying to sum subsets of columns within each row (specifically, columns 1 through 4, 5 through 8, and 9 through 12). The desired output is a 10 x 3 matrix.
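To pin down the target, the 10 x 3 result can be built by hand with one rowSums call per block of four columns. This is only a reference sketch for checking the answers; the set.seed(1) call is an added assumption so the random matrix is reproducible.

```r
# reproducible stand-in for the OP's random matrix (the seed is an assumption)
set.seed(1)
mat <- matrix(runif(120, 0, 1), 10)

# one rowSums() call per block of four columns gives the 10 x 3 target
expected <- cbind(rowSums(mat[, 1:4]),
                  rowSums(mat[, 5:8]),
                  rowSums(mat[, 9:12]))
dim(expected)
# [1] 10  3
```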

I tried the approaches from this answer using tapply and by (with detours to rowsum and aggregate), but encountered errors with all of them.

Joshua Rosenberg

3 Answers


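What the OP is describing is a grouped row sum: within each row, sum the columns that fall in the same group. Splitting the column indices into groups and calling rowSums on each group gives the desired layout: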

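# using Matthew Lundberg's example data
x <- matrix(1:36, 3, 12)

# group column indices into consecutive blocks of 4: "0" = 1:4, "1" = 5:8, "2" = 9:12
g <- split(seq(ncol(x)), (seq(ncol(x)) - 1) %/% 4)
# for each block, sum its columns within every row
sapply(g, function(cols) rowSums(x[, cols, drop = FALSE]))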

#       0  1   2
# [1,] 22 70 118
# [2,] 26 74 122
# [3,] 30 78 126
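To see where the 0/1/2 column names come from, here is what the grouping produces for 12 columns: the integer division (i - 1) %/% 4 maps columns 1-4 to 0, 5-8 to 1 and 9-12 to 2, and split() turns that into one vector of column indices per block. The name grp is just an illustrative choice.

```r
x <- matrix(1:36, 3, 12)

# integer division assigns each column index to a block of four
grp <- (seq(ncol(x)) - 1) %/% 4
grp
#  [1] 0 0 0 0 1 1 1 1 2 2 2 2

# split() turns that into a list of column indices, one element per block
g <- split(seq(ncol(x)), grp)
str(g)
```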

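Grouping variables usually index rows (observations) rather than columns (variables). To get to that case, the OP could transpose and use rowsum, which sums rows within groups (note that the result comes out with the groups as rows):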

rowsum( t(x), (seq(ncol(x))-1) %/% 4 )
#   [,1] [,2] [,3]
# 0   22   26   30
# 1   70   74   78
# 2  118  122  126
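Because rowsum() returns the groups as rows, its result is the transpose of the OP's desired layout. A small sketch (the variable names are our own) that transposes back and checks it against the split/sapply version:

```r
x <- matrix(1:36, 3, 12)
grp <- (seq(ncol(x)) - 1) %/% 4

# rowsum() sums the rows of t(x) within each group; t() puts the original rows back
via_rowsum <- t(rowsum(t(x), grp))

# the split/sapply version, for comparison
via_sapply <- sapply(split(seq(ncol(x)), grp),
                     function(cols) rowSums(x[, cols, drop = FALSE]))

all.equal(via_rowsum, via_sapply, check.attributes = FALSE)
# [1] TRUE
```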
Frank

You can do this with a brute-force approach, hard-coding each block of columns inside apply:

t(apply(x, 1, function(y) c(sum(y[1:4]), sum(y[5:8]), sum(y[9:12]))))

It's easier to see with non-random data and a smaller input matrix:

> x <- matrix(1:36, 3,12)
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    1    4    7   10   13   16   19   22   25    28    31    34
[2,]    2    5    8   11   14   17   20   23   26    29    32    35
[3,]    3    6    9   12   15   18   21   24   27    30    33    36
> t(apply(x, 1, function(y) c(sum(y[1:4]), sum(y[5:8]), sum(y[9:12]))))
     [,1] [,2] [,3]
[1,]   22   70  118
[2,]   26   74  122
[3,]   30   78  126
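The outer t() is needed because apply() with MARGIN = 1 stores each row's result as a column. With the 3 x 12 example both shapes are 3 x 3, so the OP's 10 x 12 matrix shows it better. In this sketch the seed and the helper name block_sums are assumptions added for illustration.

```r
set.seed(1)  # assumed seed, only for reproducibility
mat <- matrix(runif(120, 0, 1), 10)

# hypothetical helper: hard-coded block sums for a single row
block_sums <- function(y) c(sum(y[1:4]), sum(y[5:8]), sum(y[9:12]))

raw <- apply(mat, 1, block_sums)
dim(raw)      # one column per row of mat
# [1]  3 10
dim(t(raw))   # transposed back to rows x blocks
# [1] 10  3
```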

You can also split each row vector with split; this is more idiomatic R and more flexible, but not really more readable:

> t(apply(x, 1, function(y) sapply(split(y, ceiling(seq_along(y)/4)), sum)))
      1  2   3
[1,] 22 70 118
[2,] 26 74 122
[3,] 30 78 126
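The flexibility comes from the grouping vector: ceiling(seq_along(y) / k) labels consecutive blocks of k elements, so the block size no longer has to be hard-coded. A sketch wrapping it in a function; the name row_block_sums and the argument k are hypothetical, and it assumes the columns split evenly into at least two blocks (with a single block, apply() would return a plain vector and t() would give the wrong orientation).

```r
# hypothetical helper: sum consecutive blocks of k columns within each row
row_block_sums <- function(x, k) {
  # assumes an even split into at least two blocks
  stopifnot(ncol(x) %% k == 0, ncol(x) / k >= 2)
  t(apply(x, 1, function(y) sapply(split(y, ceiling(seq_along(y) / k)), sum)))
}

x <- matrix(1:36, 3, 12)
row_block_sums(x, 4)   # same result as the split version above
row_block_sums(x, 6)   # two blocks of six columns
```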
Matthew Lundberg

We could convert the matrix to a 3-D array with one slice per block of n columns, use apply with MARGIN = 1, and take the colSums:

n <- 4
t(apply(array(mat, dim=c(nrow(mat), n, ncol(mat)/n)), 1, colSums))
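The reshape works because R stores matrices column by column, so array(mat, dim = c(nrow(mat), n, ncol(mat) / n)) places columns ((k - 1) * n + 1) through k * n of mat in slice k. A quick check of that layout; the seed and the names arr and res are added for illustration only.

```r
set.seed(1)  # assumed seed, only for reproducibility
mat <- matrix(runif(120, 0, 1), 10)
n <- 4

arr <- array(mat, dim = c(nrow(mat), n, ncol(mat) / n))

# slice k of the array holds columns ((k - 1) * n + 1):(k * n) of mat
identical(arr[, , 2], mat[, 5:8])
# [1] TRUE

# for row i, arr[i, , ] is an n x (ncol/n) matrix; colSums() gives one sum per block
res <- t(apply(arr, 1, colSums))
dim(res)
# [1] 10  3
```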

Another option is melt/acast from the reshape2 package (this reuses n from above):

library(reshape2)
acast(melt(mat), Var1~(Var2-1)%/%n, value.var='value', sum)

The wrapper function recast makes this more compact:

recast(mat, Var1~(Var2-1)%/%4, id.var=NULL, sum)
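For readers without reshape2, the same long-to-wide idea can be sketched in base R. This swaps in row(), col() and tapply() for melt() and acast(); the column names Var1/Var2/value only mimic what melt() produces for a matrix, and the seed is an added assumption.

```r
set.seed(1)  # assumed seed, only for reproducibility
mat <- matrix(runif(120, 0, 1), 10)
n <- 4

# long format: one record per cell, similar in spirit to melt(mat)
long <- data.frame(Var1  = as.vector(row(mat)),
                   Var2  = as.vector(col(mat)),
                   value = as.vector(mat))

# back to wide, summing within (row, column block), similar to acast(..., sum)
wide <- tapply(long$value, list(long$Var1, (long$Var2 - 1) %/% n), sum)
dim(wide)
# [1] 10  3
```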
akrun
    Nice. Depending on the application, the OP should consider keeping the data in an array to begin with. – Frank Sep 27 '15 at 13:52