-1

Starting from a table of 372 columns and 12,000 rows in R, I need to create a new table with columns that contain rows with the sum of same row from columns 1:4, then 5:8, then 9:12, and so on up to column 372 of the original table. Here a short example:

Input:

m = structure(c(3L, 1L, 2L, 6L, 3L, 1L, 1L, 8L, 1L, 5L, 2L, 1L, 3L, 7L,
+ 1L, 1L), .Dim = c(2L, 8L), .Dimnames = list(c("r1", "r2"), c("a", "b",
+"c", "d", "e", "f", "g", "h")))

Which looks like this:

    a  b  c  d  e  f  g  h
r1  3  2  3  1  1  2  3  1
r2  1  6  1  8  5  1  7  1

Expected output:

    A  B
r1  9  7
r2  16 14

So, A = a+b+c+d, and B=e+f+g+h. Easy to do with a small table in Excel. Columns a-d correspond to a group, e-f to another, if that helps.

fibar
  • 179
  • 1
  • 9
  • You will probably need to `dput` that object to make this reproducible as described here: http://stackoverflow.com/a/28481250/ Otherwise, you might have an object with any of a number of distinct classes (data.frame, matrix, table). For example, you might edit `m = structure(c(3L, 1L, 2L, 6L, 3L, 1L, 1L, 8L, 1L, 5L, 2L, 1L, 3L, 7L, 1L, 1L), .Dim = c(2L, 8L), .Dimnames = list(c("r1", "r2"), c("a", "b", "c", "d", "e", "f", "g", "h")))` into your question if you have a matrix. – Frank Sep 13 '16 at 16:32
  • Thanks for the hint Frank. I already edited my question. – fibar Sep 14 '16 at 08:38

4 Answers4

4

The question is currently underspecified, but supposing you have a matrix...

m = structure(c(3L, 1L, 2L, 6L, 3L, 1L, 1L, 8L, 1L, 5L, 2L, 1L, 3L, 
7L, 1L, 1L), .Dim = c(2L, 8L), .Dimnames = list(c("r1", "r2"), 
    c("a", "b", "c", "d", "e", "f", "g", "h")))

Make your column mapping:

map = data.frame(old = colnames(m), new = rep(LETTERS, each=4, length.out=ncol(m)))

  old new
1   a   A
2   b   A
3   c   A
4   d   A
5   e   B
6   f   B
7   g   B
8   h   B

And then rowsum by it:

res = rowsum(t(m), map$new)

  r1 r2
A  9 16
B  7 14

We have to transpose the data with t here because R has rowsum but no colsum. You can transpose it back afterwards, like t(res).

Frank
  • 66,179
  • 8
  • 96
  • 180
  • If you really are making names like A, B, ... and have 374 columns, of course you'll run out. In this case, you could use pairs of letters (for 26^2 combinations): in place of `LETTERS` in the answer, write `do.call(paste0, expand.grid(LETTERS, LETTERS))` – Frank Sep 13 '16 at 16:48
  • Thanks Frank. This worked, including this last suggestion for LETTERS. Although my object is a list. Should we change that in our question/answer? – fibar Sep 14 '16 at 09:17
  • @fibar Hm, I'm not sure what you mean by a list here, but I'd say go ahead and add it if you think it's relevant. – Frank Sep 14 '16 at 10:57
3

A base R solution, suppose df is your data frame:

cols = 8
do.call(cbind, lapply(seq(1, ncols, 4), function(i) rowSums(df[i:(i+3)])))
#    [,1] [,2]
# r1    9    7
# r2   16   14
Psidom
  • 209,562
  • 33
  • 339
  • 356
0

You can do this in a vectorised way if you transform your original data to a matrix with 4 columns, then use rowSums on that, and then transform it back to match the rows of the original data frame. Here it is in one long command

    df <- read.table(header = TRUE, text = "a  b  c  d  e  f  g  h
            3  2  3  1  1  2  3  1
                1  6  1  8  5  1  7  1")

    matrix(rowSums(matrix(as.vector(t(as.matrix(df))),
         ncol = 4, byrow = TRUE)), ncol = ncol(df) / 4, byrow = TRUE)

    #      [,1] [,2]
    #[1,]    9    7
    #[2,]   16   14

Edit: To preserve the row names, if e.g. rownames(df) <- c("r1", "r2"), just apply them to the resulting matrix (the row order is preserved), ie run rownames(result) <- rownames(df).

konvas
  • 14,126
  • 2
  • 40
  • 46
  • It would be nice to have the `r1`, `r2` labels preserved, I guess. Not sure if that would happen here, since you excluded them from your input. – Frank Sep 13 '16 at 16:51
  • 1
    @Frank I thought these were just row names. These can be added back at the end, I'll edit the post. – konvas Sep 13 '16 at 17:04
0

Another way:

df <- data.frame(t(matrix(colSums(matrix(t(df), nrow=4)),nrow=nrow(df))))
##  X1 X2
##1  9  7
##2 16 14
  1. First transpose the data to a 4 x (ncol(df)/4 * now(df)) matrix where now each column is a group of four columns for each row in the original data frame.
  2. Sum each column using colSums
  3. Transpose the data back to a data frame with the original number of rows
aichao
  • 7,375
  • 3
  • 16
  • 18