Summing up rows in a dataframe by a common factor

Question

I hope this hasn't been asked too many times. Here is example data:

structure(list(Cluster = c(1L, 1L, 1L, 1L, 5L, 5L, 5L, 13L, 17L, 
26L, 26L, 26L, 26L, 26L), X1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 30L, 24L, 129L, 50L, 1L), X2 = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 3L, 39L, 111L, 37L, 0L), X3 = c(0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 74L, 80L, 15L, 40L, 0L), X4 = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 68L, 90L, 10L, 11L, 0L), X5 = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 17L, 17L, 53L, 28L, 0L), X6 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 112L, 31L, 85L, 85L, 0L), X7 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 82L, 211L, 91L, 28L, 0L), X8 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 8L, 10L, 111L, 41L, 0L), X9 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 111L, 57L, 17L, 22L, 0L)), .Names = c("Cluster", 
"X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9"), class = "data.frame", row.names = c("Wn0Sey25", 
"AewYy021", "HewYy267", "Wn0Sey16", "Wn0Se169", "EU861874.1.1466", 
"Wn0Sey03", "HQ178918.1.1424", "FR774764.1.1456", "Wm2Ae125", 
"Wm2Ae171", "HQ729797.1.1480", "HewYy246", "AewYyy56"))

I want to sum up rows with the same value in the field "Cluster". I don't care what happens to the row IDs.

Thank you very much!

Try `library(dplyr);df1 %>% group_by(Cluster) %>% summarise_each(funs(sum))`. If you need the total `sum` over all columns for each cluster, maybe `library(tidyr);library(dplyr); df1 %>% gather(var, val, X1:X9) %>% group_by(Cluster) %>% summarise(val=sum(val))` — akrun, Mar 11 '15 at 15:40
or `do.call("rbind",lapply(split(testdat,testdat$Cluster),function(x){colSums(x[,-1])}))` for a `base R` solution — Cath, Mar 11 '15 at 15:42
you could also write a function to use within `apply`; but akrun's solution is simpler. — alexwhitworth, Mar 11 '15 at 15:42
What is the desired output? Would `rowSums(rowsum(df[-1], df$Cluster))` work for you? — David Arenburg, Mar 11 '15 at 15:45
akrun's first solution using dplyr was exactly what I was looking for. — nouse, Mar 11 '15 at 15:57
You don't need `dplyr` you can achieve the same just by `rowsum(df[-1], df$Cluster)` — David Arenburg, Mar 11 '15 at 16:05
@DavidArenburg Is it vectorized like `rowSums`? @nouse This could be done in a number of ways; another compact `base R` method would be `aggregate(.~Cluster, df1, sum)` — akrun, Mar 11 '15 at 16:11
[Here](http://stackoverflow.com/a/16657546/1315767) you can find some other alternatives for solving your problem — Jilber Urbina, Mar 11 '15 at 16:20
@DavidArenburg Based on `1.4e6` row data, `rowsum` is relatively slower compared to `dplyr` — akrun, Mar 11 '15 at 16:26
@akrun Please post an answer so you can get credit and we can know that this problem was solved. :D — JasonMArcher, Mar 11 '15 at 17:43
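
For reference, the two base R one-liners suggested in the comments can be sketched as follows. The data frame `df1` here is an illustrative subset of the question's data (six rows, two value columns), and the names `by_rowsum` and `by_aggregate` are placeholders chosen for this example.

```r
# Illustrative subset of the question's data (not the full example)
df1 <- data.frame(
  Cluster = c(1L, 1L, 5L, 26L, 26L, 26L),
  X1      = c(0L, 0L, 0L, 30L, 24L, 129L),
  X2      = c(0L, 0L, 0L, 3L, 39L, 111L)
)

# rowsum(): sums every column within each group; the group values
# become the row names of the result
by_rowsum <- rowsum(df1[-1], df1$Cluster)

# aggregate(): formula interface; keeps Cluster as an ordinary column
by_aggregate <- aggregate(. ~ Cluster, data = df1, FUN = sum)

print(by_rowsum)
print(by_aggregate)
```

Both calls return one row per distinct `Cluster` value; they differ mainly in whether the grouping value ends up in the row names (`rowsum`) or in a column (`aggregate`).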

score 2 · Accepted Answer · answered Mar 11 '15 at 17:44

You can try

library(dplyr)
df1 %>% 
     group_by(Cluster) %>% 
     summarise_each(funs(sum))
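
A note for readers on newer dplyr releases: `summarise_each()` and `funs()` have since been deprecated. Assuming dplyr 1.0.0 or later, the same grouped sum can probably be written with `across()`. This is a sketch rather than part of the original answer; `df1` below is an illustrative subset of the question's data and `cluster_sums` is a placeholder name.

```r
library(dplyr)

# Illustrative subset of the question's data (not the full example)
df1 <- data.frame(
  Cluster = c(1L, 1L, 5L, 26L, 26L, 26L),
  X1      = c(0L, 0L, 0L, 30L, 24L, 129L),
  X2      = c(0L, 0L, 0L, 3L, 39L, 111L)
)

# Sum every non-grouping column within each Cluster
cluster_sums <- df1 %>%
  group_by(Cluster) %>%
  summarise(across(everything(), sum))

print(cluster_sums)
```

Inside a grouped `summarise()`, `everything()` selects only the non-grouping columns, so `Cluster` itself is not summed.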

akrun
