I hope this hasn't been asked too many times. Here is some example data:

structure(list(Cluster = c(1L, 1L, 1L, 1L, 5L, 5L, 5L, 13L, 17L, 
26L, 26L, 26L, 26L, 26L), X1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 30L, 24L, 129L, 50L, 1L), X2 = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 3L, 39L, 111L, 37L, 0L), X3 = c(0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 74L, 80L, 15L, 40L, 0L), X4 = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 68L, 90L, 10L, 11L, 0L), X5 = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 17L, 17L, 53L, 28L, 0L), X6 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 112L, 31L, 85L, 85L, 0L), X7 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 82L, 211L, 91L, 28L, 0L), X8 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 8L, 10L, 111L, 41L, 0L), X9 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 111L, 57L, 17L, 22L, 0L)), .Names = c("Cluster", 
"X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9"), class = "data.frame", row.names = c("Wn0Sey25", 
"AewYy021", "HewYy267", "Wn0Sey16", "Wn0Se169", "EU861874.1.1466", 
"Wn0Sey03", "HQ178918.1.1424", "FR774764.1.1456", "Wm2Ae125", 
"Wm2Ae171", "HQ729797.1.1480", "HewYy246", "AewYyy56"))

I want to sum up the rows that share the same value in the column "Cluster". I don't care what happens to the row IDs.

Thank you very much!

nouse
  • Try `library(dplyr); df1 %>% group_by(Cluster) %>% summarise_each(funs(sum))`. If you need a single total per cluster across all columns, maybe `library(tidyr); library(dplyr); df1 %>% gather(var, val, X1:X9) %>% group_by(Cluster) %>% summarise(val=sum(val))` – akrun Mar 11 '15 at 15:40
  • or `do.call("rbind",lapply(split(testdat,testdat$Cluster),function(x){colSums(x[,-1])}))` for a `base R` solution – Cath Mar 11 '15 at 15:42
  • you could also write a function to use within `apply`; but akrun's solution is simpler. – alexwhitworth Mar 11 '15 at 15:42
  • What is the desired output? Would `rowSums(rowsum(df[-1], df$Cluster))` work for you? – David Arenburg Mar 11 '15 at 15:45
  • akrun's first solution using dplyr was exactly what I was looking for. – nouse Mar 11 '15 at 15:57
  • You don't need `dplyr`; you can achieve the same with just `rowsum(df[-1], df$Cluster)` – David Arenburg Mar 11 '15 at 16:05
  • @DavidArenburg Is it vectorized like `rowSums`? @nouse This could be done in a number of ways; another compact `base R` method would be `aggregate(.~Cluster, df1, sum)` – akrun Mar 11 '15 at 16:11
  • @akrun, yes. See `?rowsum`; it is discussed there. – David Arenburg Mar 11 '15 at 16:15
  • [Here](http://stackoverflow.com/a/16657546/1315767) you can find some other alternatives for solving your problem – Jilber Urbina Mar 11 '15 at 16:20
  • @DavidArenburg On a dataset with `1.4e6` rows, `rowsum` is slower than `dplyr` – akrun Mar 11 '15 at 16:26
  • @akrun Please post an answer so you can get credit and we can know that this problem was solved. :D – JasonMArcher Mar 11 '15 at 17:43
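
The two base R suggestions from the comments, `rowsum()` and `aggregate()`, can be compared side by side. The following is only a minimal sketch: `toy` is a hypothetical, made-up data frame (not the data from the question), chosen small enough that the sums can be checked by hand.

```r
# Hypothetical toy data: three rows in cluster 1, two rows in cluster 5
toy <- data.frame(
  Cluster = c(1L, 1L, 1L, 5L, 5L),
  X1 = c(1L, 2L, 3L, 10L, 20L),
  X2 = c(0L, 4L, 0L, 5L, 5L),
  row.names = c("a", "b", "c", "d", "e")
)

# rowsum() adds up the rows within each group;
# the group labels become the row names of the result
by_rowsum <- rowsum(toy[-1], toy$Cluster)

# aggregate() with a formula keeps Cluster as an ordinary column
by_aggregate <- aggregate(. ~ Cluster, data = toy, FUN = sum)

print(by_rowsum)
print(by_aggregate)
```

Both calls discard the original row IDs. The main difference is where the cluster label ends up: in the row names for `rowsum()`, in a regular column for `aggregate()`.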

1 Answer

You can try this, where `df1` is the data frame from the question:

library(dplyr)
df1 %>% 
     group_by(Cluster) %>% 
     summarise_each(funs(sum))
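
In newer dplyr releases, `summarise_each()` and `funs()` are deprecated. A sketch of the same grouped sum written with `across()`, assuming dplyr 1.0.0 or later is installed and using a hypothetical toy data frame `toy` rather than the data from the question:

```r
library(dplyr)

# Hypothetical toy data: two rows in cluster 1, one row in cluster 5
toy <- data.frame(
  Cluster = c(1L, 1L, 5L),
  X1 = c(1L, 2L, 10L),
  X2 = c(3L, 4L, 5L)
)

# Sum every non-grouping column within each Cluster
summed <- toy %>%
  group_by(Cluster) %>%
  summarise(across(everything(), sum))

print(summed)
```

Inside `summarise()`, `everything()` selects only the non-grouping columns, so `Cluster` is kept as the group key and is not summed itself.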
akrun