
I know there are many similar questions about summing a column by condition in R, but I cannot get aggregate or dplyr::group_by(df) %>% summarise(variable = sum(variable)) to work on my data. The question "Combine rows and sum their values" did not help me either. I want to merge the rows of a data.frame that share the same file value and sum their values.

df <- data.frame(file=c('sample1','sample1','sample2','sample3','sample2'),gene1=c(34,365,76,0,4),gene2=c(34,0,0,456,0))
> df
     file gene1 gene2
1 sample1    34    34
2 sample1   365     0
3 sample2    76     0
4 sample3     0   456
5 sample2     4     0

The output should look like this:

     file gene1 gene2
1 sample1   399    34
2 sample2    80     0
3 sample3     0   456

takeITeasy

3 Answers


In base R, you can use rowsum to sum rows by group.

rowsum(df[-1], df[,1])
#        gene1 gene2
#sample1   399    34
#sample2    80     0
#sample3     0   456
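
Note that rowsum returns the group labels as row names rather than as a file column. If the output should match the layout asked for in the question, a minimal sketch (sums and res are illustrative names, not part of the original answer) could be:

```r
# Same sample data as in the question
df <- data.frame(
  file  = c("sample1", "sample1", "sample2", "sample3", "sample2"),
  gene1 = c(34, 365, 76, 0, 4),
  gene2 = c(34, 0, 0, 456, 0)
)

# rowsum() keeps the group labels as row names
sums <- rowsum(df[-1], df$file)

# Move the row names into a regular column and reset the row names
res <- data.frame(file = rownames(sums), sums, row.names = NULL)
res
```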

Or using aggregate:

aggregate(.~file, df, sum)
#     file gene1 gene2
#1 sample1   399    34
#2 sample2    80     0
#3 sample3     0   456
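
If the formula interface is what causes trouble, the same call can be written without a formula. This is only a sketch, assuming every column except file is numeric; res is an illustrative name. One difference worth knowing: the formula method uses na.action = na.omit by default, so rows containing NA are dropped before summing, whereas the call below passes the data through unchanged.

```r
# Same sample data as in the question
df <- data.frame(
  file  = c("sample1", "sample1", "sample2", "sample3", "sample2"),
  gene1 = c(34, 365, 76, 0, 4),
  gene2 = c(34, 0, 0, 456, 0)
)

# Non-formula interface: the data columns first, then a named list of groups.
# The list name ("file") becomes the name of the grouping column in the result.
res <- aggregate(df[-1], by = list(file = df$file), FUN = sum)
res
```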

Or using by:

do.call(rbind, by(df[-1], df[,1], colSums))
#        gene1 gene2
#sample1   399    34
#sample2    80     0
#sample3     0   456
GKi

A dplyr approach would be:

library(dplyr)

df %>% group_by(file) %>% summarise_all(.funs = sum, na.rm = TRUE)

Output:

# A tibble: 3 x 3
  file    gene1 gene2
  <fct>   <dbl> <dbl>
1 sample1   399    34
2 sample2    80     0
3 sample3     0   456
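
In dplyr 1.0.0 and later, summarise_all is superseded by across. Assuming such a version is installed, an equivalent sketch would be (res is an illustrative name):

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  file  = c("sample1", "sample1", "sample2", "sample3", "sample2"),
  gene1 = c(34, 365, 76, 0, 4),
  gene2 = c(34, 0, 0, 456, 0)
)

# across(everything(), ...) applies the function to every non-grouping column
res <- df %>%
  group_by(file) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
res
```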
Duck

You can try this with dplyr:

library(dplyr)

df %>% 
  group_by(file) %>% 
  summarise(gene1 = sum(gene1), gene2 = sum(gene2))

Or with data.table:

library(data.table)

setDT(df)[, .(gene1 = sum(gene1), gene2 = sum(gene2)), by = .(file)]
      file gene1 gene2
1: sample1   399    34
2: sample2    80     0
3: sample3     0   456
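
If there are many gene columns, naming each one becomes tedious. A possible variant, sketched here with the question's data, uses .SD to sum every non-grouping column. It uses as.data.table instead of setDT so that the original df is left unchanged (dt and res are illustrative names):

```r
library(data.table)

# Same sample data as in the question
df <- data.frame(
  file  = c("sample1", "sample1", "sample2", "sample3", "sample2"),
  gene1 = c(34, 365, 76, 0, 4),
  gene2 = c(34, 0, 0, 456, 0)
)

# Work on a copy so df itself is not converted by reference
dt <- as.data.table(df)

# .SD holds all columns except the grouping column; sum each one per file
res <- dt[, lapply(.SD, sum), by = file]
res
```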
Tho Vu