I have the following toy data:

Gene cell1 cell2
Gene1 1 12
Gene1 9 1
Gene2 0 0
Gene3 6 11
df <- data.frame(
    Gene= c("Gene1","Gene1","Gene2","Gene3"),
    gene_1 = c(1,9,0,6),
    gene_2 = c(12,1,0,11)
)

I want to group by gene name and, where a gene appears more than once, sum the values in each of the other columns. The expected result is:

Gene cell1 cell2
Gene1 10 13
Gene2 0 0
Gene3 6 11

I use the following code for this task, but it is too slow for my actual data, which is quite large.

library(dplyr)

df <- df %>% 
    group_by(Gene) %>% 
    summarise(across(everything(), sum)) %>%
    ungroup()

Are there other, less computationally expensive ways to complete this task? Thank you.

ZainNST
  • Take a look here: https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group. You have a specific answer for large data sets [here](https://stackoverflow.com/a/18686783/13460602). – Maël Jun 07 '22 at 11:57
  • and [here](https://stackoverflow.com/a/61478862/13460602) – Maël Jun 07 '22 at 12:03

1 Answer

Try rowsum (not to be confused with rowSums), which is specialized in summing up columns per group.

rowsum(df[-1], df[,1])
#      gene_1 gene_2
#Gene1     10     13
#Gene2      0      0
#Gene3      6     11
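
Note that rowsum returns the group labels as row names rather than as a Gene column. If the original layout is needed, a minimal sketch along these lines should restore it; the names `sums` and `result` are only illustrative, and the toy `df` from the question is rebuilt so the snippet runs on its own:

```r
# Toy data from the question.
df <- data.frame(
    Gene = c("Gene1", "Gene1", "Gene2", "Gene3"),
    gene_1 = c(1, 9, 0, 6),
    gene_2 = c(12, 1, 0, 11)
)

# Sum every non-Gene column within each gene; groups come back as row names.
sums <- rowsum(df[-1], df$Gene)

# Move the row names back into a regular Gene column.
result <- data.frame(Gene = rownames(sums), sums, row.names = NULL)
result
```

By default rowsum sorts the groups; passing `reorder = FALSE` keeps them in order of first appearance instead.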
GKi