
I have a data.frame that looks like this (but with many more columns and rows):

  Gene Cell1 Cell2 Cell3
1    A     2     7     8
2    A     5     2     9
3    B     2     7     8
4    C     1     4     3

I want to sum the rows that have the same value in Gene, in order to get something like this:

  Gene Cell1 Cell2 Cell3
1    A     7     9    17
2    B     2     7     8
3    C     1     4     3

Based on the answers to previous questions, I tried aggregate, but I could not work out how to get the result above. This is what I tried:

aggregate(df[,-1], list(df[,1]), FUN = sum)

Does anyone have an idea of what I'm doing wrong?

Euclides

2 Answers

aggregate(df[,-1], list(Gene=df[,1]), FUN = sum)
#   Gene Cell1 Cell2 Cell3
# 1    A     7     9    17
# 2    B     2     7     8
# 3    C     1     4     3

will give you the output you are looking for.
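
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data from the question and runs the call above. As far as I can tell, the call in the question differs only in that the grouping list is unnamed, in which case aggregate labels the grouping column `Group.1` instead of `Gene`. The sketch also shows aggregate's formula interface as an alternative; the names `res_list` and `res_formula` are just illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3),
  stringsAsFactors = FALSE
)

# Naming the grouping list keeps the column called "Gene" rather than "Group.1"
res_list <- aggregate(df[, -1], list(Gene = df[, 1]), FUN = sum)

# Formula interface: "." stands for every column other than Gene
res_formula <- aggregate(. ~ Gene, data = df, FUN = sum)

print(res_list)
print(res_formula)
```

One thing to keep in mind: the formula interface drops rows containing NA by default (`na.action = na.omit`), so the two calls can differ on data with missing values.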

lukeA
  • There's an error when we run the above: `Error in aggregate.data.frame(df[, -1], list(Gene = df[, 1]), FUN = sum) : arguments must have same length` – Manoj Kumar May 28 '17 at 18:19
  • @ManojKumar Please add the output of `str(df)` to your post. – lukeA May 28 '17 at 18:23
  • Sure @lukeA here it is : `Classes ‘data.table’ and 'data.frame': 4 obs. of 4 variables: $ Gene : chr "A" "A" "B" "C" $ Cell1: int 2 5 2 1 $ Cell2: int 7 2 7 4 $ Cell3: int 8 9 8 3 - attr(*, ".internal.selfref")= ` – Manoj Kumar May 28 '17 at 18:26
  • @ManojKumar Thanks. You have a data.table object, and indexing works a bit differently there, so you could e.g. do `aggregate(df[,-1], list(Gene=df[[1]]), FUN = sum)`. But since you have a data.table anyway, you may want to use its strengths for aggregating data: `df[, lapply(.SD, sum), by=Gene]`. – lukeA May 28 '17 at 18:39
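
To illustrate the data.table point from the comments, here is a small sketch. It assumes the data.table package is installed and rebuilds the object from the `str(df)` output posted above; the names `dt` and `res_dt` are illustrative. The likely cause of the "arguments must have same length" error is that `dt[, 1]` stays a one-column data.table, whose length is its number of columns, whereas `dt[[1]]` extracts the column as a plain vector.

```r
library(data.table)

# Same data as in the question, stored as a data.table
dt <- data.table(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2L, 5L, 2L, 1L),
  Cell2 = c(7L, 2L, 7L, 4L),
  Cell3 = c(8L, 9L, 8L, 3L)
)

# Compare what the two indexing styles return
print(length(dt[, 1]))   # a data.table: length is the number of columns
print(length(dt[[1]]))   # a plain vector: length is the number of rows

# Grouped sum: .SD holds every non-grouping column
res_dt <- dt[, lapply(.SD, sum), by = Gene]
print(res_dt)
```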

Or with dplyr:

library(dplyr)
df %>%
  group_by(Gene) %>%
  summarise_all(sum) %>%
  data.frame() -> newdf # so that newdf can further be used, if needed
Manoj Kumar
jay.sf
  • The other methods work, but this one is more robust as well as intuitive. I like that one does not need to declare which columns to sum. – Ahdee May 26 '18 at 14:30
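
As a side note, `summarise_all()` is marked as superseded in current dplyr releases. A sketch of the same pipeline written with `across()`, assuming dplyr 1.0.0 or later and the example data from the question, might look like this:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3),
  stringsAsFactors = FALSE
)

# across(everything(), sum) sums every non-grouping column within each Gene
newdf <- df %>%
  group_by(Gene) %>%
  summarise(across(everything(), sum)) %>%
  as.data.frame()

print(newdf)
```

As with `summarise_all()`, the columns to sum do not have to be listed by name, since the grouping column is excluded automatically.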