Sum and divide by total number of repeats using dplyr?

Question

Suppose I have a dataframe as such:

d <- data.frame(type = c("rna", "rna", "rna"), value = c(1, 2, 3))
d2 <- data.frame(type = c("dna", "dna"), value = c(20, 30))
df <- rbind(d, d2)

It looks like this:

  type value
1  rna     1
2  rna     2
3  rna     3
4  dna    20
5  dna    30

Now I need to summarize by type, that is, aggregate each type by summing its values AND dividing by the total number of occurrences. In the example above, that would be (1+2+3)/3 for rna and (20+30)/2 for dna. However, currently I can only get it to sum, like so:

library(dplyr)

df %>%
  group_by(type) %>%
  summarise_all(sum) %>%
  data.frame()

The above code produces:

  type value
1  rna     6
2  dna    50

whereas what I really want is this:

  type value
1  rna     2
2  dna    25

thanks.

score 1 · Accepted Answer · answered May 30 '18 at 05:21

We need to divide the sum by the number of rows in each group, which n() returns:

df %>%
  group_by(type) %>%
  summarise(value = sum(value) / n())

This is simply the mean of each group:

df %>%
  group_by(type) %>% 
  summarise(value = mean(value))
# A tibble: 2 x 2
#   type  value
#   <fct> <dbl>
#1 rna       2
#2 dna      25
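
As a quick check of the equivalence above, here is a minimal sketch that computes both forms side by side on the question's data. The column names sum_by_n and avg and the objects check and means are made up for illustration. The last step swaps mean into the question's original summarise_all pattern using across(), which assumes dplyr >= 1.0.0 (on older versions, summarise_all(mean) should do the same job).

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  type  = c("rna", "rna", "rna", "dna", "dna"),
  value = c(1, 2, 3, 20, 30)
)

# Sum divided by the group size next to the plain mean
# (sum_by_n and avg are illustrative column names)
check <- df %>%
  group_by(type) %>%
  summarise(
    sum_by_n = sum(value) / n(),
    avg      = mean(value)
  )

# TRUE when both columns hold the same numbers
isTRUE(all.equal(check$sum_by_n, check$avg))

# Replacement for summarise_all(sum): average every non-grouping column
# (across() assumes dplyr >= 1.0.0)
means <- df %>%
  group_by(type) %>%
  summarise(across(everything(), mean)) %>%
  data.frame()
```

One difference to keep in mind: n() counts every row in the group, including rows where value is NA, so sum(value, na.rm = TRUE) / n() and mean(value, na.rm = TRUE) can disagree when values are missing.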

akrun


Yeah, haha, that's right. Thank you! I did not realize that, but the /n part is useful. – Ahdee May 30 '18 at 05:25
