I have a dataframe with two columns: a quality score and an outcome. Outcomes are either 1 or 0. Quality scores are integers ranging from 1 to 135.

For each quality score, I would like to compute the mean outcome. I can do it for one quality score at a time like this:

test <- subset(deletion_qs, qs == 10)
sum(test$outcomes)/length(test$outcomes)
[1] 0.4

But repeating this for every score is too slow. Is there a way to do it with one of the apply functions?

Here is the data:

quality_score <- c(2, 1 ,18 , 1 , 2 , 1 , 1 , 1 , 2 , 1, 1 , 1 , 1 , 1 ,10 , 10 ,10, 10 , 10 , 10 , 10 , 10, 10 , 10 , 1 ,29 ,1 , 29 ,63 , 1 ,25 , 1 , 1 ,52 ,28 , 1 , 1 ,10 , 3, 28 , 1 , 20, 1, 10, 1 , 10 , 3 , 1 , 3 , 10 ,10 , 56 , 1, 1, 2 , 3 , 2 , 1 , 1, 44 , 1 , 1, 10 , 33 , 67 ,67, 19 , 8 , 39, 10 , 2 , 1 , 42 , 22, 7 , 93 , 1 , 12 , 10 ,135 , 1 , 31 , 6 , 16, 15 , 1 , 35 , 1, 10 , 10)

outcome <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1)

Workhorse
  • I think you are looking for `aggregate(outcome ~ qs, data=df, mean)`. – lmo Mar 29 '17 at 15:33
  • I think there is a problem with your data, 90 (quality_score) and 91 (outcome) elements. They should be of equal length. – KoenV Mar 29 '17 at 15:34

1 Answer

You can use dplyr's `group_by` and `summarise`. First combine the two vectors into a data frame called `tot.data`, then:

library(dplyr)

group_by(tot.data, quality_score) %>% summarise(Mean1 = mean(outcome))
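
For comparison, here is a minimal base R sketch of the same grouped mean, using `aggregate` (as suggested in the comments) and `tapply` (one of the apply functions the question asks about). The small `tot.data` below is hypothetical toy data, not the vectors from the question, since those have different lengths (90 vs 91) and cannot be combined into a data frame as posted.

```r
# Hypothetical toy data standing in for the question's columns
tot.data <- data.frame(
  quality_score = c(1, 1, 2, 2, 10, 10, 10, 10),
  outcome       = c(0, 1, 0, 0, 1, 0, 1, 1)
)

# Formula interface: returns a data frame with one row per quality score
agg <- aggregate(outcome ~ quality_score, data = tot.data, FUN = mean)

# apply family: returns a named array of means, keyed by quality score
tap <- tapply(tot.data$outcome, tot.data$quality_score, mean)

print(agg)
print(tap)
```

Both calls compute every group mean in a single pass, so there is no need to `subset` once per score. `aggregate` is convenient when a data frame is wanted back; `tapply` is handy when a named vector is enough.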

MLEN