I have a data frame with two columns: a quality score and an outcome. Outcomes are either 1 or 0. Quality scores are integers from 1 to 135.
This is a snapshot of the data frame:
For each quality score, I would like to compute the mean outcome. I can do it for one quality score at a time like this:
test <- subset(deletion_qs, qs == 10)
sum(test$outcomes)/length(test$outcomes)
[1] 0.4
But repeating this for every quality score is too slow. Is there a way to compute the mean for all quality scores at once, for example with one of the apply functions?
Here is the data:
quality_score <- c(2, 1 ,18 , 1 , 2 , 1 , 1 , 1 , 2 , 1, 1 , 1 , 1 , 1 ,10 , 10 ,10, 10 , 10 , 10 , 10 , 10, 10 , 10 , 1 ,29 ,1 , 29 ,63 , 1 ,25 , 1 , 1 ,52 ,28 , 1 , 1 ,10 , 3, 28 , 1 , 20, 1, 10, 1 , 10 , 3 , 1 , 3 , 10 ,10 , 56 , 1, 1, 2 , 3 , 2 , 1 , 1, 44 , 1 , 1, 10 , 33 , 67 ,67, 19 , 8 , 39, 10 , 2 , 1 , 42 , 22, 7 , 93 , 1 , 12 , 10 ,135 , 1 , 31 , 6 , 16, 15 , 1 , 35 , 1, 10 , 10)
outcome <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1)
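For reference, here is a minimal sketch of the kind of grouped computation I am after, assuming a data frame with the same column names as deletion_qs (qs and outcomes). The small data frame `toy` is made up for illustration and is not the data above; `tapply` and `aggregate` are base R functions, and I am not sure either is the best approach.

```r
# Hypothetical toy data using the same column names as deletion_qs
toy <- data.frame(
  qs       = c(1, 1, 10, 10, 10, 10, 10, 29, 29),
  outcomes = c(0, 0, 0, 1, 0, 1, 0, 1, 1)
)

# tapply() splits outcomes by qs and applies mean() to each group;
# the result is a named array with one entry per distinct qs
means_by_qs <- tapply(toy$outcomes, toy$qs, mean)
print(means_by_qs)  # qs 1 -> 0, qs 10 -> 0.4, qs 29 -> 1

# aggregate() computes the same group means but returns a data frame
means_df <- aggregate(outcomes ~ qs, data = toy, FUN = mean)
print(means_df)
```

Because the outcomes are 0 or 1, the mean of each group is the same as sum(outcomes)/length(outcomes) for that group, which is what the one-score-at-a-time code computes.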