I am trying to write a function that groups a data frame by one column and then, within each group, computes the proportion of values in another column that fall into certain ranges.
Specifically, the relevant parts of my data frame, allmutations, look like this:
gennumber sel
1 -0.00351647088810292
1 0.000728499401888683
1 0.0354633950503043
1 0.000209700229276244
2 6.42307549736376e-05
2 -0.0497259605114181
2 -0.000371856995145525
Within each generation (gennumber), I would like to count the proportion of values in “sel” that are greater than 0.001, between -0.001 and 0.001, and less than -0.001. Over the entire data set, I've just been doing this:
ben <- allmutations$sel > 0.001 #this is for all generations
bencount <- length(which(ben==TRUE))
totalmu <- length(ben) # length(ben) = total # of mutants
tot.pben <- bencount/totalmu #proportion
What is the best way to do that operation for each value in gennumber? Also, is there an easy way to get proportion of values in the range -0.001 < sel < 0.001? I couldn't figure out how to do it, so I “cheated” and took an absolute value of the column and just looked for values less than 0.001. I can't help but feel there must be a better way though.
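For reference, here is a minimal sketch of the range test written out directly, without the absolute value. It assumes base R only, and the `sel` vector is just the seven example values retyped for illustration. Names such as `neutral` and `p.neutral` are made up for this sketch.

```r
# Illustrative values only: the seven sel values from the example data
sel <- c(-0.00351647088810292, 0.000728499401888683, 0.0354633950503043,
         0.000209700229276244, 6.42307549736376e-05, -0.0497259605114181,
         -0.000371856995145525)

# Both conditions must hold, so combine them with the vectorised `&`
neutral <- sel > -0.001 & sel < 0.001

# mean() of a logical vector is the proportion of TRUE values
p.neutral <- mean(neutral)

# The absolute-value shortcut selects the same elements
p.neutral.abs <- mean(abs(sel) < 0.001)
```

Because the range is symmetric around zero, the two versions should agree; the explicit form would also work for an asymmetric range.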
Thanks for any help you can give, and please let me know if I can provide any clarification.
dput() of data:
structure(list(gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), sel = c(-0.00351647088810292,
0.000728499401888683, 0.0354633950503043, 0.000209700229276244,
6.42307549736376e-05, -0.0497259605114181, -0.000371856995145525
)), .Names = c("gennumber", "sel"), class = "data.frame", row.names = c(NA,
-7L))
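
One possible sketch of the per-generation calculation on this data, assuming base R only: `cut()` bins `sel` into three classes, `table()` counts them per generation, and `prop.table()` turns the counts into row-wise proportions. The class labels ("below", "within", "above") and the column name `class` are hypothetical names chosen for illustration.

```r
# Rebuild the example data shown above
allmutations <- data.frame(
  gennumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  sel = c(-0.00351647088810292, 0.000728499401888683, 0.0354633950503043,
          0.000209700229276244, 6.42307549736376e-05, -0.0497259605114181,
          -0.000371856995145525)
)

# Bin sel into three classes. With the default right = TRUE the intervals are
# (-Inf, -0.001], (-0.001, 0.001], (0.001, Inf), so a value exactly equal to
# a threshold falls into the lower bin.
allmutations$class <- cut(allmutations$sel,
                          breaks = c(-Inf, -0.001, 0.001, Inf),
                          labels = c("below", "within", "above"))

# Counts per generation (rows) and class (columns)
counts <- table(allmutations$gennumber, allmutations$class)

# margin = 1 divides each row by its row total, giving per-generation proportions
props <- prop.table(counts, margin = 1)
print(props)
```

If only one class is needed, something like `tapply(allmutations$sel > 0.001, allmutations$gennumber, mean)` should give the per-generation proportion of values above 0.001 directly.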