I am trying to find the proportion of values greater than 20 within each group (each level of the factor) in my data frame, and then use those proportions to compute two other values:
dat <- data.frame(
  num1  = as.numeric(c(10, 30, 4, 60, 20, 1, 34, 87, 66)),
  num2  = as.numeric(c(23, 36, 42, 18, 3, 44, 32, 65, 78)),
  num3  = as.numeric(c(0, 0, 0, 20, 80, 10, 50, 43, 70)),
  group = c("First group", "First group", "First group",
            "Second group", "Second group", "Second group",
            "Third group", "Third group", "Third group")
)
I would like to get 3 values (from a function) computed for each of the columns num1, num2 and num3, and each of the groups like this:
# built without cbind(), which would coerce every column to character
res <- data.frame(
  col   = rep(c("num1", "num2", "num3"), each = 3),
  group = rep(c("First group", "Second group", "Third group"), 3),
  p     = c(0.3333333, 0.3333333, 1.0000000, 1.0000000, 0.3333333, 1.0000000, 0.0000000, 0.3333333, 1.0000000),
  s1    = c(-0.1250000, -0.1250000, -0.2500000, -0.2500000, -0.1250000, -0.2500000, 0.0000000, -0.1250000, -0.2500000),
  s2    = c(0.1000000, 0.1000000, 0.5000000, 0.5000000, 0.1000000, 0.5000000, 0.0000000, 0.1000000, 0.5000000)
)
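As a sanity check on where these expected numbers come from, here is the first row (num1, "First group") worked by hand. It uses the same formulas as the prop function further down; the variable names x, p, s1 and s2 are only for illustration.

```r
# num1 values that belong to "First group"
x <- c(10, 30, 4)

p  <- mean(x > 20)           # only 30 exceeds 20, so p = 1/3
s1 <- (p / 2 - p) / (p + 1)  # (-1/6) / (4/3)  = -0.125
s2 <- (p / 2 - p) / (p - 2)  # (-1/6) / (-5/3) =  0.1

round(c(p = p, s1 = s1, s2 = s2), 7)
```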
I can get as far as returning data for each column like this:
prop <- function(s) {
  n <- length(s)
  x <- length(s[s > 20])
  p <- x / n
  s1 <- (p / 2 - p) / (p + 1)
  s2 <- (p / 2 - p) / (p - 2)
  return(c(p, s1, s2))
}
library(plyr)  # provides ddply() and .()
ddply(dat, .(group), summarise, prop(num1))
but then I don't understand how to bind the results into a data frame and apply the function to each column. I have tried different ways (for example, this), but none of them works for me: I keep getting only one column. My goal is to then plot these values by group using ggplot2. Can you please help me?
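For reference, here is a minimal base R sketch of the shape I am after. It is one possible approach, not necessarily the best one. It assumes prop is restated to return a named vector so the result columns get labelled, and the helper names num_cols and pieces are made up for illustration.

```r
dat <- data.frame(
  num1  = c(10, 30, 4, 60, 20, 1, 34, 87, 66),
  num2  = c(23, 36, 42, 18, 3, 44, 32, 65, 78),
  num3  = c(0, 0, 0, 20, 80, 10, 50, 43, 70),
  group = rep(c("First group", "Second group", "Third group"), each = 3)
)

# Same formulas as above, but returning a *named* vector (an assumption
# made here so that the output columns are called p, s1 and s2).
prop <- function(s) {
  p <- length(s[s > 20]) / length(s)
  c(p = p, s1 = (p / 2 - p) / (p + 1), s2 = (p / 2 - p) / (p - 2))
}

num_cols <- c("num1", "num2", "num3")  # columns to summarise

# For each column: split its values by group, apply prop() to each piece,
# and turn the resulting matrix (one row per group) into a data frame.
pieces <- lapply(num_cols, function(col) {
  stats <- t(sapply(split(dat[[col]], dat$group), prop))
  data.frame(col = col, group = rownames(stats), stats, row.names = NULL)
})

# Stack the per-column pieces into one long data frame.
res <- do.call(rbind, pieces)
res
```

The long layout (one row per column/group pair) is chosen because it is the form ggplot2 usually expects, with col and group available as aesthetics or facets.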