I have a large dataset for which I want to compute the mean, standard deviation (SD) and standard error (SE), grouped by two variables (sample and protein). Here is a subset of my data:
   sample    value protein
1 Stage 1 84796453   Tdrd6
2 Stage 1 85665703   Tdrd6
When I run
library(plyr)
ddply(df, .(sample, protein), summarise, Mean = mean(value), SE = sd(value) / sqrt(length(value)), SD = sd(value))
I get:
   sample protein     Mean       SE       SD
1 Stage 1   Tdrd6 85231078 434624.5 614651.9
The mean is correct. However, with only two values I expected the SD to be 434625 (the difference between the mean and either value, which the output reports as the SE), and, as calculated in Excel, the SE to be 307326 (roughly half of the SD in the output). Does anyone know what is going on?
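Both sets of numbers can be reproduced in base R from the two values in the subset. A minimal sketch, assuming those printed values (which may be rounded, so the last digits can differ slightly from the ddply output): R's sd() divides the sum of squared deviations by n - 1 (sample SD), while the alternative below divides by n (population SD). The variable names are illustrative only.

```r
# The two values from the data subset above (as printed, possibly rounded)
x <- c(84796453, 85665703)
n <- length(x)

# R's sd() is the sample standard deviation: squared deviations / (n - 1)
sample_sd <- sd(x)
sample_se <- sample_sd / sqrt(n)

# Population standard deviation: squared deviations / n
pop_sd <- sqrt(sum((x - mean(x))^2) / n)
pop_se <- pop_sd / sqrt(n)

print(c(mean = mean(x), sample_sd = sample_sd, sample_se = sample_se,
        pop_sd = pop_sd, pop_se = pop_se))
```

With these two values, pop_sd comes out at 434625 and pop_se at about 307326, while sample_sd and sample_se follow the same n - 1 convention as the SD and SE columns of the ddply call.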
Thanks!