This sounds like a trivial one, but some research didn't turn up an elegant solution. I have a dataframe with a categorical variable (GROUP) and a continuous read-out variable (bloodpressure). How can I make a simple box plot showing the mean for each group with its standard deviation? There are multiple groups: A, B, C, D. How can I perform an ANOVA post-hoc analysis within the dataframe? How does it work with the Mann-Whitney U test? Can I mark the significance level in the bar plot? How can I extend this to multiple continuous variables (dia_bloodpressure, sys_bloodpressure, mean_bloodpressure) and sink() the output to different files (named after each variable)?
-
..how many questions have you asked...?! – user1317221_G Sep 17 '12 at 16:29
-
this is a bit much for one question. perhaps you should have a look at http://stackoverflow.com/faq#questions and http://stackoverflow.com/q/5963269/1317221 and then streamline your question somewhat – user1317221_G Sep 17 '12 at 16:33
-
OK, I guess it's a bit much for one posting. But then again, this is the typical analysis workflow. So far I have only found packages that deal with one of these problems at a time: 1) multiple group testing, 2) very rarely multiple group comparison, 3) bar plots of multiple groups, but never with significance levels. – Doc Sep 17 '12 at 17:38
-
can you give a reproducible example?? http://stackoverflow.com/questions/2286085/plotting-of-multiple-comparisons-in-r – Ben Bolker Jan 05 '13 at 16:48
3 Answers
After some research I came across the agricolae package, which provides multiple group comparisons. The resulting objects can be passed to a decent plotting function for group-wise bar graphs +/- SD or SEM. Unfortunately, there is no way to mark significance between groups in the plots.
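For comparison, the post-hoc tests asked about in the question can also be sketched with base R alone. This is only a minimal sketch, not the agricolae workflow: the data frame `bp` and its simulated values are hypothetical, and Holm adjustment is just one possible choice for the pairwise Mann-Whitney U tests.

```r
# Hypothetical example data: four groups, one continuous read-out
set.seed(1)
bp <- data.frame(
  GROUP = factor(rep(c("A", "B", "C", "D"), each = 10)),
  bloodpressure = rnorm(40, mean = rep(c(120, 125, 135, 150), each = 10), sd = 8)
)

# One-way ANOVA followed by Tukey's HSD post-hoc comparison
fit <- aov(bloodpressure ~ GROUP, data = bp)
print(summary(fit))
tukey <- TukeyHSD(fit)
print(tukey$GROUP)   # one row per pair: diff, lwr, upr, p adj

# Non-parametric alternative: pairwise Mann-Whitney U (Wilcoxon rank-sum) tests
# with Holm correction for multiple comparisons (an assumed choice)
mwu <- pairwise.wilcox.test(bp$bloodpressure, bp$GROUP,
                            p.adjust.method = "holm")
print(mwu$p.value)   # lower-triangular matrix of adjusted p-values
```

With four groups, `tukey$GROUP` has six rows (one per pair) and `mwu$p.value` is a 3 x 3 matrix.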

After some more programming in R, I stumbled over another nice package suitable for medical research: psych.
For the question above, describe() and describeBy() give a statistical overview of a dataframe, with describeBy() breaking it down by a grouping variable.
The function error.bars.by() is a more advanced plotting function for group means with error bars. By default it draws confidence intervals; the sd argument switches to standard deviations.
The package offers many functions for covariate analysis, which are useful in psychological research but might also help in medical and marketing research.
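Marking significance in the plot, which neither package seems to do out of the box, can be approximated by hand with base graphics. The following is a rough sketch under stated assumptions: the data frame `bp` is simulated, the compared pair (A vs. D) and the star thresholds are arbitrary illustrative choices, and the p-value is taken from Tukey's HSD.

```r
# Hypothetical example data: four groups, one continuous read-out
set.seed(1)
bp <- data.frame(
  GROUP = factor(rep(c("A", "B", "C", "D"), each = 10)),
  bloodpressure = rnorm(40, mean = rep(c(120, 125, 135, 150), each = 10), sd = 8)
)

# Group means and standard deviations
means <- tapply(bp$bloodpressure, bp$GROUP, mean)
sds   <- tapply(bp$bloodpressure, bp$GROUP, sd)

# Bar plot of the means; barplot() returns the x positions of the bar centres
top  <- max(means + sds)
mids <- barplot(means, ylim = c(0, top * 1.25),
                xlab = "GROUP", ylab = "bloodpressure")

# Error bars: mean +/- SD, drawn as double-headed flat arrows
arrows(mids, means - sds, mids, means + sds,
       angle = 90, code = 3, length = 0.05)

# Adjusted p-value for one pair (D vs. A) from Tukey's HSD
p <- TukeyHSD(aov(bloodpressure ~ GROUP, data = bp))$GROUP["D-A", "p adj"]
label <- if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else "n.s."

# Bracket and label above the two compared bars
y <- top * 1.1
segments(mids[1], y, mids[4], y)
text(mean(mids[c(1, 4)]), y, label, pos = 3)
```

The same bracket code can be repeated at different heights for further pairs.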

A possible code snippet:
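library(psych)

# Example data with missing values and a two-level grouping factor
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
y <- c(2, 3, NA, 3, 4, NA, 2, 3, NA, 2)
group <- rep(factor(LETTERS[1:2]), 5)
df <- data.frame(x, y, group)
df

# Per-group summaries with base R
by(df$x, df$group, summary)
by(df$x, df$group, mean, na.rm = TRUE)  # without na.rm, group B would be NA

# Missing values must be removed explicitly
sd(df$x)                # result: NA
sd(df$x, na.rm = TRUE)  # result: 2.738613

# Per-group SD for several columns at once
v <- c("x", "y")        # or, equivalently:
v <- colnames(df)[1:2]
sapply(v, function(i) tapply(df[[i]], df$group, sd, na.rm = TRUE))

# psych: descriptive statistics by group and a plot of group means with error bars
describeBy(df$x, df$group)
error.bars.by(df$x, df$group, bars = TRUE)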
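The last part of the question, running the same analysis over several read-out variables and writing each result to its own file with sink(), could look roughly like this. It is a base-R sketch under assumptions: the data frame `bp` is simulated, and the output directory (`tempdir()`) and the `.txt` file names are placeholders.

```r
# Hypothetical example data: one grouping factor, three continuous read-outs
set.seed(1)
n <- 40
bp <- data.frame(
  GROUP = factor(rep(c("A", "B", "C", "D"), each = 10)),
  dia_bloodpressure  = rnorm(n, mean = 80,  sd = 8),
  sys_bloodpressure  = rnorm(n, mean = 125, sd = 12),
  mean_bloodpressure = rnorm(n, mean = 95,  sd = 9)
)

vars    <- c("dia_bloodpressure", "sys_bloodpressure", "mean_bloodpressure")
out_dir <- tempdir()   # placeholder; replace with the desired output folder

for (v in vars) {
  # Build the formula "<variable> ~ GROUP" from the column name
  fit <- aov(as.formula(paste(v, "~ GROUP")), data = bp)

  # Redirect console output to a file named after the variable
  out_file <- file.path(out_dir, paste0(v, ".txt"))
  sink(out_file)
  print(summary(fit))                              # ANOVA table
  print(TukeyHSD(fit))                             # post-hoc comparison
  print(pairwise.wilcox.test(bp[[v]], bp$GROUP))   # Mann-Whitney U, pairwise
  sink()                                           # restore console output
}
```

Inside a loop the results must be wrapped in print(), otherwise nothing reaches the sink.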
