Creating a box plot with whiskers in Stata (or R)

Question

I want to create a boxplot with whiskers to compare several studies. For each study I have:

mean
standard deviation sd
name
number of observations n

How can I do this in Stata 13?

Normally I would type

graph box var

but var is not the mean; graph box expects a variable holding the raw observations.

I have the same data loaded into R, so if anyone knows how to do it in R, that's fine by me. I tried boxplot and bxp. — user3416877, Mar 13 '14 at 19:04
Post the data or structurally equal example data as [pasteable R code](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — ziggystar, Mar 13 '14 at 19:11
Do you have access to all the data points? If not, simple barplot showing mean +/- SD may be preferable. — TWL, Mar 13 '14 at 19:14
As others imply, the question can be answered by faking a normal distribution with the same mean and SD. But answers would be (1) not exactly reproducible except with the same seed and the same program (2) widely regarded as indefensible statistically; minimally, this would need very careful explanation and justification. This is a statistical comment, but my view is that we should not encourage the use of statistical software for poorly chosen ends. Otherwise I agree with @TWL: show the mean +/- some multiple of the SD as a crude summary graph. — Nick Cox, Mar 13 '14 at 20:40
Thanks for all your comments. I agree with TWL, and I know that it is not an accurate boxplot, but assuming e.g. a normal distribution I think it could be used as an illustration. I will do the barplot. Thanks //trxtr — user3416877, Mar 18 '14 at 09:24
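
The mean +/- SD graph suggested in the comments can be sketched in base R roughly as follows. The study names and numbers are made up for illustration, and the multiplier `k` is an assumption to adjust as needed. Points are used rather than bars, but the information shown is the same.

```r
# Hypothetical summary statistics, one row per study (made-up values)
dat <- data.frame(study = c("A", "B", "C"),
                  mean  = c(1, 1.5, 1.2),
                  sd    = c(1, 2, 3),
                  n     = c(40, 100, 12))

k <- 1                               # multiple of the SD to display (assumption)
lower <- dat$mean - k * dat$sd       # lower end of each error bar
upper <- dat$mean + k * dat$sd       # upper end of each error bar
x <- seq_len(nrow(dat))              # one x position per study

# Means as points, with the x axis labelled by study name
plot(x, dat$mean, pch = 19, xaxt = "n",
     xlim = c(0.5, nrow(dat) + 0.5), ylim = range(lower, upper),
     xlab = "Study", ylab = "Mean +/- 1 SD")
axis(1, at = x, labels = as.character(dat$study))

# Error bars: two-headed arrows with flat (90 degree) heads
arrows(x, lower, x, upper, angle = 90, code = 3, length = 0.05)
```

Replacing `dat$sd` with `dat$sd / sqrt(dat$n)` would show standard errors instead, which answers a different question (precision of the mean rather than spread of the data).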

score 2 · Accepted Answer · answered Mar 13 '14 at 19:54

If all you have from each study is the mean, standard deviation, and number of observations, you cannot possibly generate an accurate boxplot. However, you could assume the outcomes follow a particular distribution (e.g. normal distribution) and plot a boxplot of synthetically generated datasets using those summary statistics:

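set.seed(144)  # fix the seed so the synthetic draws are repeatable
# One row of summary statistics per study
dat <- data.frame(study=c("A", "B", "C"), mean=c(1, 1.5, 1.2), sd=c(1, 2, 3),
                  n=c(40, 100, 12))
# For each study, draw n normal values with that study's mean and sd
synthetic <- do.call(rbind, lapply(split(dat, seq(nrow(dat))), function(row) {
  data.frame(study=row$study, y=rnorm(row$n, row$mean, row$sd))
}))
# Draw one box per study from the synthetic values
boxplot(y~study, data=synthetic)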

Just to reiterate, this is synthetic data being plotted, assuming a particular form of distribution for the study outcome. If you need to plot the actual study results, you'll need more information about each study: the min and max, the 25th, 50th, and 75th percentiles (the quartiles), and any outliers.

If you're willing to assume the data are normal you can calculate the positions of the hinges/fences/etc. directly (you don't need simulation, although it does solve the problem just fine). — Ben Bolker, Mar 13 '14 at 20:33
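
A minimal sketch of that direct calculation, assuming normality and reusing the same illustrative summary statistics. The quartiles come from `qnorm()`, and the whisker ends are placed at the theoretical fences, 1.5 times the interquartile range beyond the box. In a boxplot of real data the whiskers stop at the most extreme observation inside the fences, so these are outer limits, not what a sample would necessarily show. The resulting list is handed to `bxp()`, which draws from precomputed summaries.

```r
# Illustrative summary statistics, one row per study
dat <- data.frame(study = c("A", "B", "C"),
                  mean  = c(1, 1.5, 1.2),
                  sd    = c(1, 2, 3),
                  n     = c(40, 100, 12))

# Theoretical quartiles under the normality assumption
q1  <- qnorm(0.25, mean = dat$mean, sd = dat$sd)
med <- dat$mean                      # median equals mean for a normal
q3  <- qnorm(0.75, mean = dat$mean, sd = dat$sd)
iqr <- q3 - q1

# Theoretical fences: 1.5 * IQR beyond each edge of the box
lo <- q1 - 1.5 * iqr
hi <- q3 + 1.5 * iqr

# bxp() expects a 5 x (number of groups) matrix:
# lower whisker, lower box, midline, upper box, upper whisker
stats <- rbind(lo, q1, med, q3, hi)

z <- list(stats = stats,
          n     = dat$n,
          names = as.character(dat$study),
          out   = numeric(0),        # no outliers to draw
          group = numeric(0))
bxp(z, ylab = "y (normal-theory summary, not observed data)")
```

Unlike the simulation, this gives the same picture on every run, but it is still a picture of the assumed distribution and should be labelled as such.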

score 0 · Answer 2 · answered Mar 13 '14 at 19:32

Here's a way to do it in R. If you have access to the individual data points, you can do something like the following:

# Fake data
y = rnorm(100)

boxplot(y)

If you only have the summary statistics, you can manually change the values for the box-and-whisker statistics as follows:

plot1 = boxplot(y)
plot1$stats
           [,1]
[1,] -2.1433772
[2,] -0.5599737
[3,]  0.1944167
[4,]  0.6697005
[5,]  2.2113372

The above numbers are in order: lower whisker, lower box, midline, upper box, upper whisker. You can change those numbers to whatever values you have, as follows:

plot1$stats = c(-1.5, -1.2, 0.3, 1.2, 2.6)

Or change single values as follows:

plot1$stats[2] = -1.2

Then redraw the plot:

boxplot(plot1$stats)

This is all very quick and dirty, but hopefully that will get you started.
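
One caveat about the redraw step: `boxplot(plot1$stats)` treats the five numbers as data and recomputes a summary from them. That reproduces the five values only as long as none of them lies more than 1.5 box lengths beyond the box, in which case it would be drawn as an outlier and the whisker shortened. A hedged alternative sketch is to keep `stats` as a matrix and draw the stored summary with `bxp()`; the five values below are the same example numbers as above, and the seed is arbitrary.

```r
set.seed(1)                          # arbitrary seed, only for repeatability
y <- rnorm(100)                      # fake data, used only to get a template object

# Compute the boxplot summary without drawing anything
plot1 <- boxplot(y, plot = FALSE)

# Overwrite the single column of the 5 x 1 stats matrix, keeping its shape:
# lower whisker, lower box, midline, upper box, upper whisker
plot1$stats[, 1] <- c(-1.5, -1.2, 0.3, 1.2, 2.6)

# Drop any outliers inherited from the fake data
plot1$out   <- numeric(0)
plot1$group <- numeric(0)

# Draw directly from the stored summary
bxp(plot1)
```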
