58

How can I generate the following plot in R? Points, shown in the plot are the averages, and their ranges correspond to minimal and maximal values. I have data in two files (below is an example).

x   y
1   0.8773
1   0.8722
1   0.8816
1   0.8834
1   0.8759
1   0.8890
1   0.8727
2   0.9047
2   0.9062
2   0.8998
2   0.9044
2   0.8960
..  ...

enter image description here

Roland
  • 127,288
  • 10
  • 191
  • 288
sherlock85
  • 871
  • 2
  • 12
  • 16
  • Since you clearly don't want a boxplot, I changed the title of your question in order to reflect what you really want. – Roland Oct 23 '12 at 15:11
  • 1
    also `plotrix::plotCI`, `gplots::plotCI`, `library("sos"); findFn("{error bar}")` – Ben Bolker Oct 23 '12 at 17:29

6 Answers6

144

First of all: it is very unfortunate and surprising that R cannot draw error bars "out of the box".

Here is my favourite workaround, the advantage is that you do not need any extra packages. The trick is to draw arrows (!) but with little horizontal bars instead of arrowheads (!!!). This not-so-straightforward idea comes from the R Wiki Tips and is reproduced here as a worked-out example.

Let's assume you have a vector of "average values" avg and another vector of "standard deviations" sdev, they are of the same length n. Let's make the abscissa just the number of these "measurements", so x <- 1:n. Using these, here come the plotting commands:

plot(x, avg,
    ylim=range(c(avg-sdev, avg+sdev)),
    pch=19, xlab="Measurements", ylab="Mean +/- SD",
    main="Scatter plot with std.dev error bars"
)
# hack: we draw arrows but with very special "arrowheads"
arrows(x, avg-sdev, x, avg+sdev, length=0.05, angle=90, code=3)

The result looks like this:

example scatter plot with std.dev error bars

In the arrows(...) function length=0.05 is the size of the "arrowhead" in inches, angle=90 specifies that the "arrowhead" is perpendicular to the shaft of the arrow, and the particularly intuitive code=3 parameter specifies that we want to draw an arrowhead on both ends of the arrow.

For horizontal error bars the following changes are necessary, assuming that the sdev vector now contains the errors in the x values and the y values are the ordinates:

plot(x, y,
    xlim=range(c(x-sdev, x+sdev)),
    pch=19,...)
# horizontal error bars
arrows(x-sdev, y, x+sdev, y, length=0.05, angle=90, code=3)
András Aszódi
  • 8,948
  • 5
  • 48
  • 51
12

Using ggplot and a little dplyr for data manipulation:

set.seed(42)
df <- data.frame(x = rep(1:10,each=5), y = rnorm(50))

library(ggplot2)
library(dplyr)

df.summary <- df %>% group_by(x) %>%
    summarize(ymin = min(y),
              ymax = max(y),
              ymean = mean(y))

ggplot(df.summary, aes(x = x, y = ymean)) +
    geom_point(size = 2) +
    geom_errorbar(aes(ymin = ymin, ymax = ymax))

If there's an additional grouping column (OP's example plot has two errorbars per x value, saying the data is sourced from two files), then you should get all the data in one data frame at the start, add the grouping variable to the dplyr::group_by call (e.g., group_by(x, file) if file is the name of the column) and add it as a "group" aesthetic in the ggplot, e.g., aes(x = x, y = ymean, group = file).

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
6
#some example data
set.seed(42)
df <- data.frame(x = rep(1:10,each=5), y = rnorm(50))

#calculate mean, min and max for each x-value
library(plyr)
df2 <- ddply(df,.(x),function(df) c(mean=mean(df$y),min=min(df$y),max=max(df$y)))

#plot error bars
library(Hmisc)
with(df2,errbar(x,mean,max,min))
grid(nx=NA,ny=NULL)
Roland
  • 127,288
  • 10
  • 191
  • 288
3

To summarize Laryx Decidua's answer:

define and use a function like the following

plot.with.errorbars <- function(x, y, err, ylim=NULL, ...) {
  if (is.null(ylim))
    ylim <- c(min(y-err), max(y+err))
  plot(x, y, ylim=ylim, pch=19, ...)
  arrows(x, y-err, x, y+err, length=0.05, angle=90, code=3)
}

where one can override the automatic ylim, and also pass extra parameters such as main, xlab, ylab.

1

Another (easier - at least for me) way to do this is below.

install.packages("ggplot2movies")

data(movies, package="ggplot2movies")
Plot average Length vs Rating
rating_by_len = tapply(movies$length,
                       movies$rating,
                       mean)

plot(names(rating_by_len), rating_by_len, ylim=c(0, 200)
     ,xlab = "Rating", ylab = "Length", main="Average Rating by Movie Length", pch=21)
Add error bars to the plot: mean - sd, mean + sd
sds = tapply(movies$length, movies$rating, sd)
upper = rating_by_len + sds
lower = rating_by_len - sds
segments(x0=as.numeric(names(rating_by_len)), 
         y0=lower, 
         y1=upper)

Hope that helps.

aggers
  • 11
  • 1
-1

I put together start to finish code of a hypothetical experiment with ten measurement replicated three times. Just for fun with the help of other stackoverflowers. Thank you... Obviously loops are an option as applycan be used but I like to see what happens.

#Create fake data
x <-rep(1:10, each =3)
y <- rnorm(30, mean=4,sd=1)

#Loop to get standard deviation from data
sd.y = NULL
for(i in 1:10){
  sd.y[i] <- sd(y[(1+(i-1)*3):(3+(i-1)*3)])
}
sd.y<-rep(sd.y,each = 3)

#Loop to get mean from data
mean.y = NULL
for(i in 1:10){
  mean.y[i] <- mean(y[(1+(i-1)*3):(3+(i-1)*3)])
}
mean.y<-rep(mean.y,each = 3)

#Put together the data to view it so far
data <- cbind(x, y, mean.y, sd.y)

#Make an empty matrix to fill with shrunk data
data.1 = matrix(data = NA, nrow=10, ncol = 4)
colnames(data.1) <- c("X","Y","MEAN","SD")

#Loop to put data into shrunk format
for(i in 1:10){
  data.1[i,] <- data[(1+(i-1)*3),]
}

#Create atomic vectors for arrows
x <- data.1[,1]
mean.exp <- data.1[,3]
sd.exp <- data.1[,4]

#Plot the data
plot(x, mean.exp, ylim = range(c(mean.exp-sd.exp,mean.exp+sd.exp)))
abline(h = 4)
arrows(x, mean.exp-sd.exp, x, mean.exp+sd.exp, length=0.05, angle=90, code=3)
ComputerNoob
  • 480
  • 6
  • 12