How can I generate a ggplot2 scatterplot of two groups with the means indicated together with X and Y error bars, like this?

[Image: scatter plot]

Here is a reduced example (using dput to recreate the data.frame df) with two groups of cells and three measures. I'd like to plot, say, Peak against Rise, or Peak against Decay. That much is straightforward, but I would also like to add points indicating the group means, with X and Y error bars (+/- sem).

Is there a way to do this within ggplot2, or do I need to generate the means and sem values first? This post drew my attention to geom_errorbarh, but I'm still uncertain as to the best way to proceed.

library(ggplot2)

df<-structure(list(Group = c("A", "A", "A", "A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B", "B", "B"), Peak = c(102.975, 
37.805, 64.996, 66.36, 199.354, 7.425, 34.137, 366.59, 10.165, 
14.833, 702.525, 39.086, 8.286, 122.783, 105.762, 37.018), Rise = c(0.346855, 
0.24165, 0.24028, 0.461548, 0.194016, 0.164047, 0.484375, 0.307861, 
0.438538, 0.488083, 0.549423, 0.365448, 0.511551, 0.33596, 0.331467, 
0.270096), Decay = c(1.3874, 1.07407, 1.88787, 2.64408, 1.1462, 
0.615963, 4.04641, 1.48701, 3.61397, 4.1838, 1.92746, 3.64329, 
4.21354, 0.812695, 1.14611, 1.28279)), .Names = c("Group", 
"Peak", "Rise", "Decay"), class = "data.frame", row.names = c(NA, 
-16L))

ggplot(df, aes(Peak, Rise)) + 
  geom_point(aes(colour=Group)) +
  theme_bw(14)

I have tried something like:

library(doBy)

sem <- function(x) sqrt(var(x)/length(x))
z<-summaryBy(Peak+Rise+Decay~Group, data=df, FUN=c(mean,sem))
z

to get the values, but easily (and flexibly) incorporating them into the ggplot code is defeating me.

user441706

1 Answer

I tend to use plyr for these kinds of summaries:

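library(plyr)

# Compute the standard errors first: recent plyr versions evaluate summarise
# arguments in order, so var(Peak) after Peak = mean(Peak) would return NA.
# sem = sqrt(var(x) / n), matching the sem() defined in the question.
z <- ddply(df, .(Group), summarise,
           PeakSE = sqrt(var(Peak) / length(Peak)),
           RiseSE = sqrt(var(Rise) / length(Rise)),
           Peak = mean(Peak),
           Rise = mean(Rise))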

ggplot(df,aes(x = Peak,y = Rise)) + 
    geom_point(aes(colour = Group)) + 
    geom_point(data = z,aes(colour = Group)) +
    geom_errorbarh(data = z,aes(xmin = Peak - PeakSE,xmax = Peak + PeakSE,y = Rise,colour = Group,height = 0.01)) + 
    geom_errorbar(data = z,aes(ymin = Rise - RiseSE,ymax = Rise + RiseSE,x = Peak,colour = Group))


I confess I was a little disappointed that I had to manually tweak the height of the end caps on the horizontal error bars. But thinking about it, I guess picking a sensible default could be fairly challenging to implement.

joran
  • Excellent, thanks. I am now wondering how to generalise this such that I name the variables only once and make it easy to apply to different pairs of measures... – user441706 Sep 25 '12 at 08:44
  • If I don't need error bars, is it possible to plot the means without calculating them ahead of time (with ddply or whatnot)? I see that it's possible to use `stat_summary` to do so to plot ymeans, but I can't figure out how to use that feature to simultaneously calculate xmeans. (I'm not sure if this is the right venue to ask this question - if it's not, I can create a new post.) – fredtal Nov 27 '13 at 19:51
  • @TaliaYoung Probably with `stat_summary` and `fun.data`, where your function would collapse both the x and y variables. But there's really nothing wrong with summarising outside of ggplot. – joran Nov 27 '13 at 20:00
  • Okay, thank you! And yes, before `ggplot` it hadn't even occurred to me that such a thing might be possible without calculating it beforehand! I'm getting spoiled. :) – fredtal Nov 27 '13 at 20:24
  • It is probably better to re-assign Peak and Rise after you calculate the standard error. On my machine, reassigning early results in NA. – sautedman Feb 08 '16 at 18:43
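
On the follow-up about naming the variables only once: one possible approach is to wrap the summary and the plot in a pair of small functions that take the column names as strings. The sketch below is only that, a sketch. The helper names `group_means` and `plot_means`, the summary column names (`grp`, `mean_x`, `se_x`, and so on) and the `toy` data frame are all made up for illustration, and it assumes ggplot2 3.0.0 or later for the `.data` pronoun. It uses the same sem definition as the question and sets the error bar cap sizes to zero to sidestep the manual height tweak.

```r
library(ggplot2)

# Standard error of the mean, as defined in the question
sem <- function(x) sqrt(var(x) / length(x))

# Hypothetical helper: per-group mean and SEM of two columns named by strings
group_means <- function(data, xvar, yvar, groupvar = "Group") {
  pieces <- split(data, data[[groupvar]], drop = TRUE)
  rows <- lapply(pieces, function(d) {
    data.frame(
      grp    = as.character(d[[groupvar]][1]),
      mean_x = mean(d[[xvar]]),
      mean_y = mean(d[[yvar]]),
      se_x   = sem(d[[xvar]]),
      se_y   = sem(d[[yvar]]),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Hypothetical helper: raw points plus group means with x and y error bars
plot_means <- function(data, xvar, yvar, groupvar = "Group") {
  z <- group_means(data, xvar, yvar, groupvar)
  ggplot() +
    geom_point(data = data,
               aes(x = .data[[xvar]], y = .data[[yvar]],
                   colour = .data[[groupvar]])) +
    geom_point(data = z, aes(x = mean_x, y = mean_y, colour = grp), size = 3) +
    geom_errorbar(data = z,
                  aes(x = mean_x, ymin = mean_y - se_y, ymax = mean_y + se_y,
                      colour = grp),
                  width = 0) +
    geom_errorbarh(data = z,
                   aes(y = mean_y, xmin = mean_x - se_x, xmax = mean_x + se_x,
                       colour = grp),
                   height = 0) +
    labs(x = xvar, y = yvar, colour = groupvar)
}

# Made-up toy data, only to show the calls; substitute the df from the question
toy <- data.frame(Group = rep(c("A", "B"), each = 3),
                  Peak  = c(1, 2, 3, 10, 20, 30),
                  Rise  = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

z_toy <- group_means(toy, "Peak", "Rise")
p_toy <- plot_means(toy, "Peak", "Rise")
```

With the data from the question, `plot_means(df, "Peak", "Rise")` and `plot_means(df, "Peak", "Decay")` should then give the two plots asked about, with each variable named once per call.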