
I often use boxplots in my work and like the ggplot2 aesthetics. But the standard geom_boxplot lacks two things that are important to me: end caps on the whiskers and median labels. Using information from here, I've written a function:

gBoxplot <- function(formula = NULL, data = NULL, font = "CMU Serif", fsize = 18){
  require(ggplot2)
  vars <- all.vars(formula)
  response <- vars[1]
  factor <- vars[2]
  # A function for medians labelling
  fun_med <- function(x){
    return(data.frame(y = median(x), label = round(median(x), 3)))
  }
  p <- ggplot(data, aes_string(x = factor, y = response)) +
    stat_boxplot(geom = "errorbar", width = 0.6) +
    geom_boxplot() +
    stat_summary(fun.data = fun_med, geom = "label", family = font,
                 size = fsize / 3, vjust = -0.1) +
    theme_grey(base_size = fsize, base_family = font)
  return(p)
}

There are also font settings, but this is just because I'm too lazy to make a theme. Here is an example:

gBoxplot(hwy ~ class, mpg)

[Figure: plot1]

Good enough for me, but there are some restrictions (it cannot use auto-dodging, etc.), and it would be better to make a new geom based on geom_boxplot. I've read the vignette Extending ggplot2, but cannot understand how to implement it. Any help will be appreciated.

Mike Wise
UlvHare

1 Answer


I've been thinking about this one for a while. Basically, when you create a new primitive, you normally write a combination of:

  1. A layer-function
  2. A stat-ggproto
  3. A geom-ggproto

Only the layer-function needs to be visible to the user. You only need to write a stat-ggproto if you need some new way of transforming your data to make your primitive, and you only need to write a geom-ggproto if you have some new grid-based graphics to create.
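To make the three pieces more concrete, here is a minimal sketch of the first two (a layer-function plus a stat-ggproto) for the median labels alone. The names StatMedianLabel, stat_median_label, and the digits parameter are hypothetical and chosen only for illustration; the pattern follows the one described in the Extending ggplot2 vignette, and it reuses the existing label geom, so no geom-ggproto is written.

```r
library(ggplot2)

# Hypothetical stat-ggproto: reduces each group to one row holding the
# median of y and a rounded text label for it.
StatMedianLabel <- ggproto("StatMedianLabel", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales, digits = 3) {
    med <- median(data$y)
    data.frame(x = data$x[1], y = med, label = round(med, digits))
  }
)

# Hypothetical layer-function: the only part a user needs to see.
stat_median_label <- function(mapping = NULL, data = NULL, geom = "label",
                              position = "identity", na.rm = FALSE,
                              show.legend = NA, inherit.aes = TRUE,
                              digits = 3, ...) {
  layer(stat = StatMedianLabel, data = data, mapping = mapping, geom = geom,
        position = position, show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(na.rm = na.rm, digits = digits, ...))
}

# Usage: one label per box, placed at the median.
p <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot() +
  stat_median_label()
```

This is probably more machinery than the problem needs, since stat_summary with a small fun.data function produces the same labels.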

In this case, where we are basically composing layer-functions that already exist, we don't really need to write new ggprotos. It is enough to write a new layer-function. This layer-function will create the three layers that you are already using and map the parameters the way you intend. In this case:

  • Layer1 – uses geom_errorbar and stat_boxplot – to get our errorbars
  • Layer2 – uses geom_boxplot and stat_boxplot – to create the boxplots
  • Layer3 – uses geom_label and stat_summary – to create the text labels with the median value in the center of the boxes.

Of course, you could write a new stat-ggproto and a new geom-ggproto that do all of these things at once. Or maybe you could compose stat_summary and stat_boxplot into one, and the three geom-ggprotos as well, and do this with one layer. But there is little point unless we have efficiency problems.

Anyway, here is the code:

geom_myboxplot <- function(formula = NULL, data = NULL,
                           stat = "boxplot", position = "dodge",coef=1.5,
                           font = "sans", fsize = 18, width=0.6,
                           fun.data = NULL, fun.y = NULL, fun.ymax = NULL,
                           fun.ymin = NULL, fun.args = list(),
                           outlier.colour = NULL, outlier.color = NULL,
                           outlier.shape = 19, outlier.size = 1.5,outlier.stroke = 0.5,
                           notch = FALSE,  notchwidth = 0.5,varwidth = FALSE,
                           na.rm = FALSE, show.legend = NA,
                           inherit.aes = TRUE,...) {
    vars <- all.vars(formula)
    response <- vars[1]
    factor <- vars[2]
    mymap <- aes_string(x=factor,y=response)
    fun_med <- function(x) {
        return(data.frame(y = median(x), label = round(median(x), 3)))
    }
    position <- position_dodge(width)
    l1 <- layer(data = data, mapping = mymap, stat = StatBoxplot,
            geom = "errorbar", position = position, show.legend = show.legend,
            inherit.aes = inherit.aes, params = list(na.rm = na.rm,
                coef = coef, width = width, ...))
    l2 <- layer(data = data, mapping = mymap, stat = stat, geom = GeomBoxplot,
            position = position, show.legend = show.legend, inherit.aes = inherit.aes,
            params = list(outlier.colour = outlier.colour, outlier.shape = outlier.shape,
                outlier.size = outlier.size, outlier.stroke = outlier.stroke,
                notch = notch, notchwidth = notchwidth, varwidth = varwidth,
                na.rm = na.rm, ...))
    l3 <- layer(data = data, mapping = mymap, stat = StatSummary,
            geom = "label", position = position, show.legend = show.legend,
            inherit.aes = inherit.aes, params = list(fun.data = fun_med,
                fun.y = fun.y, fun.ymax = fun.ymax, fun.ymin = fun.ymin,
                fun.args = fun.args, na.rm=na.rm,family=font,size=fsize/3,vjust=-0.1,...))
    return(list(l1,l2,l3))
}

This allows you to create your customized boxplots like this:

ggplot(mpg) +
  geom_myboxplot(hwy ~ class, font = "sans", fsize = 18) +
  theme_grey(base_family = "sans", base_size = 18)

And they look like this:

[Figure: boxplots produced by geom_myboxplot]

Note: we did not actually have to use the layer function; we could have used the original stat_boxplot, geom_boxplot, and stat_summary calls in its place. But we would still have had to fill in all the parameters if we wanted to be able to control them from our custom boxplot, so I think it was clearer this way, at least from the point of view of structure as opposed to functionality. Maybe it isn't, though; it is a matter of taste...
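For comparison, a minimal sketch of that alternative might look like the following. The name geom_myboxplot2 is hypothetical, and this version takes an ordinary aes() mapping instead of a formula and exposes only a few of the parameters, so it is an illustration under those assumptions rather than a drop-in replacement for geom_myboxplot.

```r
library(ggplot2)

# Hypothetical variant that reuses the exported constructors instead of layer().
geom_myboxplot2 <- function(mapping = NULL, data = NULL, width = 0.6,
                            font = "sans", fsize = 18, ...) {
  # Label function: y position and text are both the group median
  fun_med <- function(x) {
    data.frame(y = median(x), label = round(median(x), 3))
  }
  dodge <- position_dodge(width)
  # Adding a list to a ggplot adds each element as its own layer
  list(
    stat_boxplot(mapping = mapping, data = data, geom = "errorbar",
                 position = dodge, width = width),
    geom_boxplot(mapping = mapping, data = data, position = dodge, ...),
    stat_summary(mapping = mapping, data = data, fun.data = fun_med,
                 geom = "label", position = dodge, family = font,
                 size = fsize / 3, vjust = -0.1)
  )
}

# Usage: the aesthetics are inherited from the ggplot() call
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_myboxplot2()
```

The body is shorter, but every parameter you want to control still has to be threaded through to the right constructor by hand.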

Also, I don't have that font, which does look a lot nicer, but I did not feel like tracking it down and installing it.

Mike Wise
  • I'd lost hope of getting any answer, so I only saw yours just now. It really is the very thing: a working example and some theory to think about. I need some time to take the idea in (English is not my mother tongue). The font I use is [Computer Modern Unicode](http://cm-unicode.sourceforge.net/), a Unicode version of D. Knuth's classic. – UlvHare Mar 02 '16 at 13:53
  • Great. Glad you like it. Please accept the answer though. – Mike Wise Mar 02 '16 at 13:54
  • Thanks. If you had two more points you could upvote it too. Maybe next time :) – Mike Wise Mar 22 '16 at 12:41
  • @MikeWise Do you know how I could adapt this approach to adjust positions for e.g. points or polygons? I'm looking to develop a function where overlapping geometric objects are shifted so they don't overlap, probably using a force-direction approach. Any pointers much appreciated – geotheory Feb 11 '17 at 03:38
  • The example above shifts the points to avoid collisions. Are you talking about new geoms that you define, or are you looking to layer other already existing geoms? – Mike Wise Feb 12 '17 at 07:57