1

I've created these split half violin plots using ggplot. However, instead of including the boxplot, which shows the median, I'd like to include a horizontal line with the mean. This means each colored half would have its own mean line: the gold half would have a mean line which would not exactly align with the mean line on the grey half. Importantly, I'd like the mean line to reside only inside the density plot. How can I achieve this? I can't figure it out and I'd appreciate any help!

Here's some example data:

set.seed(20160229)

my_data = data.frame(
  y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 
1.5)),
  x=c(rep('a', 2000), rep('b', 2000)),
  m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

Here's the extension for geom_violin to create split_geom_violin:

GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin, draw_group = function(self, data, ..., draw_quantiles = NULL){
  data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x))
  grp <- data[1,'group']
  newdata <- plyr::arrange(transform(data, x = if(grp%%2==1) xminv else xmaxv), if(grp%%2==1) y else -y)
  newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
  newdata[c(1,nrow(newdata)-1,nrow(newdata)), 'x'] <- round(newdata[1, 'x']) 
  if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
    stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 
                                              1))
    quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
    aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
    aesthetics$alpha <- rep(1, nrow(quantiles))
    both <- cbind(quantiles, aesthetics)
    quantile_grob <- GeomPath$draw_panel(both, ...)
    ggplot2:::ggname("geom_split_violin", grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob))
  }
  else {
    ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
  }
})

geom_split_violin <- function (mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...))
}

Here's the code for the graph:

library(ggplot2)
ggplot(my_data, aes(x, y, fill=m)) + 
  geom_split_violin(trim = TRUE) + 
  geom_boxplot(width = 0.25, notch = FALSE, notchwidth = .4, outlier.shape = NA, coef=0) +
  labs(x=NULL,y="GM Attitude Score") +
  theme_classic() +
  theme(text = element_text(size = 20)) +
  scale_x_discrete(labels=c("0" = "Control\nCondition", "1" = "GM\nCondition")) +
  scale_fill_manual(values=c("#E69F00", "#999999"), 
                    name="Survey\nPart",
                    breaks=c("1", "2"),
                    labels=c("Time 1", "Time 5"))

enter image description here

  • Please [edit](https://stackoverflow.com/posts/51228076/edit) your question to include sample data. – Maurits Evers Jul 08 '18 at 01:17
  • Please share sample of your data using `dput()` (not `str` or `head` or picture/screenshot) so others can help. See more here https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – Tung Jul 08 '18 at 01:39
  • Ok, I think I've added example data. At least it works on my console... – socialresearcher Jul 08 '18 at 01:52
  • Do you want a boxplot that uses mean instead of median? Or do you want just a mean line? – Gregor Thomas Jul 08 '18 at 03:46
  • 2
    Also, I can't find `geom_split_violin` in ggplot2 or any other CRAN package. Where is it from? – Gregor Thomas Jul 08 '18 at 03:53
  • @Gregor: I believe it's a custom-built function similar to this https://stackoverflow.com/q/47651868/786542 – Tung Jul 08 '18 at 04:02
  • If you're using a function you wrote yourself or that's outside of standard packages, please include it in the post. Otherwise, how would we run it? – camille Jul 08 '18 at 12:59
  • But also, it's good when you can boil your question down to its essence. Is the question really about violin plots, or can you narrow it down to just being "How can I make line segments at each mean"? – camille Jul 08 '18 at 13:01
  • You can make the split violin plot by following the instructions here: https://stackoverflow.com/questions/35717353/split-violin-plot-with-ggplot2. But I really am just concerned about the mean lines, which you should be able to do in a regular density plot or a regular violin plot. I've added the code above to create the split plots. – socialresearcher Jul 08 '18 at 13:41

1 Answers1

4

You can use stat_summary & geom_crossbar while setting all fun.y, fun.ymin & fun.ymax to mean only

library(ggplot2)

ggplot(my_data, aes(x, y, fill = m)) +
  geom_split_violin(trim = TRUE) +
  stat_summary(fun.y = mean, fun.ymin = mean, fun.ymax = mean,
               geom = "crossbar", 
               width = 0.25,
               position = position_dodge(width = .25),
  ) +
  labs(x = NULL, y = "GM Attitude Score") +
  theme_classic() +
  theme(text = element_text(size = 20)) +
  scale_x_discrete(labels = c("0" = "Control\nCondition", "1" = "GM\nCondition")) +
  scale_fill_manual(
    values = c("#E69F00", "#999999"),
    name = "Survey\nPart",
    breaks = c("1", "2"),
    labels = c("Time 1", "Time 5")
  )

Data & function used:

set.seed(20160229)

my_data <- data.frame(
  y = c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
  x = c(rep("a", 2000), rep("b", 2000)),
  m = c(rep("i", 1000), rep("j", 2000), rep("i", 1000))
)

GeomSplitViolin <- ggproto(
  "GeomSplitViolin",
  GeomViolin,
  draw_group = function(self, data, ..., draw_quantiles = NULL) {
    data <- transform(data,
                      xminv = x - violinwidth * (x - xmin),
                      xmaxv = x + violinwidth * (xmax - x)
    )
    grp <- data[1, "group"]
    newdata <- plyr::arrange(
      transform(data, x = if (grp %% 2 == 1) xminv else xmaxv),
      if (grp %% 2 == 1) y else -y
    )
    newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
    newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"])
    if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
      stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
      quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
      aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
      aesthetics$alpha <- rep(1, nrow(quantiles))
      both <- cbind(quantiles, aesthetics)
      quantile_grob <- GeomPath$draw_panel(both, ...)
      ggplot2:::ggname(
        "geom_split_violin",
        grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob)
      )
    } else {
      ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
    }
  }
)

geom_split_violin <- function(mapping = NULL,
                              data = NULL,
                              stat = "ydensity",
                              position = "identity", ...,
                              draw_quantiles = NULL,
                              trim = TRUE,
                              scale = "area",
                              na.rm = FALSE,
                              show.legend = NA,
                              inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomSplitViolin,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      trim = trim,
      scale = scale,
      draw_quantiles = draw_quantiles,
      na.rm = na.rm, ...
    )
  )
}

Created on 2018-07-08 by the reprex package (v0.2.0.9000).

Tung
  • 26,371
  • 7
  • 91
  • 115
  • So it looks like its the "width = 0.25" that limits the size of the crossbar. If I make the width larger, it crosses over into the other side of the density plot. But I'd like to generate a line that takes up the whole respective side of the density plot and does not cross over the center line. – socialresearcher Jul 08 '18 at 21:40
  • Here's an example of what I'm looking for: http://mbjoseph.github.io/2013/06/24/violin.html But I can't get vioplot, vioplot2, or vioplotx to work at all. I'd really be happy if someone can help me out! – socialresearcher Jul 08 '18 at 22:58