
If a data set includes at least three data points per cell, the dodging of violins works as expected. See the image below.

Dodge is as expected

However, in the code that follows, there are only two data points in the 'Verbal Class B' cell. With just two data points, ggplot2 refuses to construct a violin, which I'm OK with. But as a side effect, the violin for the 'Verbal Class A' condition is horizontally misaligned, so it no longer lines up with the data points drawn by geom_point. See the image below.

Violin dodge fails and causes misalignment with other dodged elements

Is there a workaround to make the violin dodge properly so as to stay aligned with the data points?

library(ggplot2)

Score = c( 9,12,6,12,11,10,4,12,11,10,9,9,14,8,12,11,10,11,4,10,11,17,6,15,8,12,14,1,16,3,18,16,15,11,10,14,8,8,12,15)
Topic = c( "Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Math","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal","Verbal")
# Two data points in the 'Verbal Class B' cell: reproduces the misaligned violin
Class = c( "A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","B","B")
# Three data points in the 'Verbal Class B' cell: dodging works as expected
#Class = c( "A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","B","B","B")
DataSet = data.frame(Topic,Class,Score)
mywidth <- 1.0 
mydodge <- 0.90
myjitteramount <- 0.35
p <- ggplot(data = DataSet, aes(x = Topic, y = Score, color = Class)) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75), fill = NA, width = mywidth, position = position_dodge(mydodge), alpha = 1.0, size = 0.47, scale = "area", show.legend = FALSE) +
  geom_point(position = position_jitterdodge(dodge.width = mydodge, jitter.height = 0, jitter.width = myjitteramount), shape = 21, size = 1.5, stroke = 0.7, fill = NA, alpha = 1.0, show.legend = TRUE)
# ggsave() is not a plot layer, so call it on its own instead of adding it with `+`
ggsave("TempPlot1.png", plot = p, width = 11, height = 11, units = "in", dpi = 600)
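Before plotting, it can help to check which cells fall below the three-point threshold described above. The following is a minimal sketch, assuming the two-point cell layout described in the question, rebuilt with rep() for brevity. The names counts and sparse are introduced here for illustration only, and the exact cutoff may depend on the ggplot2 version.

```r
# Same cell layout as described in the question, rebuilt compactly with rep()
Topic <- rep(c("Math", "Verbal"), each = 20)
Class <- c(rep("A", 12), rep("B", 8), rep("A", 18), rep("B", 2))

# Count the data points in every Topic x Class cell
counts <- table(Topic, Class)
print(counts)

# List the cells with fewer than three data points
# (threshold taken from the question; treat it as an assumption)
sparse <- as.data.frame(counts, stringsAsFactors = FALSE)
sparse <- sparse[sparse$Freq < 3, ]
print(sparse)
```

Any cell listed in sparse is one for which a violin may be missing, and therefore one where the dodge of the remaining violins should be checked.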
tjebo
  • related, but not really helping: https://stackoverflow.com/questions/11020437/consistent-width-for-geom-bar-in-the-event-of-missing-data https://stackoverflow.com/questions/10326729/dont-drop-zero-count-dodged-barplot – tjebo Jan 29 '20 at 09:12
  • I don't think this is easily possible, although some bad hack would probably do it. What about avoiding geom_violin altogether and using ggbeeswarm::geom_beeswarm instead? You have so few data points that a violin plot is not really helpful for visualisation anyway. – tjebo Jan 29 '20 at 09:14
  • This is a minimal working example, not my actual data. I actually have many, many data points in all of the conditions except one, in which there are only two data points. [RE: You have so few data points that a violin plot is also not really helpful for visualisation] – user1113568 Jan 29 '20 at 14:45
  • Understood - however, `geom_beeswarm` also works with "many many data points". You can always add an `alpha` or so. As a nice side effect, the swarm takes on the shape of your violins when there are many data points. You can then always add a boxplot or so for your quartiles if needed. – tjebo Jan 29 '20 at 16:55
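The suggestion in the last comment could look roughly like the sketch below. It assumes the ggbeeswarm package is installed and uses the two-point cell layout from the question, rebuilt with rep(). The alpha, dodge and box width values are arbitrary illustrative choices, not values taken from the thread.

```r
library(ggplot2)
library(ggbeeswarm)

# Data from the question, with only two points in the 'Verbal'/'B' cell
Score <- c(9, 12, 6, 12, 11, 10, 4, 12, 11, 10, 9, 9, 14, 8, 12, 11, 10, 11, 4, 10,
           11, 17, 6, 15, 8, 12, 14, 1, 16, 3, 18, 16, 15, 11, 10, 14, 8, 8, 12, 15)
Topic <- rep(c("Math", "Verbal"), each = 20)
Class <- c(rep("A", 12), rep("B", 8), rep("A", 18), rep("B", 2))
DataSet <- data.frame(Topic, Class, Score)

dodge <- 0.9  # shared dodge width (illustrative value)

p <- ggplot(DataSet, aes(x = Topic, y = Score, color = Class)) +
  # Semi-transparent swarm of the raw points, dodged by Class
  geom_beeswarm(dodge.width = dodge, alpha = 0.6) +
  # Narrow unfilled boxplots on top to show the quartiles
  geom_boxplot(position = position_dodge(width = dodge), width = 0.3,
               fill = NA, outlier.shape = NA)
print(p)
```

Unlike the violin, the boxplot is still drawn for the two-point cell, so every Topic and Class combination keeps its own slot.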

1 Answer

I don't think this is easily possible, although some bad hack would certainly do it.

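If you want to keep your dodge, a less-than-satisfying workaround is to create the violin plot with a different set of data (giving fake data to the last group), cover it with a rectangle, and overplot with your points. Below, Class1 moves one 'Verbal' score from class A to class B so that the B cell has three points and gets a violin, while Class2 holds the real classes and is used for the points.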

library(ggplot2)

Score <- c(9, 12, 6, 12, 11, 10, 4, 12, 11, 10, 9, 9, 14, 8, 12, 11, 10, 11, 4, 10, 11, 17, 6, 15, 8, 12, 14, 1, 16, 3, 18, 16, 15, 11, 10, 14, 8, 8, 12, 15)
Topic <- c("Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Math", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal", "Verbal")
Class1 <- c( "A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","B","B","B")
Class2 <- c( "A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","B","B")

DataSet1 <- data.frame(Topic, Class1, Score)
DataSet2 <- data.frame(Topic, Class2, Score)

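ggplot() +
  # Violins from the padded data (Class1), so that every cell gets a violin
  geom_violin(data = DataSet1, aes(x = Topic, y = Score, color = Class1), draw_quantiles = c(0.25, 0.5, 0.75), position = position_dodge()) +
  # White rectangle over the right half of 'Verbal' (x >= 2) hides the fake class B violin
  annotate(geom = 'rect', xmin = 2, xmax = Inf, ymin = -Inf, ymax = Inf, fill = 'white') +
  # Points from the real data (Class2); dodge.width = 0.9 matches the default violin dodge
  geom_point(data = DataSet2, aes(x = Topic, y = Score, color = Class2), position = position_jitterdodge(dodge.width = 0.9))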


A better option is probably to separate your data using facets. You can only really facet by class, which may make the comparison between classes more difficult, but at least the data points line up with the violins:

ggplot(data = DataSet2, aes(x = Topic, y = Score, color = Class2)) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75), position = position_dodge()) +
  geom_point(position = position_jitterdodge()) +
  facet_grid(~Class2, scales = 'free_x')


Another option would be to reconsider your visualisation, e.g. using ggbeeswarm.

library(ggbeeswarm)
ggplot(DataSet2, aes(x = Topic, y = Score, color = Class2)) +
  geom_beeswarm(dodge.width = 0.5) 


tjebo
  • Thanks. I feel that the **_"create the violin plot with a different set of data (giving fake data to the last group), cover it with a rectangle, and overplot with your points"_** is the best example, so far, of a workaround (though it is awkward, and far from ideal). – user1113568 Jan 30 '20 at 16:22
  • @user1113568 I agree it's far from ideal. I'd probably use `geom_beeswarm`, but if you really want a good solution with `geom_violin`, this may indeed warrant a feature request on GitHub, although I doubt it will have high priority for the developers – tjebo Jan 30 '20 at 16:28
  • @user1113568 to make the awkward solution look better, you could remove the grid lines and use the same fill as the panel background – tjebo Jan 30 '20 at 16:30
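The cosmetic tweak from the last comment might be sketched as follows. This assumes the default theme_gray(), whose panel background fill is "grey92", and rebuilds the answer's two data sets with rep(). The quantile lines are left out for brevity, and dodge.width = 0.9 is an assumption chosen to match the default violin dodge.

```r
library(ggplot2)

Score <- c(9, 12, 6, 12, 11, 10, 4, 12, 11, 10, 9, 9, 14, 8, 12, 11, 10, 11, 4, 10,
           11, 17, 6, 15, 8, 12, 14, 1, 16, 3, 18, 16, 15, 11, 10, 14, 8, 8, 12, 15)
Topic <- rep(c("Math", "Verbal"), each = 20)
# Class1 pads the 'Verbal'/'B' cell to three points; Class2 is the real data
Class1 <- c(rep("A", 12), rep("B", 8), rep("A", 17), rep("B", 3))
Class2 <- c(rep("A", 12), rep("B", 8), rep("A", 18), rep("B", 2))
DataSet1 <- data.frame(Topic, Class1, Score)
DataSet2 <- data.frame(Topic, Class2, Score)

p <- ggplot() +
  geom_violin(data = DataSet1, aes(x = Topic, y = Score, color = Class1),
              position = position_dodge()) +
  # Cover the fake violin with the panel background colour instead of white
  annotate(geom = "rect", xmin = 2, xmax = Inf, ymin = -Inf, ymax = Inf,
           fill = "grey92") +
  geom_point(data = DataSet2, aes(x = Topic, y = Score, color = Class2),
             position = position_jitterdodge(dodge.width = 0.9)) +
  labs(color = "Class") +
  # Without grid lines there is nothing for the rectangle to visibly interrupt
  theme(panel.grid = element_blank())
print(p)
```

With a different theme, the rectangle fill would need to be changed to that theme's panel background colour.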