3

With ggplot2, I can create a violin plot with overlapping points, and paired points can be connected using geom_line().

library(datasets)
library(ggplot2)
library(dplyr)

iris_edit <- iris %>% group_by(Species) %>%
  mutate(paired = seq(1:length(Species))) %>%
  filter(Species %in% c("setosa","versicolor"))

ggplot(data = iris_edit,
       mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_violin() +
  geom_line(mapping = aes(group = paired),
            position = position_dodge(0.1),
            alpha = 0.3) +
  geom_point(mapping = aes(fill = Species, group = paired),
             size = 1.5, shape = 21,
             position = position_dodge(0.1)) +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_blank(),
        axis.text.y = element_text(size = 10))

violin plot of iris data

The see package includes the geom_violindot() function to plot a halved violin plot alongside its constituent points. I've found this function helpful when plotting a large number of points so that the violin is not obscured.

library(see)

ggplot(data = iris_edit,
       mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_violindot(dots_size = 0.8,
                 position_dots = position_dodge(0.1)) +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_blank(),
        axis.text.y = element_text(size = 10))

violindot plot of iris data

Now, I would like to add geom_line() to geom_violindot() in order to connect paired points, as in the first image. Ideally, I would like the points to be inside and the violins to be outside so that the lines do not intersect the violins. geom_violindot() includes the flip argument, which takes a numeric vector specifying the geoms to be flipped.

ggplot(data = iris_edit,
       mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_violindot(dots_size = 0.8,
                 position_dots = position_dodge(0.1),
                 flip = c(1)) +
  geom_line(mapping = aes(group = paired),
            alpha = 0.3,
            position = position_dodge(0.1)) +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(size = 15),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_blank(),
        axis.text.y = element_text(size = 10))

violindot plot with lines

As you can see, invoking flip inverts the violin half, but not the corresponding points. The see documentation does not seem to address this.

Questions

  1. How can you create a geom_violindot() plot with paired points, such that the points and the lines connecting them are "sandwiched" in between the violin halves? I suspect there is a solution that uses David Robinson's GeomFlatViolin function, though I haven't been able to figure it out.
  2. In the last figure, note that the lines are askew relative to the points they connect. What position adjustment function should be supplied to the position_dots and position arguments so that the points and lines are properly aligned?
acvill
  • 2
    Although this is not the answer you want to hear, it might be worth considering: don't pursue this idea for your visualisation. It's confusing and convoluted, and the story is not well represented. You're trying to combine paired observations with estimated distributions of your data. There are other options. In your example: show the paired data in a scatter plot (each species on its own continuous axis), and for the estimated distribution show, for example, iso contour lines (e.g. stat_density_2d) – tjebo Dec 17 '21 at 20:18
  • @tjebo thanks for your comment, I can appreciate that there are likely better ways to represent this type of data. If you want to write an answer explaining your approach as a frame challenge, I may accept it in the absence of other answers – acvill Dec 17 '21 at 20:30
  • @acvill: what is the goal of the viz? – Tung Dec 18 '21 at 00:09
  • @tjebo I have hundreds of short genomic features of a specific type. I have transcriptomics data for these features for two treatments. I want to show the relative change in RPKM for each feature between treatments. I also want to show that there is a change in the mean RPKM between treatments for this feature type, generally. I know the classic viz for this case is a volcano plot, but I want to show RPKM and not fold change / p value. – acvill Dec 18 '21 at 00:33

2 Answers

2

I'm not sure about using geom_violindot() from the see package, but you could combine geom_half_violin() and geom_half_dotplot() from the gghalves package, subsetting the data to specify the orientation:

library(gghalves)

ggplot(data = iris_edit[iris_edit$Species == "setosa", ],
       mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_half_violin(side = "l") +
  geom_half_dotplot(stackdir = "up") +
  geom_half_violin(data = iris_edit[iris_edit$Species == "versicolor", ],
                   aes(x = Species, y = Sepal.Length, fill = Species),
                   side = "r") +
  geom_half_dotplot(data = iris_edit[iris_edit$Species == "versicolor", ],
                    aes(x = Species, y = Sepal.Length, fill = Species),
                    stackdir = "down") +
  geom_line(data = iris_edit, mapping = aes(group = paired),
            alpha = 0.3)

As a note, the pairing lines won't align properly with the dots, because the dotplot is binning each observation and then lengthening out the dotline; the paired lines correspond only to the x-value defined in aes(), not to where each dot sits in the dotline.
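If exact alignment between points and lines matters more than the stacked dotplot look, one possible workaround is to skip the dotplot and place plain points at a manually computed x position, then draw the lines at that same position. The sketch below is only an illustration under assumptions: the `x_point` column and the 0.1 offset are made up for this example, and it relies on ggplot2 accepting numeric x values on a discrete scale once a layer with the discrete Species mapping has been added first.

```r
library(ggplot2)
library(dplyr)
library(gghalves)

iris_edit <- iris %>%
  group_by(Species) %>%
  mutate(paired = seq_along(Species)) %>%
  ungroup() %>%
  filter(Species %in% c("setosa", "versicolor")) %>%
  mutate(Species = droplevels(Species),
         # x_point is a hypothetical helper column: the numeric position of
         # each species, nudged inward by an arbitrary offset of 0.1
         x_point = as.numeric(Species) +
           ifelse(Species == "setosa", 0.1, -0.1))

p <- ggplot(data = iris_edit,
            mapping = aes(x = Species, y = Sepal.Length, fill = Species)) +
  # violin halves face outward; these discrete layers come first
  geom_half_violin(data = filter(iris_edit, Species == "setosa"),
                   side = "l") +
  geom_half_violin(data = filter(iris_edit, Species == "versicolor"),
                   side = "r") +
  # lines and points share x_point, so they meet exactly
  geom_line(mapping = aes(x = x_point, group = paired), alpha = 0.3) +
  geom_point(mapping = aes(x = x_point), shape = 21, size = 1.5) +
  theme_classic() +
  theme(legend.position = "none")

p
```

The trade-off is that points with the same Sepal.Length overplot each other instead of being stacked sideways, so this probably suits cases where the connecting lines carry the message.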

Jonni
  • This solves my main issue, thank you. I was hoping the process of “lengthening out the dotline” could also be applied to the connecting lines, but it seems that may not be so easy. – acvill Dec 18 '21 at 13:42
1

As per my comment: this is not a direct answer to your question, but I believe you might not get the most convincing visualisation with the "slope graph" look. It quickly becomes convoluted (so many overlapping dots and lines) and the message gets lost.

To show change between paired observations (treatment 1 versus treatment 2), you can also (and I think: better) use a scatter plot. You can show each observation and the change becomes immediately clear. To make it more intuitive, you can add a line of equality.

I don't think you need to show the estimated distribution (the left plot omits it), but if you want to show it, you could use a two-dimensional density estimate with geom_density2d (right plot).

library(tidyverse)
## patchwork only for demo purpose
library(patchwork)

iris_edit <- iris %>% group_by(Species) %>%
  ## use seq_along instead
  mutate(paired = seq_along(Species)) %>%
  filter(Species %in% c("setosa","versicolor")) %>%
  ## some more modifications
  select(paired, Species, Sepal.Length) %>%
  pivot_wider(names_from = Species, values_from = Sepal.Length)

lims <- c(0, 10)

p1 <- 
  ggplot(data = iris_edit, aes(setosa, versicolor)) +
  geom_abline(intercept = 0, slope = 1, lty = 2) +
  geom_point(alpha = .7, stroke = 0, size = 2) +
  cowplot::theme_minimal_grid() +
  coord_equal(xlim = lims, ylim = lims) +
  labs(x = "Treatment 1", y = "Treatment 2")

p2 <- 
  ggplot(data = iris_edit, aes(setosa, versicolor)) +
  geom_abline(intercept = 0, slope = 1, lty = 2) +
  geom_density2d(color = "Grey") +
  geom_point(alpha = .7, stroke = 0, size = 2) +
  cowplot::theme_minimal_grid() +
  coord_equal(xlim = lims, ylim = lims) +
  labs(x = "Treatment 1", y = "Treatment 2")

p1 + p2

Created on 2021-12-18 by the reprex package (v2.0.1)
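In the scatter plot, the vertical distance of each point from the line of equality is the paired difference, so the overall shift between treatments can also be summarised numerically from the same wide data. The following is a rough sketch, not part of the reprex above; `iris_wide`, `difference` and `change_summary` are names introduced here for illustration, and the pairing in iris is arbitrary (row order within each species), so the numbers only demonstrate the mechanics.

```r
library(dplyr)
library(tidyr)

iris_wide <- iris %>%
  group_by(Species) %>%
  mutate(paired = seq_along(Species)) %>%
  ungroup() %>%
  filter(Species %in% c("setosa", "versicolor")) %>%
  select(paired, Species, Sepal.Length) %>%
  pivot_wider(names_from = Species, values_from = Sepal.Length) %>%
  # signed distance from the line of equality (y minus x)
  mutate(difference = versicolor - setosa)

change_summary <- iris_wide %>%
  summarise(n_pairs = n(),
            mean_difference = mean(difference),
            # fraction of pairs lying above the line of equality
            share_above_line = mean(difference > 0))

change_summary
```

With real treatment data, the same `difference` column could feed a histogram or a paired test if a formal comparison is wanted.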

tjebo
  • 1
    Thank you for this option -- mapping a single point in two dimensions is certainly cleaner than mapping two linked points in one dimension. And thanks for the `seq_along()` tip! – acvill Dec 18 '21 at 14:50