
I am working on a boxplot with overlaid points and lines connecting each examiner's points between the two time points; example data is provided below.

I have two questions:

  1. I would like the points to look like this, with just a little vertical (height) jitter and more horizontal (width) jitter. However, I want the points to be centered symmetrically around the middle of the boxplot at each y value (to make the plots more visually pleasing). For example, I would like the 6 data points at y = 4 and x = "after" to be placed 3 to the right and 3 to the left of the boxplot center, at symmetrical distances from the center.

  2. Also, I want the lines to connect the correct points, but currently the lines start and end in the wrong places. I know I can use position = position_dodge() in geom_point() and geom_line() to get the correct positions, but I also want to be able to offset the points vertically. (Why do the points and lines align with position_dodge() but not with position_jitter()?)

Are these two things possible to achieve?

Thank you!

library(ggplot2)

examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)
data <- data.frame(examiner, time, result)

ggplot(data, aes(time, result, fill=time)) + 
  geom_boxplot() +
  geom_point(aes(group = examiner), 
             position = position_jitter(width = 0.2, height = 0.03)) +
  geom_line(aes(group = examiner), 
            position = position_jitter(width = 0.2, height = 0.03), alpha = 0.3)
Asked by viki, edited by stefan

2 Answers


I'm not sure that you can satisfy both of your requirements at the same time.

  1. You can get a more "symmetric" spread of points by using geom_dotplot(), for example:
library(ggplot2)

ggplot(data, aes(time, result, fill=time)) + 
  geom_boxplot() +
  geom_dotplot(binaxis="y", aes(x=time, y=result, group = time), 
             stackdir = "center", binwidth = 0.075)

The problem is that when you add the lines, they join the original, un-jittered positions, because geom_line() knows nothing about the dotplot's stacking offsets.

  2. To join jittered points with lines that pass through those same jittered points, add the jitter to the data before plotting. As you saw, jittering both layers separately produces points and lines that don't match, because each layer draws its own random offsets. See "Connecting grouped points with lines in ggplot" for a better explanation.
library(dplyr)
library(ggplot2)

data <- data %>% 
  mutate(result_jit = jitter(result, amount=0.1),
         # numeric x positions in ggplot's (alphabetical) order: "after" = 1, "before" = 2
         time_jit = jitter(case_when(
           time == "before" ~ 2,
           time == "after" ~ 1
         ), amount=0.1)
  )

ggplot(data, aes(time, result, fill=time)) + 
  geom_boxplot() +
  geom_point(aes(x=time_jit, y=result_jit, group = examiner)) +
  geom_line(aes(x=time_jit, y=result_jit, group = examiner), alpha=0.3)

Result
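The reason two separately jittered layers disagree can be checked directly. A minimal sketch on a hypothetical toy data frame, assuming ggplot2 3.0 or later (where position_jitter() has a seed argument): each position_jitter() call picks its own random seed, so two layers get different offsets, while a shared seed reproduces the same offsets for identically ordered rows.

```r
library(ggplot2)

# Hypothetical toy data: two categories with two values each
toy <- data.frame(x = c("a", "a", "b", "b"), y = c(1, 2, 1, 2))

# Two layers, each with its own position_jitter() call: each call picks its own random seed
p_random <- ggplot(toy, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.2, height = 0)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

# Two layers sharing a fixed seed: the same offsets are drawn for the same rows
p_seeded <- ggplot(toy, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))

# Jittered x positions of each layer after the plot is built
random_1 <- as.numeric(layer_data(p_random, 1)$x)
random_2 <- as.numeric(layer_data(p_random, 2)$x)
seeded_1 <- as.numeric(layer_data(p_seeded, 1)$x)
seeded_2 <- as.numeric(layer_data(p_seeded, 2)$x)

identical(random_1, random_2)  # different seeds, so normally FALSE
identical(seeded_1, seeded_2)  # TRUE
```

position_dodge(), by contrast, computes its offsets deterministically from the groups, so two layers with the same groups line up. As far as I can tell, sharing a seed is not a reliable fix for the question's plot: geom_line() sorts its rows by group and x before position adjustments run, so the same random numbers land on different rows than in geom_point(). Pre-computing the jitter in the data avoids this, since both layers read the same columns.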

Answered by timothyd
  • Thank you very much, great answer. Two more questions: 1. I really like the look of the symmetrical dotplots; is there any way to connect the dots with lines? That would be amazing, but you seem to say that it is not possible, right? Or can you map the geom_line() data to the dotplot in some way? 2. Interesting, adding jitter to the raw data. Is it possible to get these points more symmetric? Or is it just completely random, apart from when you set the seed? – viki Sep 06 '22 at 09:05
  • This, with matching points and lines, would be amazing: `ggplot(data, aes(time, result, fill=time)) + geom_boxplot() + geom_dotplot(binaxis="y", aes(x=time, y=result, group = time), stackdir = "center", binwidth = 0.09) + geom_line(aes(x=time, y=result, group = examiner), position = position_dodge(0.6), alpha = 0.6)` – viki Sep 06 '22 at 09:14
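On the follow-up question about symmetric pre-computed offsets: jitter() is random by design, but the offsets can instead be computed deterministically. A minimal sketch (not from the original answers), where `spacing` is a hypothetical gap chosen by eye: within each time/result cell, the points are spread evenly around the box centre, and both geom_point() and geom_line() read the same `time_sym` column.

```r
library(dplyr)
library(ggplot2)

examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)

spacing <- 0.05  # hypothetical horizontal gap between neighbouring points

data_sym <- data.frame(examiner, time, result) %>%
  # numeric x positions in ggplot's (alphabetical) order: "after" = 1, "before" = 2
  mutate(time_num = as.numeric(factor(time))) %>%
  group_by(time, result) %>%
  # spread the n points of each cell evenly around 0, from -(n-1)/2 to (n-1)/2 steps
  mutate(offset = (row_number() - (n() + 1) / 2) * spacing) %>%
  ungroup() %>%
  mutate(time_sym = time_num + offset)

p <- ggplot(data_sym, aes(time, result, fill = time)) +
  geom_boxplot() +
  geom_point(aes(x = time_sym, group = examiner)) +
  geom_line(aes(x = time_sym, group = examiner), alpha = 0.3)
p
```

For the six points at result = 4 and time = "after", the offsets run from -2.5 to 2.5 times `spacing`, i.e. three on each side of the box centre; a cell with an odd count puts one point on the centre line. Which examiner gets which slot simply follows the row order.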

It is possible to extract the transformed points from geom_dotplot() using ggplot_build(); see "Is it possible to get the transformed plot data? (e.g. coordinates of points in dot plot, density curve)".

These points can be merged back onto the original data and used as the anchor points for geom_line().
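To see what the extracted data looks like, here is a small inspection sketch using layer_data(), which is ggplot2's shortcut for ggplot_build(plot)$data[[i]]. It shows that the dotplot keeps x at the category centres and records each dot's sideways position in `stackpos`, which is why lines mapped to x = time meet at the centre.

```r
library(ggplot2)

examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)
data <- data.frame(examiner, time, result)

dotpoints <- ggplot(data, aes(time, result, fill = time)) +
  geom_dotplot(binaxis = "y", aes(group = time),
               stackdir = "center", binwidth = 0.075)

# One row per drawn dot
dot_data <- layer_data(dotpoints, 1)

# x holds only the category centres (1 = "after", 2 = "before");
# the sideways offset of each dot is encoded in stackpos instead
unique(as.numeric(dot_data$x))
head(dot_data[, c("x", "y", "stackpos", "binwidth")])
```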

Putting it all together:

library(dplyr)
library(ggplot2)

examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)

# Create a numeric version of time
data <- data.frame(examiner, time, result) %>% 
  mutate(group = case_when(
           time == "before" ~ 2,
           time == "after" ~ 1)
  )

# Build a ggplot of the dotplot to extract data
dotpoints <- ggplot(data, aes(time, result, fill=time)) + 
  geom_dotplot(binaxis="y", aes(x=time, y=result, group = time), 
               stackdir = "center", binwidth = 0.075)

# Extract values of the dotplot
dotpoints_dat <- ggplot_build(dotpoints)[["data"]][[1]] %>% 
  mutate(key = row_number(),
         x = as.numeric(x),
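         # shift each dot sideways by its stack position; the 1.2 factor looks empirical,
         # so the match may drift if the plot's size or aspect ratio changes
         newx = x + 1.2*stackpos*binwidth/2) %>% 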
  select(key, x, y, newx)

# Join the extracted values to the original data; the row-number key assumes
# both tables are sorted the same way (by x position, then by value)
data <- arrange(data, group, result) %>% 
  mutate(key = row_number())
newdata <- inner_join(data, dotpoints_dat, by = "key") %>% 
  select(-key)

# Create final plot
ggplot(newdata, aes(time, result, fill=time)) + 
  geom_boxplot() +
  geom_dotplot(binaxis="y", aes(x=time, y=result, group = time), 
               stackdir = "center", binwidth = 0.075) +
  geom_line(aes(x=newx, y=result, group = examiner), alpha=0.3)

Result
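A variant of the join that does not depend on the two tables sharing a row order: a sketch, assuming the same layer_data() columns used above (x, y, stackpos, binwidth) and the same empirical 1.2 factor. Each dot is matched on its category position, its value, and its index within that cell, so the pairing within a cell is still arbitrary but every line ends on a dot of the right group and value.

```r
library(dplyr)
library(ggplot2)

examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)
data <- data.frame(examiner, time, result)

dot_plot <- ggplot(data, aes(time, result, fill = time)) +
  geom_dotplot(binaxis = "y", aes(group = time),
               stackdir = "center", binwidth = 0.075)

# One row per dot; number the dots within each (x, value) cell by stack position
dots <- layer_data(dot_plot, 1) %>%
  transmute(x = as.numeric(x), result = round(y), stackpos, binwidth) %>%
  group_by(x, result) %>%
  arrange(stackpos, .by_group = TRUE) %>%
  mutate(idx = row_number()) %>%
  ungroup()

# Number the observations within each (time, result) cell the same way, then match
newdata <- data %>%
  mutate(x = as.numeric(factor(time))) %>%  # "after" = 1, "before" = 2, as on the plot
  group_by(x, result) %>%
  mutate(idx = row_number()) %>%
  ungroup() %>%
  inner_join(dots, by = c("x", "result", "idx")) %>%
  mutate(newx = x + 1.2 * stackpos * binwidth / 2)

p_final <- ggplot(newdata, aes(time, result, fill = time)) +
  geom_boxplot() +
  geom_dotplot(binaxis = "y", aes(group = time),
               stackdir = "center", binwidth = 0.075) +
  geom_line(aes(x = newx, group = examiner), alpha = 0.3)
# print(p_final) to draw the plot
```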

Answered by timothyd
  • Wow, thank you so much. That was an extremely thorough and educational answer, and also just what I was hoping for. Interesting, the extraction of location data. I really appreciate it, all the best! – viki Sep 09 '22 at 12:29