Vertical equivalent of position_dodge for geom_point on categorical scale

Question

I would like to dodge overlapping geom_point points vertically when I have a categorical y variable.

library(tidyverse)
# all possible points
df <- expand.grid(
  y_factor = paste0('factor_', 1:5),
  x = 1:100
) %>% as_tibble()  # as.tbl() is deprecated in current dplyr

# randomly missing and overlapping points
# every green point has a pink point underneath, and every blue point 
# has a green point underneath it.
set.seed(1)  # `seed <- 1` only creates a variable; set.seed() makes the sample reproducible
df_with_overlap <- df %>%
  sample_frac(0.5, replace = TRUE) %>%
  group_by(y_factor, x) %>%
  mutate(n = factor(1:n()))  # index of each duplicate within its (y_factor, x) cell
p <- ggplot(data = df_with_overlap, aes(x = x, y = y_factor, col = n))
p + geom_point()

Dodging horizontally using position_dodge doesn't work because the data is too crowded on that axis, so some points still overlap and the visualization isn't clear.

p+geom_point(position=position_dodge(width=1))+
  ggtitle("position_dodge isn't what I'm looking for.\nx-axis too crowded and points still overlap")

position_jitter kind of works because I can limit x jitter to 0, and control the degree of y jitter. But the randomness of the jitter makes it less appealing. I can kind of make out the 3 colours when they exist.

p+geom_point(aes(col=n), position=position_jitter(width=0, height=0.05))+
  ggtitle("Jitter kind of works.\nIt would work better if it wasn't random,\nlike position_dodge, but vertical dodging")

Is there a way to dodge the points vertically?

Maybe `coord_flip`? (Also maybe transparency, via `alpha = fraction`?) — Rui Barradas, Sep 14 '18 at 19:28
Package ggstance has vertical dodging via `position_dodgev()`. — aosmith, Sep 14 '18 at 19:31
@RuiBarradas `alpha=fraction` doesn't work so well when the points are perfectly coincident, small and more than just 2 or 3 overlapping points. `coord_flip` changes the plot too much. — dule arnaux, Sep 14 '18 at 19:47

score 20 · Accepted Answer · answered Sep 17 '18 at 15:32

Thanks to @aosmith for suggesting ggstance::position_dodgev(). It's exactly what I was looking for. I increased the oversampling so the effect is more obvious.

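library(tidyverse)

df <- expand.grid(
  y_factor = paste0('factor_', 1:5),
  x = 1:100
) %>% as_tibble()  # as.tbl() is deprecated in current dplyr

set.seed(1)  # `seed <- 1` only creates a variable; set.seed() makes the sample reproducible
df_with_overlap <- df %>%
  sample_frac(1.5, replace = TRUE) %>%  # oversample so more points coincide
  group_by(y_factor, x) %>%
  mutate(n = factor(1:n()))

# dodge the points vertically within each y category
ggplot(data = df_with_overlap, aes(x = x, y = y_factor, col = n)) +
  geom_point(position = ggstance::position_dodgev(height = 0.3))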

score 1 · Answer 2 · answered Sep 14 '18 at 19:37

I would transform y_factor to numeric and use a continuous y-axis. The trick is to add a small, fixed offset to the numeric y values, depending on the n group.

df_with_overlap <- df_with_overlap %>%
    # Transform y factors to numbers
    mutate(y_num = as.numeric(y_factor)) %>%
    # Add scaling factor by n group 
    mutate(y_num = y_num + case_when(n == 1 ~  0,
                                     n == 2 ~ -0.1,
                                     n == 3 ~  0.1))

# Plot y numeric values
ggplot(df_with_overlap, aes(x, y_num, color = n)) + 
    geom_point() +
    # On y-axis put original labels and no one will notice that it's actually a continuous scale
    scale_y_continuous(breaks = 1:5, 
                       labels = levels(df_with_overlap$y_factor)) +
    labs(y = "y_factor")

I'd prefer to not have to alter the data.frame if possible. Also would prefer a solution that generalizes to an unknown number of `n`. — dule arnaux, Sep 17 '18 at 15:10
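
One way the fixed offsets in this answer could be generalized to any number of `n` levels is to space them evenly around zero instead of listing them in `case_when`. The sketch below uses base R only. `dodge_offset()` and the `toy` data frame are hypothetical names made up for illustration; they are not part of ggplot2 or ggstance. It still adds a column to the data, so it does not address the first objection in the comment above.

```r
# Hypothetical helper (not a ggplot2/ggstance function): returns evenly
# spaced vertical offsets, centered on zero, with one slot per level of `n`.
# `height` is the total vertical band shared by all levels.
dodge_offset <- function(n, height = 0.3) {
  n <- as.factor(n)
  k <- nlevels(n)
  (as.integer(n) - (k + 1) / 2) * height / k
}

# Small stand-in data set: three coincident points in one cell, one in another.
toy <- data.frame(
  y_factor = factor(c("factor_1", "factor_1", "factor_1", "factor_2")),
  x        = c(1, 1, 1, 2),
  n        = factor(c(1, 2, 3, 1))
)

# Numeric y position = category index plus the offset for that point's n level.
toy$y_num <- as.numeric(toy$y_factor) + dodge_offset(toy$n)
print(toy)
```

The resulting `y_num` column can be plotted the same way as in the answer above, with `scale_y_continuous(breaks = ..., labels = ...)` restoring the category labels. Because the slot width is `height / k`, adding more levels of `n` packs the points closer together rather than pushing them into the neighbouring category.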
