How to generate the same plot with "jitter", and how to jitter selected points (not all points)?

Question

What I would like to do is:

a) have the plot produced by the ggplot code be the same each time it runs [set.seed kind of notion?] and

b) have text labels jittered only for labels that have the same y-axis value -- leave the other text labels alone. This would seem to be some kind of conditional jittering based on a factor value for the points.

Here is some data:

dput(df)
structure(list(Firm = c("a verylongname", "b verylongname", "c verylongname", 
"d verylongname", "e verylongname", "f verylongname", "g verylongname", 
"h verylongname", "i verylongname", "j verylongname"), Sum = c(74, 
77, 79, 82, 85, 85, 88, 90, 90, 92)), .Names = c("Firm", "Sum"
), row.names = c(NA, 10L), class = "data.frame")

Here is ggplot code using df:

ggplot(df, aes(x = reorder(Firm, Sum, mean), y = Sum)) +
  geom_text(aes(label = Firm), size = 3, show.legend = FALSE, position = position_jitter(height = .9)) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) +   # to show the lower left name fully
  labs(x = "", y = "", title = "")

In one run of the plot, labels h and i still overlap, and each time I run the code above the locations of the text labels change.

BTW, the related question "conditional jitter" shifts the discrete values on the x-axis a bit, but I would like to shift only the overlapping points, and only along the y-axis.

If your main goal is to avoid overlap, you *might* like this question http://stackoverflow.com/questions/30178954/dynamic-data-point-label-positioning-in-ggmap, and this on stats.SE: http://stats.stackexchange.com/questions/16057/how-do-i-avoid-overlapping-labels-in-an-r-plot/69236#69236 — maj, Sep 11 '15 at 16:19

score 4 · Accepted Answer · edited Aug 19 '16 at 09:04

One option is to add a column to mark overlapping points and then plot those separately. A better option might be to directly shift the y-values of the overlapping points, so that we get direct control over their placement. I show both options below.

Option 1 (jitter): First, add a column to mark overlaps. In this case, because the points pretty much fall on a line, we can mark any points as overlapping if their y-values are too close. You can include more complex conditions if it's important to check whether the x-values are close as well.

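# sapply returns a character vector (lapply would create a list column)
df$overlap = sapply(seq_len(nrow(df)), function(i) {
  # Flag row i if any other row's Sum is within 1 unit of it
  if(min(abs(df[i, "Sum"] - df$Sum[-i])) <= 1) "Overlap" else "Ignore"
})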
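
For reference, the same nearest-neighbour check can be sketched in vectorized form with base R, which makes it easy to see which rows get flagged. This is only a sketch: the `tol` name is my own, and the `Sum` values are copied from the question's data.

```r
# y-values copied from the question's data frame
Sum <- c(74, 77, 79, 82, 85, 85, 88, 90, 90, 92)

# Pairwise absolute differences; ignore each point's distance to itself
d <- abs(outer(Sum, Sum, "-"))
diag(d) <- Inf

# A point is flagged if its nearest neighbour is within the tolerance
tol <- 1  # hypothetical name; same threshold as the <= 1 check
overlap <- ifelse(apply(d, 1, min) <= tol, "Overlap", "Ignore")

# Rows flagged as overlapping (firms e, f, h, i in the sample data)
which(overlap == "Overlap")
```

With this data only the two tied pairs (85, 85 and 90, 90) fall within the tolerance, so only those four labels would be jittered.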

In the plot, I've colored the jittered points red so it's easy to tell which ones were affected.

# Add set.seed() here to make jitter reproducible
ggplot(df, aes(x = reorder(Firm, Sum, mean))) +
  geom_text(data=df[df$overlap=="Overlap",], 
            aes(label = Firm, y = Sum), size = 3,  
            position = position_jitter(width=0, height = 1), colour="red") +
  geom_text(data=df[df$overlap=="Ignore",], 
            aes(label = Firm, y = Sum), size = 3) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) +   # to show the lower left name fully
  labs(x = "", y = "", title = "")

Option 2 (direct placement): Another option is to directly control how much the labels are shifted, rather than taking whatever jitter happens to give us. In this case, we know that we want to shift each pair of points with the same y-value. More complex logic would be necessary in cases where we need to worry about both x and y values, more than two points in the same overlap, and/or where we need to shift values that are close, but not exactly the same.

library(dplyr)

# Create a new column that shifts pairs of points with the same y-value by +/- 0.25
df = df %>% group_by(Sum) %>%
  mutate(SumNoOverlap = if(n()>1) Sum + c(-0.25,0.25) else Sum)

ggplot(df, aes(x = reorder(Firm, Sum, mean), y = SumNoOverlap)) +
  geom_text(aes(label = Firm), size = 3) +
  theme(axis.text.x = element_blank()) +
  scale_x_discrete(expand = c(-1.1, 0)) +   # to show the lower left name fully
  labs(x = "", y = "", title = "")
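
If there can be more than two labels at the same y-value, the fixed c(-0.25, 0.25) shift no longer fits the group size. One possible generalization, sketched here in base R, spreads each group of ties symmetrically around its shared value. The helper name `spread_ties` and its `step` argument are hypothetical, and the example values are made up to include a pair and a triple.

```r
# Hypothetical helper: spread tied values symmetrically around their shared value.
# `step` is the gap between adjacent labels in y-axis units
# (step = 0.5 reproduces the +/- 0.25 shift for a pair).
spread_ties <- function(y, step = 0.5) {
  ave(y, y, FUN = function(v) {
    # Offsets are centred on zero, so a lone value is left unchanged
    v + (seq_along(v) - (length(v) + 1) / 2) * step
  })
}

Sum <- c(74, 85, 85, 90, 90, 90)  # made-up values: one pair, one triple
shifted <- spread_ties(Sum)
shifted
```

The result could be stored as a column (for example `df$SumNoOverlap <- spread_ties(df$Sum)`) and plotted the same way as above. It still only handles exact ties, not values that are merely close.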

Note: To make jitter reproducible, add set.seed(153) (or whatever seed value you want) before the jittered plot code.
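
As an alternative to calling set.seed() yourself, recent ggplot2 releases (3.0.0 and later, as far as I know) let you pass a seed directly to position_jitter(). A minimal sketch, assuming such a version is installed; the seed value and the rebuilt data frame are only for illustration:

```r
library(ggplot2)

# Rebuild the question's data so the example is self-contained
df <- data.frame(
  Firm = paste(letters[1:10], "verylongname"),
  Sum  = c(74, 77, 79, 82, 85, 85, 88, 90, 90, 92)
)

# `seed` fixes the random offsets for this layer (the value 153 is arbitrary)
p <- ggplot(df, aes(x = reorder(Firm, Sum, mean), y = Sum)) +
  geom_text(aes(label = Firm), size = 3,
            position = position_jitter(width = 0, height = 0.9, seed = 153))

# Building the plot twice gives identical jittered positions
y1 <- layer_data(p)$y
y2 <- layer_data(p)$y
identical(y1, y2)
```

Because the seed is stored with the layer, the result does not depend on the state of the global random number generator when the plot is drawn.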

FYI - set.seed(123) does not make the jitter reproducible in my hands using ggplot2 2.2.1. — milo, Aug 01 '18 at 20:09
