19

Using position_jitter creates random jitter to prevent overplotting of data points.

Below, I use baseball statistics to illustrate my problem. When I plot the same data with two layers, the same jitter call jitters the geoms slightly differently. This makes sense, because the random jitter is presumably generated independently in the two calls, but it produces the problem you can see in my graph below.

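library(ggplot2)
library(plyr)  # provides the baseball dataset; mean_cl_normal also needs Hmisc installed
p <- ggplot(baseball, aes(x = round(year, -1), y = sb, color = factor(lg)))
p <- p + stat_summary(fun.data = "mean_cl_normal", position = position_jitter(width = 3, height = 0)) + coord_cartesian(ylim = c(0, 40))
p + stat_summary(fun.y = mean, geom = "line", position = position_jitter(width = 3, height = 0))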

Although the error bar points and the line refer to the same data, they are disjointed: the lines and points do not connect.

Is there a workaround for this? I thought position_dodge might be the answer, but it doesn't seem to work with these kinds of plots. Alternatively, maybe there's some way to get the mean_cl_normal call to also add the lines?

Glorfindel
Alex Holcombe

4 Answers

9

This is a weakness in the current ggplot2 syntax - there's no way to work around it except to add the jitter yourself.

Or you could do something like this:

ggplot(baseball, aes(round(year,-1) + as.numeric(factor(lg)), sb, color = factor(lg))) +
  stat_summary(fun.data="mean_cl_normal") +
  stat_summary(fun.y=mean,geom="line") +
  coord_cartesian(ylim=c(0,40))
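One way to read "add the jitter yourself" is to draw a single random offset per plotted group and store it in the data, so every layer uses the same x. The following is only a sketch under assumptions: the data frame is made up to stand in for the baseball data, the column names `decade`, `offset`, and `x_jit` are hypothetical, `mean_se` replaces `mean_cl_normal` to avoid the Hmisc dependency, and `fun =` assumes ggplot2 3.3.0 or later (older versions use `fun.y`).

```r
# Toy data standing in for the baseball example (values are made up).
set.seed(1)
df <- data.frame(
  decade = rep(c(1990, 2000, 2010), each = 20),
  lg     = rep(c("AL", "NL"), times = 30),
  sb     = rpois(60, lambda = 10)
)

# Draw one random offset per (decade, league) group, once.
groups <- unique(df[, c("decade", "lg")])
groups$offset <- runif(nrow(groups), min = -3, max = 3)

# Attach the offset to every row so all layers share the same jittered x.
df <- merge(df, groups, by = c("decade", "lg"))
df$x_jit <- df$decade + df$offset

# Build the plot only if ggplot2 is available; print(p) draws it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x_jit, sb, color = lg)) +
    stat_summary(fun.data = "mean_se") +
    stat_summary(fun = mean, geom = "line")
}
```

Because the offset lives in the data rather than in a position adjustment, both summary layers compute their statistics at identical x values, so the line passes through the points.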
hadley
8

I think so, by setting the seed to be the same in the two instances:

p=ggplot(baseball,aes(x=round(year,-1),y=sb,color=factor(lg)))
myseed = 2010
set.seed(myseed)
p=p+stat_summary(fun.data="mean_cl_normal",
  position=position_jitter(width=3,height=0))+coord_cartesian(ylim=c(0,40))
set.seed(myseed)
p+stat_summary(fun.y=mean,geom="line",
           position=position_jitter(width=3,height=0))

This ensures that the random number generator is reset to the same starting position that was used in the initial call. However, I don't know how you could extract the random increments added to the values.

nullglob
  • 1
    Good idea, but it didn't work! I thought it would, because it looks like position_jitter uses the base package's jitter, which I expected to use the same random number generator seeded by set.seed. I suppose a general workaround would be to create my own jittered version of x, but hopefully there's a better way. – Alex Holcombe Jul 02 '10 at 12:31
  • 2
    That won't work because the jittering is done at plot time, not at creation time. – hadley Jul 02 '10 at 19:29
  • 1
    this worked perfectly for me. Maybe something about a new version since hadley commented (4 years ago). This should be the new answer as far as I'm concerned. – rcorty Nov 03 '14 at 04:21
  • 1
    Unfortunately, I can't seem to make this work for me. Points are all over the place in relation to their bars. Perhaps check the answer [here](https://stackoverflow.com/questions/44595920/align-points-and-error-bars-in-ggplot-when-using-jitterdodge?rq=1) – EcologyTom Apr 09 '18 at 07:46
  • This does not work. One of the answers below: https://stackoverflow.com/a/69774958/6461462 actually makes this easier with the newer version of ggplot2. – M-- Mar 08 '23 at 19:15
2

This is easy in newer versions of ggplot2. Most (all?) geoms accept a position argument, which according to the docs is:

Position adjustment, either as a string, or the result of a call to a position adjustment function.

If you call position_jitter, you'll see that it returns a ggproto object with a consistent seed.

> position_jitter()
<ggproto object: Class PositionJitter, Position, gg>
    compute_layer: function
    compute_panel: function
    height: NULL
    required_aes: x y
    seed: 1279634412
    setup_data: function
    setup_params: function
    width: NULL
    super:  <ggproto object: Class PositionJitter, Position, gg>

So, to jitter multiple geoms the same way, you can make one of these objects and pass it to multiple geoms, like so:

library(ggplot2)
library(magrittr)  # provides the %>% pipe

data(mtcars)

jitterer <- position_jitter(width = .5) # comically large jitter
mtcars %>%
  ggplot(aes(x = wt, y = hp, ymin = hp, ymax = hp + 5)) + # mtcars stores weight as wt
  geom_point(position = jitterer) +
  geom_linerange(position = jitterer) # try removing position = jitterer and enjoy the show
C. Hammill
  • This is a great approach! On my system, I need to set the seed in `position_jitter()` with `seed = 123`. Also `weight` in `aes()` needs to be `wt` to work with mtcars. – blongworth Nov 28 '21 at 02:34
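Following the comment about setting the seed explicitly, here is a hedged sketch of that variant: two separate position objects are created with the same `seed`, and `layer_data()` is used to compare the x positions each layer would draw. It assumes a ggplot2 version in which `position_jitter()` accepts a `seed` argument; the seed value 123 and the object names are arbitrary, and `height = 0` is added so only x is jittered.

```r
library(ggplot2)

# Two independent position objects that share an explicit seed
# (123 is an arbitrary choice).
pos_points <- position_jitter(width = 0.5, height = 0, seed = 123)
pos_ranges <- position_jitter(width = 0.5, height = 0, seed = 123)

p <- ggplot(mtcars, aes(x = wt, y = hp, ymin = hp, ymax = hp + 5)) +
  geom_point(position = pos_points) +
  geom_linerange(position = pos_ranges)

# layer_data() returns the data each layer draws, after position adjustments.
x_points <- layer_data(p, 1)$x
x_ranges <- layer_data(p, 2)$x

# Compare the jittered x positions of the two layers.
all.equal(x_points, x_ranges)
```

An explicit seed also makes the jitter reproducible across sessions, which a shared object with a randomly initialised seed does not.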
1

I ended up generating a uniform distribution to solve this problem.

I had to address the same underlying problem today. I create one plot, jittering the points, and then I create a second plot that essentially zooms in on a subsection of the first. It's dissonant and distracting if the points move around.

Following is a demo of the problem and my solution. I don't use ggplot for this plot, but the same concept applies. I make a uniform distribution, one value for each value I need to jitter. I add it to the source dataframe so that each time I take a subset, the jitter value corresponds to the same original data value.

data(airquality)
someDataset <- airquality
someDataset$color <- "black"
someDataset$color[someDataset$Month == 8 & someDataset$Wind == 9.7] <- "red"
## jitter gives different results each time it's run
## (quartz() opens a macOS graphics device; use dev.new() on other platforms)
for (fZoom in c(TRUE, FALSE)) {
    if (fZoom) myAirQuality <- someDataset[someDataset$Wind > 7.5 & someDataset$Wind < 11.5, ]
    else myAirQuality <- someDataset[someDataset$Wind > 8.5 & someDataset$Wind < 10.5, ]
    quartz("Using Jitter")
    plot(myAirQuality$Wind ~ jitter(myAirQuality$Month), col = myAirQuality$color)
}

## draw the jitter once and store it with the data, so every subset reuses it
someDataset$MonthJit <- runif(nrow(someDataset), min = -0.2, max = 0.2)
for (fZoom in c(TRUE, FALSE)) {
    if (fZoom) myAirQuality <- someDataset[someDataset$Wind > 7.5 & someDataset$Wind < 11.5, ]
    else myAirQuality <- someDataset[someDataset$Wind > 8.5 & someDataset$Wind < 10.5, ]
    quartz("Using runif")
    plot(myAirQuality$Wind ~ c(myAirQuality$Month + myAirQuality$MonthJit), col = myAirQuality$color)
}
rhileighalmgren