Given data like this:

dr <- data.frame(
  X = sample(c("yes","no"),10, replace=T),
  Y = rnorm(1000),
  highlight = sample(c(1,NA),1000,replace=T,prob=c(5,995))
)

I want to create two plots. First, a simple one using geom_jitter() to avoid overplotting:

library(ggplot2)
library(dplyr)  # for filter(), used in the second plot

myseed=101
set.seed(myseed)
p <- ggplot(dr, aes(x=X,y=Y,colour=Y)) +
  theme_bw() +
  geom_jitter(alpha=0.7,width=0.5,na.rm=T) +
  scale_colour_gradient("Y", low="#5edcff", high="#035280") +
  stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax= "mean", size=0.3,width=0.33, geom = "crossbar")
plot(p)

Plot 1

Then, in a second plot, I would like to highlight 5 specific data points. I thought I could use set.seed to make ggplot jitter the points in the same way, then add another layer with only the points to be highlighted. Not so: the red points are jittered anew, so they appear in different locations.

set.seed(myseed)
pm <- p + geom_jitter(colour="red",data=filter(dr, highlight == 1),width=0.5,size=2) 
plot(pm)

Plot 2

What I want: to have Plot 2 be exactly like Plot 1, with the only difference that 5 points (identified in the data frame) are highlighted in red.

I thought maybe the problem is that I'm adding the highlighted points in a separate layer. What if I try to map the "highlight" factor from the start?

set.seed(myseed)
p <- ggplot(dr, aes(x=X,y=Y,colour=Y,fill=highlight)) +
  theme_bw() +
  geom_jitter(alpha=0.7,width=0.5) +
  stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax= "mean",     size=0.3,width=0.33, geom = "crossbar")
plot(p)

Plot 3

That doesn't seem to work either. I'm probably overlooking something trivial, or starting out on the wrong foot, but I can't get it to work.

P.S. I've looked at prior questions such as this, this and this, but none answers my question directly.

strangeloop

2 Answers

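How about plotting the two sets of points separately? Once the points have been jittered, you can't recover their positions, at least not without significant effort. So instead split the data into highlighted and non-highlighted rows and give each subset its own layer: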

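library(ggplot2)

set.seed(333)
dr <- data.frame(
  X = sample(c("yes","no"),10, replace=TRUE),
  Y = rnorm(1000),
  highlight = sample(c(1,NA),1000,replace=TRUE,prob=c(5,995))
)
# TRUE for ordinary points, FALSE for the points to highlight
ind <- is.na(dr$highlight)

ggplot(dr, aes(x=X, y=Y, colour=Y)) +
  # ordinary points only
  geom_jitter(data=dr[ind, ], alpha=0.7, width=0.5) +
  # highlighted points only, jittered independently in their own layer
  geom_jitter(data=dr[!ind, ], width=0.5, colour = "red", size=3) +
  # note: ggplot2 >= 3.3.0 deprecates fun.y/fun.ymin/fun.ymax
  # in favour of fun/fun.min/fun.max
  stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax= "mean", 
               size=0.3,width=0.33, geom = "crossbar") + 
  theme_bw()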
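
If the highlighted points must sit exactly where they do in the unhighlighted plot, one option is to skip geom_jitter() and compute the jitter yourself, storing it in the data frame so that every layer reads the same positions. Re-seeding alone does not help, because a layer with 5 rows draws its own 5 random offsets, which are unrelated to the offsets those rows received in the 1000-row layer. A minimal sketch, in which the column name X_jit and the half-width jit_width are made up for illustration, X is sampled at full length, and the mean crossbar is left out for brevity:

```r
library(ggplot2)

set.seed(101)
dr <- data.frame(
  X = factor(sample(c("yes", "no"), 1000, replace = TRUE)),
  Y = rnorm(1000),
  highlight = sample(c(1, NA), 1000, replace = TRUE, prob = c(5, 995))
)

# Hypothetical helper column holding each point's jittered x position.
# as.numeric() on a factor gives the integer code of each level (1, 2, ...);
# runif() adds a uniform offset of at most +/- jit_width.
jit_width <- 0.25  # assumed half-width, chosen for illustration
dr$X_jit <- as.numeric(dr$X) + runif(nrow(dr), -jit_width, jit_width)

# Base plot: plain geom_point() on the precomputed positions,
# with the factor levels put back on the axis as labels.
p <- ggplot(dr, aes(x = X_jit, y = Y, colour = Y)) +
  theme_bw() +
  geom_point(alpha = 0.7) +
  scale_colour_gradient("Y", low = "#5edcff", high = "#035280") +
  scale_x_continuous("X", breaks = seq_along(levels(dr$X)),
                     labels = levels(dr$X))

# Highlight layer: the subset carries the same X_jit values,
# so the red points land on top of their originals.
hl <- dr[!is.na(dr$highlight), ]
pm <- p + geom_point(data = hl, colour = "red", size = 2)
```

Printing p and then pm should give two plots that differ only in the red points, since neither plot draws any random numbers at render time.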

tonytonov

Compared with layering two jitters, your fill approach was headed in the right direction. However, fill only has an effect on point shapes 21-25 (the ones with a separate outline and interior), so with the default shape you could not see the desired result.

Graph with all points:

myseed=101
set.seed(myseed)
p <- ggplot(dr, aes(x=X,y=Y,colour=Y)) +
  theme_bw() +
  geom_jitter(alpha=0.7,width=0.5, size = 3) +
  scale_colour_gradient("Y", low="#5edcff", high="#035280") +
  stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax= "mean", size=0.3,width=0.33, geom = "crossbar")
plot(p)

Graph with highlighted points:

Note that I supplied the aesthetics to stat_summary again with inherit.aes = FALSE; otherwise it would inherit the fill mapping and compute a separate summary for each fill group.

myseed=101
set.seed(myseed)
p <- ggplot(dr, aes(x=X,y=Y,colour=Y, fill = factor(highlight))) +
  theme_bw() +
  geom_jitter(width=0.5, shape = 21, size = 3) +
  scale_colour_gradient("Y", low="#5edcff", high="#035280") +
  scale_fill_manual(values=c("red"), guide = "none") +
  stat_summary(aes(x=X,y=Y,colour=Y), inherit.aes = FALSE,
               fun.y = "mean", fun.ymin = "mean", fun.ymax= "mean", size=0.3,width=0.33, geom = "crossbar")
plot(p)

I still think a cleaner solution would be to manually code the colors, but I did not attempt it. Maybe someone will supply that solution.
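
For what it's worth, here is a rough sketch of what manually coding the colours could look like; it is one possible approach, not something tested against the plots above. It assumes the scales package (installed alongside ggplot2) and adds a made-up column, col, holding one colour string per row: a gradient colour computed from Y, overwritten with "red" for the highlighted rows. scale_colour_identity() then uses those strings as they are.

```r
library(ggplot2)
library(scales)

set.seed(101)
dr <- data.frame(
  X = sample(c("yes", "no"), 1000, replace = TRUE),
  Y = rnorm(1000),
  highlight = sample(c(1, NA), 1000, replace = TRUE, prob = c(5, 995))
)

# Build the same light-to-dark blue gradient as a palette function.
# It expects inputs in [0, 1], hence the rescale() of Y.
grad <- seq_gradient_pal(low = "#5edcff", high = "#035280")
dr$col <- grad(rescale(dr$Y))  # hypothetical column: one hex colour per row

# Overwrite the colour of the rows flagged for highlighting.
dr$col[!is.na(dr$highlight)] <- "red"

# Single jitter layer; colours are taken literally from dr$col.
p <- ggplot(dr, aes(x = X, y = Y, colour = col)) +
  theme_bw() +
  geom_jitter(alpha = 0.7, width = 0.5) +
  scale_colour_identity()
```

Because there is only one jitter layer, every point is jittered exactly once. The trade-offs are that scale_colour_identity() draws no legend by default, and the red points are not guaranteed to be drawn on top of their neighbours.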

Divi
    I keep forgetting that point about shapes 21-25. I wish ggplot2 would have a simple warning that an aesthetic couldn't be mapped due to the shape not having enough distinctive features (ping @hadley). – strangeloop Jul 19 '16 at 19:00