
I've looked around the net and found lots of stuff about jittering and changing the shape of outliers but can't seem to find anything about this specific problem.

I want a black and white boxplot with jittered data points - I can do that.

I would also like to change the shape of the outliers. Although there are multiple cases with a score of 4, only one of them changes to a hollow circle.

I would assume that if one data point at a particular value is considered an outlier, the other points at that value would be considered outliers too.

Is this a coding error, or did I miss something along the way in a stats class? If it's a coding issue, how do I get all of them to be hollow?

Apparently my "reputation" needs to be 10 to attach an image! I hope it makes sense without it, though.

Here's my code:

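library(ggplot2)

# phase2.3 is my data frame, with columns Group and Score
plot <- ggplot(phase2.3, aes(Group, Score))

# jittered points first, then the boxplot with hollow-circle outliers on top
plot + geom_point(position = position_jitter(width = 0.1, height = 0.2)) +
  geom_boxplot(outlier.shape = 1) + xlab("Group") + theme_bw(base_size = 20)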
Julius Vainora
A.S.
  • You don't need to attach an image if you give us the data for the plot! Post the results of `dput(phase2.3)` so we can copy/paste it into R. (If you can't share the data or if it's large, make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) using a built-in data set or simulated data.) – Gregor Thomas Oct 03 '13 at 19:29
  • The scatterplot (`geom_point`) points are jittered, but the outliers of the boxplot are not (and I don't think there is a direct way to jitter them). Look at a plot with each geom separately and it might make more sense what is happening. – Brian Diggs Oct 03 '13 at 23:08
  • Thanks for the help but shadow provided a solution. – A.S. Oct 04 '13 at 17:43
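
To illustrate the point made in the comments: tied values beyond the whiskers are all treated as outliers, but an un-jittered boxplot draws them at the same position, so they overlap and look like one symbol. Below is a minimal sketch with made-up scores (an assumption, not the asker's `phase2.3` data). It uses base R's `boxplot.stats`, which applies a 1.5 × IQR-style rule based on hinges, so its cut-offs can differ slightly from the quartile-based ones in `geom_boxplot`.

```r
# Hypothetical scores: three cases share the extreme value 4
scores <- c(4, 4, 4, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 14, 14)

# boxplot.stats() returns every value beyond the whiskers in $out,
# including ties
out <- boxplot.stats(scores)$out
out

# All three 4s are flagged. Drawn without jitter they share one (x, y)
# position, so only a single hollow circle is visible on the plot.
length(out)
```

Plotting `geom_boxplot` on its own, as suggested above, shows the same thing visually: one symbol that is really several stacked points.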

1 Answer


You will probably have to work out for yourself which points lie outside the whisker range. Here is an extension of the standard example from `geom_boxplot` that shows how to find the outliers using plyr.

# load packages
library(plyr)
library(ggplot2)
# flag outliers within each cyl group
df <- ddply(mtcars, "cyl", function(x) {
  q <- quantile(x[, "mpg"], c(.25, .75))   # first and third quartiles
  whisker <- q + c(-1.5, 1.5) * diff(q)    # whisker limits: 1.5 * IQR beyond the quartiles
  # shape 1 = hollow circle (outlier), shape 16 = filled circle (everything else)
  x[, "shape"] <- ifelse(x[, "mpg"] < whisker[1] | x[, "mpg"] > whisker[2], 1, 16)
  x
})
# plot
p <- ggplot(df, aes(factor(cyl), mpg))
p + geom_boxplot() # without jittering
# add the shapes manually; hide the boxplot's own outliers so they are not drawn twice
p + geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(shape = factor(shape))) +
  scale_shape_manual(guide = "none", values = c("16" = 16, "1" = 1))
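
If you would rather not depend on plyr, the same flagging step can be sketched in base R. The helper name `is_outlier` and the data frame `flagged` are made up for this illustration and are not part of ggplot2. The rule is the same quartile-based 1.5 × IQR cut-off used above.

```r
# Hypothetical helper: TRUE for values beyond the 1.5 * IQR limits
is_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  limits <- q + c(-coef, coef) * diff(q)
  x < limits[1] | x > limits[2]
}

# Flag outliers within each cylinder group of the built-in mtcars data.
# ave() returns a numeric vector here, so convert the 0/1 flags back to logical.
flagged <- mtcars
flagged$outlier <- as.logical(ave(flagged$mpg, flagged$cyl, FUN = is_outlier))

# Rows that would be drawn as hollow circles
flagged[flagged$outlier, c("cyl", "mpg")]
```

The `outlier` column can then be mapped to `shape` in `geom_jitter` in the same way as the `shape` column above.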
shadow
  • Excellent solution, thank you! Could I ask for one tweak, however? The individual data points appear in front of the boxplots rather than behind them. Previously, the order in which I used geom_point and geom_boxplot ensured that only the points beyond the boxes could be seen. Is there any way of tweaking this? – A.S. Oct 04 '13 at 17:40
  • The solution is to reorder the code so that the boxplot layer is added last (thanks, my friend!): `p + geom_jitter(aes(shape=factor(shape))) + scale_shape_manual(guide=FALSE, values=c("16"=16, "1"=1)) + geom_boxplot(outlier.size=-Inf)` – A.S. Oct 06 '13 at 07:53