Conditional use of jitter in ggplot2 with geom_point

Question

I have a graph with 12 variables divided into two groups. I can't use facets, but by using colour and shape I have been able to make the visualization easy to understand. However, some points overlap (partially or wholly). I am using jitter to deal with these, but as you can see from the attached graph, this moves all points around, not just the overlapping ones.

Is there a way to use jitter or dodge conditionally? Even better, is there a way to put the partially overlapping points side by side? As you can see, my x-axis consists of discrete categories, so a slight shift to the left or right won't matter. I tried using `geom_dotplot` with `binaxis='y'`, but that completely spoils the x-axis.

Edit: This graph does exactly what I am looking for.

Further edit: Adding the code behind this visualization.

library(ggplot2)
library(reshape2)  # provides melt()

disciplines <- c("Comp. Sc.\n(17.2%)", "Physics\n(19.6%)", "Maths\n(29.4%)", "Pol.Sc.\n(40.4%)", "Psychology\n(69.8%)")

# To stop ggplot from imposing alphabetical ordering on x-axis
disciplines <- factor(disciplines, levels=disciplines, ordered=TRUE)

# involved aspects
intensive   <- c( 0.660,  0.438,  0.515,  0.028,  0.443)
comparative <- c( 0.361,  0.928,  0.270,  0.285,  0.311)
wh_adverbs  <- c( 0.431,  0.454,  0.069,  0.330,  0.577)
past_tense    <- c(0.334, 0.229, 0.668, 0.566, 0.838)
present_tense <- c(0.680, 0.408, 0.432, 0.009, 0.996)
conjunctions <- c( 0.928,  0.207,  0.162, -0.299, -0.045)
personal      <- c(0.498, 0.521, 0.332, 0.01, 0.01)
interrogative <- c(0.266, 0.202, 0.236, 0.02, 0.02)
sbj_objective <- c(0.913, 0.755, 0.863, 0.803, 0.913)
possessive    <- c(0.896, 0.802, 0.960, 0.611, 0.994)
thrd_person <- c(-0.244, -0.265, -0.310, -0.008, -0.384)
nouns       <- c(-0.602, -0.519, -0.388, -0.244, -0.196)

df1 <- data.frame(disciplines,
                 "Intensive Adverbs"=intensive,
                 "Comparative Adverbs"=comparative,
                 "Wh-adverbs (WRB)"=wh_adverbs,
                 "Verb: Past Tense"=past_tense,
                 "Verb: Present Tense"=present_tense,
                 "Conjunctions"=conjunctions,
                 "Personal Pronouns"=personal,
                 "Interrogative Pronouns"=interrogative,
                 "Subjective/Objective Pronouns"=sbj_objective,
                 "Possessive Pronouns"=possessive,
                 "3rd-person verbs"=thrd_person,
                 "Nouns"=nouns,
                 check.names=FALSE)

df1.m <- melt(df1)  # long format: columns disciplines, variable, value
# Keep the group label inside the data frame instead of a free-floating vector
df1.m$grp <- ifelse(df1.m$variable %in% c('3rd-person verbs','Nouns'), 'Informational Features', 'Involved Features')
g <- ggplot(df1.m, aes(group=grp, disciplines, value, shape=grp, colour=variable))
g <- g + geom_hline(yintercept=0, linewidth=9, color="white")  # use size=9 on ggplot2 < 3.4.0
g <- g + geom_smooth(method=loess, span=0.75, level=0.95, alpha=I(0.16), linetype="dashed")
g <- g + geom_point(size=4,  alpha=I(0.7), position=position_jitter(width=0.1, height=0))
g <- g + scale_shape_manual(values=c(17,19))
print(g)

You should provide a reproducible example (data + code) so that others can play with it... — agstudy, Oct 17 '13 at 22:37
Thanks for the code. P.S. Your plot will not look as clean as the biomed example because your Y values are all over the place, but you can still line up the x values in order with the approach below. — beroe, Oct 17 '13 at 23:08

beroe · Accepted Answer · 2013-10-18T14:45:07.497

I am curious what others might suggest, but to get the side-by-side effect you could code the major x-axis categories as numbers (10, 20, ..., 50) plus or minus a small amount like (0..10)/2, based on the categories you are using for color. So you could get x values of 9.6, 9.8, 10.0, 10.2, ... and then 20.0, 20.2, 20.4, and so on. This creates an organized plot instead of assigning those fractional adjustments randomly.
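A minimal sketch of that numbering scheme, using a hypothetical helper `side_by_side_x()` (not part of ggplot2 or the answer) that centres the sub-category offsets on zero. With five sub-categories and an assumed step of 0.2 it reproduces the 9.6, 9.8, 10.0, 10.2, 10.4 pattern around a major position of 10:

```r
# Hypothetical helper (not part of ggplot2): major category k sits at k * spacing,
# and each sub-category is shifted by a multiple of `step`, centred on zero.
side_by_side_x <- function(major, minor, spacing = 10, step = 0.2) {
  major_f <- factor(major)   # an existing factor keeps its level order
  minor_f <- factor(minor)
  major_pos <- as.numeric(major_f) * spacing
  centre <- (nlevels(minor_f) + 1) / 2   # middle sub-category index
  offset <- (as.numeric(minor_f) - centre) * step
  major_pos + offset
}

# Five sub-categories around one major category at x = 10
side_by_side_x(major = rep("Physics", 5), minor = c("a", "b", "c", "d", "e"))
```

Applied to the question's data this would be `side_by_side_x(df1.m$disciplines, df1.m$variable)`; with 12 sub-categories and `step = 0.2` the offsets span ±1.1, which stays well inside a spacing of 10.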

Here is a quick implementation of that idea for your data set. It multiplies the integer code of `disciplines` by 10, offsets it by one sixth of the integer code of the sub-category `variable` (centred on zero), and uses the result, without jitter, as the x value:

M = df1.m
ScaleFactor = 6
xadj = as.numeric(M$variable)/ScaleFactor
xadj = xadj - mean(xadj)   # shift it to center around zero
x10  = as.numeric(M$disciplines) * 10
M$x = x10 + xadj
g = ggplot(M, aes(group=grp, x, value, shape=grp, colour=variable)) 
# x is numeric now, so label one break per discipline on a continuous scale
g + geom_point(size=4,alpha=I(0.7)) + scale_x_continuous(breaks=unique(x10), labels=levels(M$disciplines))

Note that the values within each category are evenly spaced and always appear in the same order. (This code doesn't include the curve fitting etc. shown in the figure.)

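A possibly simpler alternative, not part of this answer, is ggplot2's built-in `position_dodge()`, which spreads the groups side by side at each discrete x position and keeps the category labels, so no numeric re-coding is needed. A minimal sketch on made-up toy data (the values and the dodge width of 0.6 are assumptions); for the question's data the key part is `aes(..., group = variable)`:

```r
library(ggplot2)

# Toy data in the same long format as df1.m (values are made up)
toy <- data.frame(
  disciplines = factor(rep(c("Physics", "Maths"), each = 3),
                       levels = c("Physics", "Maths")),
  variable = factor(rep(c("Nouns", "Conjunctions", "Past tense"), times = 2)),
  value = c(0.20, 0.21, 0.50, -0.10, -0.12, 0.30)
)

# group = variable tells position_dodge() which points to place side by side
p <- ggplot(toy, aes(disciplines, value, colour = variable, group = variable)) +
  geom_point(size = 4, alpha = 0.7,
             position = position_dodge(width = 0.6))
print(p)
```

Like the numeric approach, this moves every point, not only overlapping ones; its advantage is that the x axis stays discrete.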

Variation: You can see the effect even more clearly if you "quantize" your y values, so more of them plot side by side.

M$valmod = M$value - M$value %% 0.2 + .1

Then use valmod in place of value in the aes() statement to see the effect.
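The quantizing expression maps each value to the centre of its 0.2-wide bin. A small sketch comparing it with an equivalent `floor()` form (the helper names are hypothetical); because R's `%%` takes the sign of the divisor, negative values are binned downwards too, e.g. -0.244 lands in [-0.4, -0.2) with centre -0.3:

```r
# Bin centre via modulo, as in the answer
quantize_mod <- function(value, Qfact = 0.2) value - value %% Qfact + Qfact / 2

# Same bins written with floor(); floor() also rounds negative values downwards
quantize_floor <- function(value, Qfact = 0.2) floor(value / Qfact) * Qfact + Qfact / 2

v <- c(0.361, -0.244, 0.928, 0.009)   # a few values from the question's data
rbind(mod = quantize_mod(v), floor = quantize_floor(v))
```

Values sitting exactly on a bin edge can fall on either side due to floating-point rounding, so the two forms are equal only up to that tolerance.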

To get the category labels back, set them manually with `scale_x_continuous` (the x values are numeric, so a discrete scale no longer applies). This version uses a different ScaleFactor for broader spacing and the quantized y axis:

M=df1.m
ScaleFactor = 3
# Note this could just be xadj instead of adding to data frame
M$xadj = as.numeric(M$variable)/ScaleFactor
M$xadj = M$xadj - mean(M$xadj)   # shift it to center around zero
M$x10  = as.numeric(M$disciplines) * 10   # major positions 10, 20, ..., 50
M$x = M$x10 + M$xadj

Qfact = 0.2  # resolution to quantize y values
M$valmod = M$value - M$value %% Qfact + Qfact/2  # clump y to given resolution

# One labelled break per discipline on the numeric x axis
g = ggplot(M, aes(group=grp, x, valmod, shape=grp, colour=variable)) +
    scale_x_continuous(breaks=unique(M$x10), labels=levels(M$disciplines))
g + geom_point(size=3,alpha=I(0.7))

[Figure: quantized plot]
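The approach above offsets every point, whether or not it overlaps anything. For the "conditional" behaviour asked about in the question, one possible sketch (an assumption of mine, not part of the accepted answer) is to shift only points whose y values lie within a tolerance `tol` of another point in the same discipline, and leave isolated points at the category centre. The helper `conditional_offset()` and its defaults are hypothetical:

```r
# Hypothetical helper: returns an x offset for each point. Points whose y values
# lie within `tol` of a neighbour in the same x category are spread out by `step`;
# isolated points get an offset of 0. Clusters are formed by chaining, so
# 0.00, 0.04, 0.08 with tol = 0.05 count as one cluster of three.
conditional_offset <- function(x, y, tol = 0.05, step = 0.15) {
  x <- as.character(x)
  offset <- numeric(length(y))
  for (lev in unique(x)) {
    idx <- which(x == lev)
    idx <- idx[order(y[idx])]   # this category's points, sorted by y
    # start a new cluster whenever the gap to the previous point is >= tol
    cluster <- cumsum(c(TRUE, diff(y[idx]) >= tol))
    for (cl in unique(cluster)) {
      members <- idx[cluster == cl]
      n <- length(members)
      if (n > 1) offset[members] <- (seq_len(n) - (n + 1) / 2) * step
    }
  }
  offset
}

# Toy example: two overlapping pairs and one isolated point in a single category
conditional_offset(x = rep("Pol.Sc.", 5), y = c(0.01, 0.01, 0.50, 0.52, 0.90))
```

With the question's data one could then use `M$x <- as.numeric(M$disciplines) + conditional_offset(M$disciplines, M$value)` together with `scale_x_continuous(breaks = 1:5, labels = levels(M$disciplines))`. The `tol` and `step` defaults are arbitrary guesses and would need tuning to the point size.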

I am using `height=0` in my jitter. The vertical positions of all the points are directly from the data. I just don't like how the horizontal position is shifted even when there are no other data points near it. — Chthonic Project, Oct 17 '13 at 22:49
This looks great! But is there a way to get the original category names back on the x-axis instead of the numerical values? — Chthonic Project, Oct 18 '13 at 13:56
Yep, that was "an exercise for the reader", but I added it in... I would like to see this "uniform jitter" added to base ggplot2. Maybe some of the R gurus will have another approach too. — beroe, Oct 18 '13 at 14:45
