
I would like to plot the spread of my data on top of a boxplot. So far I've managed to overlay a geom_dotplot() on a geom_boxplot(). However, I have many data points, and many of them overlap. I would like the plot to give some indication of where most of the values are. I thought geom_count() would help, but I have not found a way to use it with multiple groups!

Here's my first attempt at the boxplot/dotplot:

ggplot(data_BDI, aes(x=time, y=BDI, fill=Groups)) +
geom_boxplot(position=position_dodge(0.8))+
geom_dotplot(binaxis='y', stackdir='center', 
             position=position_dodge(0.8),
             dotsize=0.7) + 
scale_fill_grey() + theme_classic()


Now I want to add geom_count(), but it only gives a result per time point, not per group:

  ggplot(data_BDI, aes(x=time, y=BDI, fill=Groups)) +
  geom_boxplot(position=position_dodge(0.8))+
  geom_dotplot(binaxis='y', stackdir='center', 
               position=position_dodge(0.8),
               dotsize=0.7) + 
  scale_fill_grey() + theme_classic() +
  geom_count(aes(x=time, y=BDI, group=Groups))


Is there any way to make the points of the differently colored groups differ in size? Or is there another method of displaying the overlap?

Inkling
  • I'd recommend checking out ggbeeswarm as an alternative: https://github.com/eclarke/ggbeeswarm There's also beanplots (https://cran.r-project.org/web/packages/beanplot/vignettes/beanplot.pdf) and pirate plots (http://nathanieldphillips.com/2016/04/pirateplot-2-0-the-rdi-plotting-choice-of-r-pirates/) that might be helpful. – Daniel Anderson Feb 26 '18 at 21:24
  • Thank you! Never heard of any of them. I'm especially intrigued by the pirateplot! – Inkling Feb 27 '18 at 09:18

2 Answers


This may not be the most elegant solution, but you can use the empty-circle shape so that overlapping dots remain visible, and reduce the size of the dots if needed. Use the shape and size arguments, then reduce the position dodge.

You can also increase the space between boxplots: Spacing between boxplots in ggplot2
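As a rough sketch of the spacing idea (not taken from the linked question): making the boxes narrower than the dodge width leaves a gap between neighbouring boxes. The toy data frame and the value width = 0.5 are assumptions chosen only for illustration; the column names mirror those in the question.

```r
library(ggplot2)

# Hypothetical toy data with the same column names as in the question
set.seed(1)
toy <- data.frame(
  BDI    = sample(1:30, 120, replace = TRUE),
  time   = rep(c("BDI", "BDI.FU", "BDI.FU2"), each = 40),
  Groups = rep(rep(c("ABM", "both", "control", "BMS"), each = 10), 3)
)

p_spaced <- ggplot(toy, aes(x = time, y = BDI, fill = Groups)) +
  # width (0.5) smaller than the dodge width (0.8) leaves gaps between boxes
  geom_boxplot(width = 0.5, position = position_dodge(width = 0.8)) +
  scale_fill_grey() +
  theme_classic()

p_spaced
```

Any points drawn on top should use the same dodge width (0.8 here) so that they stay centred on their boxes.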

EDIT: here is a reproducible example using geom_point() with position_jitterdodge() instead of geom_dotplot().

# Just creating a reproducible example:
library(ggplot2)
rdu <- function(n, k) sample(1:k, n, replace = TRUE)  # n draws from 1..k
time <- rdu(300, 30)
data_BDI <- data.frame(BDI = time,
                       time = rep(c("BDI", "BDI.FU", "BDI.FU2"), each = 100),
                       Groups = rep(rep(c("ABM", "both", "control", "BMS"), each = 25), 3))

Solution 1 using geom_point():

  ggplot(data_BDI, aes(x=time, y=BDI, fill=Groups)) +
  geom_boxplot(position=position_dodge(0.8))+
  geom_point(aes(fill = Groups), size = 2, shape = 1, position = position_jitterdodge())+
    # geom_dotplot(binaxis='y',
    #            stackdir='center', 
    #            position=position_dodge(0.8),
    #            dotsize=0.7,
    #            shape=1) + 
  scale_fill_grey()+ 
  theme_classic()


Solution 2 using geom_count():

ggplot(data_BDI, aes(x=time, y=BDI, fill=Groups)) +
  geom_boxplot(position=position_dodge(0.8))+
  scale_fill_grey()+ 
  theme_classic()+
  geom_count(aes(fill = Groups), position = position_jitterdodge())


Solution 3: If you want the dots aligned, set both jitter arguments to 0 so that only the dodge is applied:

ggplot(data_BDI, aes(x=time, y=BDI, fill=Groups)) +
  geom_boxplot(position=position_dodge(0.8))+
  scale_fill_grey()+ 
  theme_classic()+
  geom_count(aes(fill = Groups),
             position = position_jitterdodge(jitter.width = 0, jitter.height = 0, dodge.width = 0.81))

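To see which counts geom_count() sizes the points by, the built layer data can be inspected with layer_data(). The sketch below is only an illustration under assumptions: the toy data frame is hypothetical, plain position_dodge() is used instead of the jitter-dodge above, and shape = 21 is added because the default point shape ignores the fill aesthetic.

```r
library(ggplot2)

# Hypothetical toy data with the same column names as in the question
set.seed(1)
toy <- data.frame(
  BDI    = sample(1:30, 120, replace = TRUE),
  time   = rep(c("BDI", "BDI.FU", "BDI.FU2"), each = 40),
  Groups = rep(rep(c("ABM", "both", "control", "BMS"), each = 10), 3)
)

p_count <- ggplot(toy, aes(x = time, y = BDI, fill = Groups)) +
  # shape 21 is a filled circle, so the grey fill scale becomes visible
  geom_count(shape = 21, position = position_dodge(width = 0.8)) +
  scale_fill_grey() +
  theme_classic()

# One row per distinct x/y/group combination; n is the number of observations
counts <- layer_data(p_count, 1)
head(counts[, c("x", "y", "group", "n", "prop")])
```

Because fill = Groups splits each time point into separate groups, n is counted per group rather than per time point, which is what the question asked for.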

Nakx

You can try a ggbeeswarm solution as well. Unlike with geom_count(), you have to calculate the count n yourself, here using the tidyverse:

library(tidyverse)
library(ggbeeswarm)
data_BDI %>%
  group_by(time, Groups,BDI) %>%  # grouping to calculate the counts of duplicates
  add_count() %>% # the calculation
ggplot(aes(x=Groups, y=BDI, fill=Groups)) +
  geom_boxplot() +
  # remove duplicates to keep the plot clean
  geom_beeswarm(data=. %>% distinct(), aes(size=n)) +
  facet_grid(~time) +
  guides(fill = "none")


Data

set.seed(1233)
rdu<-function(n,k) sample(1:k,n,replace=TRUE)
time<-rdu(300,30)
data_BDI<-data.frame(BDI=time,time=rep(c("BDI","BDI.FU","BDI.FU2"),each=100),Groups=rep(rep(c("ABM","both","control","BMS"),each=25),3))
Roman