
I'd like to use ggplot to draw a series of boxplots computed from all of the data in a dataset, and overlay jittered points showing only a random sample of that data (e.g., 100 data points) to avoid overplotting (there are thousands of data points). Can anyone please help me with the code for this? My current code is below, but I don't know which arguments, if any, would make the jittered points a random sample of the data. Thanks for any help.

ggplot(datafile, aes(x=factor(var1), y=var2, fill=var3)) + geom_jitter(size=0.1, position=position_jitter(width=0.3, height=0.2)) + geom_boxplot(alpha=0.5) + facet_grid(.~var3) + theme_bw() + scale_fill_manual(values=c("red", "green", "blue"))
Jeff
  • Please provide a minimal reproducible example; see [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – yang Apr 21 '20 at 03:13

1 Answer


You could take a random subset of your data using dplyr:

library(dplyr)
library(ggplot2)
ggplot(data = datafile, aes(x = factor(var1), y = var2, fill = var3)) + 
  geom_jitter(
    # use a random subset of the data: 100 rows per level of var1
    # (sample_n() errors if a group has fewer than 100 rows)
    data = datafile %>% group_by(var1) %>% sample_n(100),
    aes(x = factor(var1), y = var2, fill = var3),
    size = 0.1, 
    position = position_jitter(width = 0.3, height = 0.2)) + 
  # the boxplots still use the full dataset
  geom_boxplot(alpha = 0.5) + 
  facet_grid(.~var3) +
  theme_bw() + 
  scale_fill_manual(values = c("red", "green", "blue"))
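
If some groups may hold fewer than 100 rows, or you want 100 points per box rather than per level of var1, a variant along these lines might work. It is only a sketch under a few assumptions: the simulated `datafile` is a hypothetical stand-in for your data, it needs dplyr 1.0.0 or later for `slice_sample()`, and hiding the boxplot outliers with `outlier.shape = NA` is my own choice so that outliers drawn from the full data are not mistaken for sampled points.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `datafile`; column names follow the question.
set.seed(42)  # makes the random subset reproducible
datafile <- data.frame(
  var1 = sample(1:4, 5000, replace = TRUE),
  var2 = rnorm(5000),
  var3 = sample(c("a", "b", "c"), 5000, replace = TRUE)
)

# Sample up to 100 rows per box (one box per var1 x var3 combination).
# slice_sample() returns the whole group when it has fewer than 100 rows.
jitter_data <- datafile %>%
  group_by(var1, var3) %>%
  slice_sample(n = 100) %>%
  ungroup()

p <- ggplot(datafile, aes(x = factor(var1), y = var2, fill = var3)) +
  # boxes are computed from all rows; outliers hidden (assumed preference)
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # points come only from the sampled subset
  geom_jitter(data = jitter_data, size = 0.1,
              position = position_jitter(width = 0.3, height = 0.2)) +
  facet_grid(. ~ var3) +
  theme_bw() +
  scale_fill_manual(values = c("red", "green", "blue"))
print(p)
```

Sampling once into `jitter_data` before plotting keeps the same points on every redraw, and `set.seed()` makes the subset repeatable between sessions.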
Obim