
I'm working on a very large dataset containing around 1.6M data points. I'm using a violin plot together with a boxplot to show the data for each category (there are multiple categories, and each has its own set of values).

The problem I'm facing is that a lot of data points (outliers) lie above the error bar (the upper whisker of the boxplot), and because of that the focus of the plot is lost.

At first I thought that removing every data point above one specific value would let me show what I wanted. It didn't work: the error-bar range is different for each category, so a single cutoff made me lose the majority of the data from other categories.
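To illustrate why one fixed cutoff fails, here is a minimal sketch on made-up data. The `toy` data frame, its two categories, and the exponential distributions are assumptions for illustration only, not my real data. The helper `upper_fence()` is hypothetical; it computes the limit above which a boxplot with the default `coef = 1.5` treats points as outliers.

```r
# Toy data (hypothetical): two categories with very different spreads
set.seed(1)
toy <- data.frame(
  Country = rep(c("A", "B"), each = 1000),
  Diff    = c(rexp(1000, rate = 1 / 5), rexp(1000, rate = 1 / 50))
)

# Upper outlier limit: third quartile + 1.5 * IQR
upper_fence <- function(v) quantile(v, 0.75, names = FALSE) + 1.5 * IQR(v)

# One limit per category; they differ a lot between "A" and "B"
fences <- tapply(toy$Diff, toy$Country, upper_fence)
print(fences)

# Use the limit of the narrow category "A" as a single global cutoff
global_cut <- fences[["A"]]

# Share of each category that survives the global cutoff
kept <- tapply(toy$Diff <= global_cut, toy$Country, mean)
print(kept)
```

In this sketch the cutoff that suits "A" removes most of "B", which is the same effect I saw in my data.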

So now I'm thinking of removing, or at least not showing, the data points above the error bar for each category individually, in both the boxplot and the violin plot. I added outlier.shape=NA to geom_boxplot, and it worked fine for the boxplot. I'd like to do the same for the violin plot: remove all the data points that lie above the error bar of the corresponding boxplot.

Here are the plots before and after using outlier.shape=NA.
Before: [image]
After: [image]

Here is my code:

# Packages assumed from the functions used below
library(tidyverse)  # dplyr, forcats, ggplot2
library(viridis)    # scale_fill_viridis()

# squash_axis() is a user-defined axis transformation (not shown here)
med_violin <- data %>%
  left_join(sample_size) %>%
  # Order the categories by their median Diff
  mutate(myaxis = fct_reorder(paste0(Country), Diff, .fun = median)) %>%
  ggplot(aes(x = myaxis, y = Diff, fill = Country)) +
  geom_violin(width = 1.5, color = "black", position = position_dodge(width = 1.8), trim = TRUE) +
  # outlier.shape = NA hides the boxplot outliers
  geom_boxplot(width = 0.2, color = "white", alpha = 0.01, outlier.shape = NA) +
  scale_y_continuous(breaks = c(0, 25, 50, 75, 100, 125, 150, 525, 550)) +
  coord_trans(y = squash_axis(150, 525, 15)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8),
        legend.position = "none") +
  scale_fill_viridis(discrete = TRUE) +
  xlab("")

med_violin

How can I do the same thing in geom_violin, so that it also does not show the data points above the error bar?

I also tried the approach from "Ignore outliers in ggplot2 geom_violin", but it did not work for me.
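For reference, this is a minimal sketch of the per-category idea I have in mind, written against made-up data rather than my real dataset. The `toy` and `trimmed` data frames are assumptions for illustration. It drops, within each category, the values above the third quartile plus 1.5 times the IQR (the default outlier limit of geom_boxplot) and passes the filtered data only to the violin layer. One caveat: the violin density is then estimated on the truncated data, so its shape is not the same as the untrimmed violin cut off at the whisker.

```r
library(dplyr)
library(ggplot2)

# Toy data (hypothetical): three categories with different spreads
set.seed(1)
toy <- data.frame(
  Country = rep(c("A", "B", "C"), each = 500),
  Diff    = c(rexp(500, 1 / 5), rexp(500, 1 / 20), rexp(500, 1 / 50))
)

# Per category, keep only values at or below Q3 + 1.5 * IQR
trimmed <- toy %>%
  group_by(Country) %>%
  filter(Diff <= quantile(Diff, 0.75, names = FALSE) + 1.5 * IQR(Diff)) %>%
  ungroup()

# The violin layer gets the filtered data; the boxplot layer still uses
# the full data, so its box and whiskers stay unchanged
p <- ggplot(toy, aes(x = Country, y = Diff, fill = Country)) +
  geom_violin(data = trimmed, trim = TRUE) +
  geom_boxplot(width = 0.2, alpha = 0.5, outlier.shape = NA) +
  theme(legend.position = "none")

p
```

Because the filter runs inside group_by(), each category gets its own limit, so no category loses data to a cutoff that was chosen for another one.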

Thank you.

vibhu sharma
  • Please make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by providing the data in `dput()` format. – UseR10085 May 30 '21 at 11:53
  • You might find your answer here: https://stackoverflow.com/questions/49908469/ignore-outliers-in-ggplot2-geom-violin – ViviG May 30 '21 at 22:47

0 Answers