Is there a way to ignore outliers in geom_violin and have the y-axis limits be based on the Q1 and Q3 quantiles (like range = 1.5 for boxplots in base R)? It would be great if this could be automated (i.e. not just hard-coding a specific y-axis limit).

I see a solution using geom_boxplot here: Ignore outliers in ggplot2 boxplot

But is there a way to replicate this type of solution in geom_violin? Thanks in advance!

Example code and the desired outcome are below:

library(ggplot2)
Result <- as.numeric(c(.2, .03, .11,  .05, .2, .02, .22, 1.1, .02, 120))
Group <- as.factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"))
x <- data.frame(Result, Group)

plot = ggplot(x, aes(x=Group, y=Result)) +
  geom_violin()

print(plot)

Here is the output of the above (not a super helpful graphic):

[image]

I'd like something like the plot below, using the above data:

[image]

kslayerr

1 Answer

I think a method similar to the one you link to will work here, except that you will need to compute those stats for each group and use the minimum Q1 and maximum Q3 as the limits in coord_cartesian:

library(dplyr)
# compute Q1 and Q3 for each group
ylims <- x %>%
  group_by(Group) %>%
  summarise(Q1 = quantile(Result, 1/4), Q3 = quantile(Result, 3/4)) %>%
  ungroup() %>%
  # get the lowest Q1 and the highest Q3 across groups
  summarise(lowQ1 = min(Q1), highQ3 = max(Q3))

# widen the limits by 5% on each side (assumes positive values)
plot + coord_cartesian(ylim = as.numeric(ylims) * c(0.95, 1.05))

Note that you can change the scaling in the call to coord_cartesian, and the quantile probabilities in the piped code that computes the range of the Q1s and Q3s.
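As a rough sketch of that kind of tuning (not part of the original answer), the limits can be wrapped in a small helper so that the quantile probabilities and the padding become arguments. The function name violin_ylim and its probs and pad arguments are hypothetical, and padding by a fraction of the span rather than multiplying each limit is my own choice here. Keep in mind that coord_cartesian only zooms the view; the densities are still estimated from all of the data, outliers included.

```r
library(ggplot2)

# Same example data as in the question
x <- data.frame(
  Result = c(.2, .03, .11, .05, .2, .02, .22, 1.1, .02, 120),
  Group  = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"))
)

# Hypothetical helper (not part of ggplot2 or dplyr): returns c(lower, upper),
# where lower is the smallest per-group lower quantile and upper is the
# largest per-group upper quantile, widened by `pad` times the span.
violin_ylim <- function(values, groups, probs = c(0.25, 0.75), pad = 0.05) {
  lower <- min(tapply(values, groups, quantile, probs = probs[1]))
  upper <- max(tapply(values, groups, quantile, probs = probs[2]))
  span <- upper - lower
  c(lower - pad * span, upper + pad * span)
}

ylims <- violin_ylim(x$Result, x$Group)

# Zoom the y axis without dropping any observations
p <- ggplot(x, aes(x = Group, y = Result)) +
  geom_violin() +
  coord_cartesian(ylim = ylims)
print(p)
```

Calling, say, violin_ylim(x$Result, x$Group, probs = c(0.1, 0.9), pad = 0) would use the 10th and 90th percentiles with no padding instead.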

m.evans