When you run this code, the red lines should match up with the median lines in the box plots. They do for one group but not the other. Any idea why?

Also, a warning message is produced:

"Warning message:
Removed 7 rows containing non-finite values (stat_boxplot). "

What is that about?

library(dplyr)
library(ggplot2)

set.seed(123)
d = data.frame(group = c(rep("A", 10), rep("B", 10)), v = rnorm(20))

# Per-group quartiles, ordered by median
summary_stats = d %>% dplyr::filter(!is.na(v)) %>% dplyr::group_by(group) %>%
  dplyr::summarise(
    Q1 = quantile(v, .25, na.rm = TRUE),
    MEDIAN = quantile(v, .5, na.rm = TRUE),
    Q3 = quantile(v, .75, na.rm = TRUE)
  ) %>% dplyr::mutate(IQR = Q3 - Q1) %>% dplyr::arrange(MEDIAN)

boxplot.stats(d[d$group == "A", ]$v)
boxplot.stats(d[d$group == "B", ]$v)

d$group = factor(d$group, levels = summary_stats$group, ordered = TRUE)

# The red lines are the group medians taken from summary_stats
ggplot(d, aes(x = group, y = v)) +
  geom_boxplot(outlier.shape = NA, outlier.size = 0, coef = 0) +
  theme(axis.text.x = element_text(angle = 90)) +
  geom_hline(yintercept = -0.07983455, color = "red") +
  geom_hline(yintercept = 0.3802926, color = "red") +
  scale_y_continuous(limits = c(min(summary_stats$Q1) - .1, max(summary_stats$Q3) + .1))
user3022875

1 Answer

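If you leave off the scale_y_continuous part, it all works fine. Setting limits on the scale drops the observations outside that range before the boxplot statistics are computed, so the medians and quartiles are calculated from the truncated data. A safer way is to use coord_cartesian, which only zooms the view without removing any data. For example: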

ggplot(d, aes(x=group, y=v)) + 
  geom_boxplot(outlier.shape = NA,outlier.size =0,coef = 0)+
  theme(axis.text.x=element_text(angle=90))+
  geom_hline(yintercept = -0.07983455,color= "red") +
  geom_hline(yintercept =  0.3802926 ,color= "red") +
  coord_cartesian(ylim  = c( min(summary_stats$Q1)-.1,  max(summary_stats$Q3)+.1  ))

MrFlick
    `scale_y_continuous` excludes data that's outside the range of the `limits` (and the excluded data is therefore not included in the boxplot calculations), while `coord_cartesian` does not. The warning tells you how many observations were excluded. See [this SO question](https://stackoverflow.com/questions/32505298/explain-ggplot2-warning-removed-k-rows-containing-missing-values/32506068#32506068) for example. – eipi10 Dec 04 '18 at 19:59
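
To see the effect without plotting, here is a minimal base R sketch that mimics what the scale limits do: it drops the values outside the limits and recomputes the medians. It assumes the same seed and data as the question; the names `lims`, `inside`, `n_removed`, `med_full` and `med_kept` are made up for illustration, and the filtering step is only an approximation of what ggplot2 does internally.

```r
set.seed(123)
d <- data.frame(group = rep(c("A", "B"), each = 10), v = rnorm(20))

# Same limits as in the question: lowest Q1 - 0.1 up to highest Q3 + 0.1
q1 <- tapply(d$v, d$group, quantile, probs = 0.25)
q3 <- tapply(d$v, d$group, quantile, probs = 0.75)
lims <- c(min(q1) - 0.1, max(q3) + 0.1)

# Values outside the limits are the ones the scale would discard
inside <- d$v >= lims[1] & d$v <= lims[2]
n_removed <- sum(!inside)

# Medians on the full data vs. on the data that survives the limits
med_full <- tapply(d$v, d$group, median)
med_kept <- tapply(d$v[inside], d$group[inside], median)

print(n_removed)
print(rbind(full = med_full, kept = med_kept))
```

The count of dropped values should correspond to the number of rows reported in the warning. A group's median only moves if the values removed from it are unevenly split around the original median, which would explain why one box still lines up with its red line while the other does not.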