I obtained the following picture by using:

boxplot(series,
    col = "orange",
    border = "brown")

The code:

boxplot(d$y,
        col = "orange",
        border = "brown")
abline(h = min(d$y), col = "Blue")
abline(h = max(d$y), col = "Yellow")
abline(h = median(d$y), col = "Green")
abline(h = quantile(d$y, c(0.25, 0.75)), col = "Red")

produces the picture below instead. I wanted to see whether the boxplot identifies the five-number summary. The blue, green, and red lines mark the minimum, the median, and the lower and upper hinges, as expected, but I'm not sure about the position of the yellow line. Shouldn't the yellow line sit at the end of the top whisker?

Mark
  • 2
    Can you post a reproducible example of your `series` and `d` dataframes ? see: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – dc37 Mar 11 '20 at 18:45
  • No, you are taking the maximum value of those `y` values with `max(d$y)`. Therefore, it is in the correct spot. – eonurk Mar 11 '20 at 18:46
  • 1
    Try and grab the output with `x <- boxplot(...)`, compare! – jay.sf Mar 11 '20 at 18:46

2 Answers

The `fivenum(x)` function returns the minimum, lower hinge (roughly the 25% quantile), median, upper hinge (roughly the 75% quantile), and maximum of a vector of values. However, `boxplot(x)$stats` returns the lower whisker, lower hinge, median, upper hinge, and upper whisker. Whisker values are generally only calculated for box plots; by default they are the most extreme data points that lie no more than 1.5 times the interquartile range beyond the lower and upper hinges. These may or may not be the same as the min and max values.

If you plot the yellow line at the max value, you would expect it to be drawn at the highest value, and it is. Your data contain points located outside the whiskers, so the maximum lies above the end of the upper whisker.
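To make the difference concrete, here is a minimal sketch on a made-up vector (the values in `x` are an assumption chosen for illustration, not the asker's data). It uses `boxplot.stats()`, the function that computes the statistics `boxplot()` draws, so no plot is needed.

```r
# Hypothetical data: nine ordinary values plus one far-away point.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

five <- fivenum(x)        # min, lower hinge, median, upper hinge, max
box  <- boxplot.stats(x)  # same numbers boxplot(x) would use

five       # 1.0 3.0 5.5 8.0 30.0
box$stats  # 1.0 3.0 5.5 8.0  9.0  (upper whisker stops at 9)
box$out    # 30                    (drawn as a separate point)
```

The hinges are 3 and 8, so the interquartile range is 5 and the upper fence is 8 + 1.5 * 5 = 15.5. The value 30 lies beyond that fence, so the whisker ends at 9, the largest value still inside it, while `max(x)` stays at 30.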

MrFlick

By default, `boxplot` does not extend the whiskers to outliers; it draws them as individual points beyond the whiskers. The max and min may therefore lie outside the range depicted by the whiskers (see here).

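set.seed(42)
x <- rnorm(200) * 10
f <- fivenum(x)   # min, lower hinge, median, upper hinge, max
b <- boxplot(x)   # draws the plot and returns its statistics

abline(h = b$stats)              # whisker ends, hinges, and median
abline(h = b$out, col = "red")   # outliers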

You can recover the same values as `fivenum` from the `boxplot` output by combining the whisker ends with the outliers:

identical(f, c(min(b$out, b$stats[1]),
  b$stats[2:4],
  max(b$stats[5], b$out)))
#[1] TRUE
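If the aim is for the whiskers themselves to reach the minimum and maximum, one option is the `range` argument of `boxplot()`; setting it to 0 extends the whiskers to the data extremes. The sketch below reuses the simulated `x` from above and passes `plot = FALSE` so that only the statistics are returned. The name `b0` is just an illustrative choice.

```r
set.seed(42)
x <- rnorm(200) * 10

# range = 0: whiskers extend to the data extremes, nothing is flagged as an outlier
b0 <- boxplot(x, range = 0, plot = FALSE)

# b0$stats is a one-column matrix, so flatten it before comparing
all.equal(as.vector(b0$stats), fivenum(x))
length(b0$out)
```

With this setting the whisker ends should coincide with `min(x)` and `max(x)`, so a line drawn at `max(x)` would land on the top whisker. The trade-off is that the plot no longer singles out extreme points.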
lop