
I am working with some data and want to make a boxplot showing the minimum, the 2.5%, 25%, 50%, 70%, 75% and 97.5% quantiles, and the maximum. The plot should also have a legend with differently coloured lines, one for each quantile. Is there any way to do this? Thanks for any help.

set.seed(123)
Mydata = sample(x=100:300, size = 500, replace = TRUE)
Mydata = c(Mydata, 1, 500)
boxplot(Mydata)
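For reference, the eight requested values can be computed in one call: `quantile()` at probabilities 0 and 1 returns the minimum and maximum. This is a small sketch; the name `wanted` is only illustrative.

```r
set.seed(123)
Mydata <- sample(x = 100:300, size = 500, replace = TRUE)
Mydata <- c(Mydata, 1, 500)

# Probabilities 0 and 1 give the minimum and maximum of the data
wanted <- quantile(Mydata, probs = c(0, 0.025, 0.25, 0.5, 0.7, 0.75, 0.975, 1))
wanted
```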

PS. I have tried the code provided by @thelatemail, but I get a totally different figure in RStudio. Is there any solution to this? Thanks.

Yang Yang
  • Which quantiles should be shown in the "box" and which in the "whiskers"? Conventional boxplots only show 25 to 75 in the box, and there are slightly complicated rules for how the data outside that range are shown. – Marius Jun 07 '19 at 03:35
  • @Marius Hi, I am thinking of putting 2.5, 25, 50, 70, 75 and 97.5 in the "box", with the minimum and maximum as the "whiskers". Could you help me with this? Thanks. – Yang Yang Jun 07 '19 at 03:43

4 Answers


What you want to do cannot be generated easily using the boxplot framework.

Underlying boxplots in R is the boxplot.stats() function. Let's run it on your data:

boxplot.stats(Mydata)

$stats
[1]   1 152 204 253 300

$n
[1] 502

$conf
[1] 196.8776 211.1224

$out
[1] 500

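You can see that $stats returns, in order: the lower whisker, the lower hinge, the median, the upper hinge and the upper whisker. The hinges come from fivenum() and are close to, but not always identical to, the 25% and 75% quantiles. Compare with quantile, whose 100% value is the maximum (500); boxplot.stats() instead reports 500 as an outlier in $out: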

quantile(Mydata)

  0%  25%  50%  75% 100% 
   1  152  204  253  500
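To see where the whisker ends come from, here is a sketch that recomputes `$stats` by hand. With the default `coef = 1.5`, `boxplot.stats()` flags points lying more than 1.5 times the hinge spread beyond the hinges as outliers, and each whisker stops at the most extreme remaining point. The names `bs`, `hinges`, `inside` and `by_hand` are illustrative.

```r
set.seed(123)
Mydata <- sample(x = 100:300, size = 500, replace = TRUE)
Mydata <- c(Mydata, 1, 500)

bs <- boxplot.stats(Mydata)            # default coef = 1.5

hinges <- fivenum(Mydata)[2:4]         # lower hinge, median, upper hinge
spread <- hinges[3] - hinges[1]        # hinge spread (close to the IQR)
lo_fence <- hinges[1] - 1.5 * spread
hi_fence <- hinges[3] + 1.5 * spread

# Whiskers stop at the most extreme points still inside the fences
inside <- Mydata[Mydata >= lo_fence & Mydata <= hi_fence]
by_hand <- c(min(inside), hinges, max(inside))

by_hand
bs$stats
```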

If you use geom_boxplot() from ggplot2 with stat = "identity", you can supply your own values for the box. But you can still only draw five values, passed as the aesthetics ymin, lower, middle, upper and ymax.

So for example if you wanted the 2.5% quantile as lower and the 97.5% quantile as upper, you could try:

library(ggplot2)
library(magrittr)  # provides %>%

data.frame(x = 1,
           y0 = min(Mydata),
           y025 = quantile(Mydata, 0.025),
           y50 = median(Mydata),
           y975 = quantile(Mydata, 0.975),
           y100 = max(Mydata)) %>%
  ggplot(aes(x)) +  # the piped data frame becomes ggplot()'s data argument
  geom_boxplot(aes(ymin = y0, 
                   lower = y025, 
                   middle = y50, 
                   upper = y975, 
                   ymax = y100),
               stat = "identity")


However, you should make it clear (perhaps with labels) that this is not a "standard" boxplot.

Another ggplot2 idea is to use geom_jitter to plot the data points, then add lines for the desired quantiles using geom_hline. Something like this:

library(tibble)
library(ggplot2)
library(magrittr)  # provides %>%

Mydataq <- quantile(Mydata, probs = c(0.025, 0.25, 0.5, 0.7, 0.75, 0.975)) %>%
  as.data.frame() %>% 
  setNames("value") %>% 
  rownames_to_column(var = "quantile")

Mydataq %>% 
  ggplot() + 
  geom_hline(aes(yintercept = value, color = quantile)) + 
  geom_jitter(data = tibble(x = "Mydata", y = Mydata), 
              aes(x = x, y = y))

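The same idea works in base graphics without extra packages, including the coloured legend from the question. This is a sketch; the palette (`hcl.colors(..., "Dark 3")`) and the names `qs` and `cols` are arbitrary choices.

```r
set.seed(123)
Mydata <- sample(x = 100:300, size = 500, replace = TRUE)
Mydata <- c(Mydata, 1, 500)

qs <- quantile(Mydata, probs = c(0.025, 0.25, 0.5, 0.7, 0.75, 0.975))
cols <- hcl.colors(length(qs), "Dark 3")   # one colour per quantile

# Jittered points along a single vertical strip
stripchart(Mydata, vertical = TRUE, method = "jitter", pch = 16,
           col = "grey60", ylab = "Mydata")
# One horizontal line per quantile, coloured to match the legend
abline(h = qs, col = cols, lwd = 2)
legend("topright", legend = names(qs), col = cols, lwd = 2,
       title = "Quantile", bg = "white")
```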

neilfws
  • Thanks a lot for your help. Is it possible to plot 2.5, 25, 50, 70, 75, 97.5 in the "box" while minimum and maximum as "whiskers", just like a boxplot with 6 straight lines? – Yang Yang Jun 07 '19 at 06:12
  • Not using `geom_boxplot`. You might be able to build up the box yourself using some other graphics function. – neilfws Jun 07 '19 at 09:08

Just keep overplotting with bxp(), changing the box statistics each time:

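set.seed(123)
Mydata = sample(x=100:300, size = 500, replace = TRUE)
Mydata = c(Mydata, 1, 500)

# range = 0 extends the whiskers to the data extremes
bp <- boxplot(Mydata, range=0, plot=FALSE)

vals <- c(
  min=min(Mydata),
  quantile(Mydata, c(0.025, 0.25, 0.5, 0.7, 0.75, 0.975)),
  max=max(Mydata)
)

# 1st pass: 25%-75% box with the median line, no whiskers
bxp(bp, whisklty=0, staplelty=0)
# 2nd pass: 2.5%-70% box; Inf suppresses the median line
bp$stats[2:4,] <- c(vals[2], Inf, vals[5])
bxp(bp, whisklty=0, staplelty=0, add=TRUE)
# 3rd pass: 2.5%-97.5% box, with whiskers to the min and max
bp$stats[2:4,] <- c(vals[2], Inf, vals[7])
bxp(bp, whisklty=1, staplelty=1, add=TRUE)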


thelatemail
  • Hi, thank you for your help. I have tried your code, but it produces a totally different figure from yours. Could you please check? – Yang Yang Jun 07 '19 at 22:08
  • @YangYang - I just ran it and got the same result as shown. Did you close your existing boxplot window? This code needs to start from fresh. – thelatemail Jun 07 '19 at 22:13
  • Hi, I just noticed that your code works in RGui but not in RStudio. Very strange... Could you please try to make the code work in RStudio as well? Thanks. – Yang Yang Jun 07 '19 at 22:23
  • @YangYang - run `dev.new()` first to open a graphics window then run the code above. RStudio clearly doesn't respect the `add=TRUE` part of the code and generates a totally new plot each time. That seems to be an RStudio bug. – thelatemail Jun 07 '19 at 22:27
  • Hi, I have added `dev.new()` before your code, but it still does not work in RStudio. Maybe we have to go with RGui? – Yang Yang Jun 07 '19 at 22:31
  • @YangYang - try running it twice in a row and then it will work. RStudio's embedded plot window doesn't seem to play nicely. I will file a bug report. – thelatemail Jun 07 '19 at 22:35
  • Thank you so much for your advice. A small question about the code: in `bp$stats[2:4,] <- c(vals[2], Inf, vals[5])`, why do you use `Inf`? What if we use `bp$stats[2:4,] <- c(vals[2], vals[5], vals[7])` so we can shorten the code? – Yang Yang Jun 08 '19 at 00:49
  • @YangYang - certainly, the code can probably be simplified a little - it was a bit of a rush job and it shows. The `min` and `max` aren't needed either. The `Inf` was just used to suppress the median being plotted again. I'm basically just picking new values to graph each time from the `vals` subset. – thelatemail Jun 08 '19 at 00:55
  • Got it! Thanks a lot for your help. – Yang Yang Jun 08 '19 at 00:58
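Building on this thread, here is a hedged sketch that draws the whole figure in a single pass with base `rect()` and `segments()` instead of overplotting with `bxp(..., add = TRUE)`, and adds the legend the question asked for. Because nothing is drawn with `add = TRUE`, it does not depend on how a particular graphics device handles repeated overplotting. The layout (one box from 2.5% to 97.5%, a coloured line at each requested quantile, whiskers to the minimum and maximum) is one interpretation of the request, and `at`, `half`, `qs` and `cols` are illustrative names.

```r
set.seed(123)
Mydata <- sample(x = 100:300, size = 500, replace = TRUE)
Mydata <- c(Mydata, 1, 500)

qs <- quantile(Mydata, probs = c(0.025, 0.25, 0.5, 0.7, 0.75, 0.975))
cols <- hcl.colors(length(qs), "Dark 3")
at <- 1          # x position of the box
half <- 0.2      # half-width of the box

# Empty plot frame covering the full data range
plot(NA, xlim = c(0.5, 2), ylim = range(Mydata),
     xaxt = "n", xlab = "", ylab = "Mydata")

# Whiskers: from the box edges out to the minimum and maximum
segments(at, min(Mydata), at, qs[["2.5%"]])
segments(at, qs[["97.5%"]], at, max(Mydata))
# Staples at the minimum and maximum
segments(at - half / 2, range(Mydata), at + half / 2, range(Mydata))

# Box outline from the 2.5% to the 97.5% quantile (col = NA: no fill)
rect(at - half, qs[["2.5%"]], at + half, qs[["97.5%"]], col = NA)

# One coloured line per quantile across the box
segments(at - half, qs, at + half, qs, col = cols, lwd = 2)

legend("topright", legend = names(qs), col = cols, lwd = 2,
       title = "Quantile", bg = "white")
```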

Here's an idea. You might have to refine it further.

#Quantiles to show (in percent)
P = c(2.5, 25, 50, 70, 75, 97.5)

#Quantiles
b = quantile(x = Mydata, probs = P/100)

#Custom function: draw a rectangle centred on x = at, from y1 to y2
dp = function(at, y1, y2, width, ...){
    polygon(x = c(at - width/2, at + width/2, at + width/2, at - width/2),
            y = c(y1, y1, y2, y2), ...)
}

#Parameters
at = 1
width = 0.2

graphics.off()

#Whiskers: a vertical line from min to max, plus staples at both ends
plot(x = rep(at, length(Mydata)), y = Mydata, type = "l")
segments(x0 = at - width/2, x1 = at + width/2, y0 = min(Mydata), y1 = min(Mydata))
segments(x0 = at - width/2, x1 = at + width/2, y0 = max(Mydata), y1 = max(Mydata))

#Boxes: pair quantiles from the outside in (2.5-97.5, 25-75, 50-70),
#drawing each inner pair wider and in a different colour
invisible(sapply(1:ceiling(length(b)/2), function(i) {
    dp(at = at, y1 = b[i], y2 = b[length(b) + 1 - i], width = width * i, col = i)
}))
#OR: thick vertical segments instead of polygons
invisible(sapply(1:ceiling(length(b)/2), function(i) {
    segments(x0 = at, x1 = at, y0 = b[i], y1 = b[length(b) + 1 - i],
             lwd = 10 * i, col = i, lend = "butt")
}))


d.b

A base R solution, if you only want to change part of the boxplot: here the 25% and 75% hinges are first replaced by the 5% and 95% quantiles, and then a second box for the 12.5% and 90% quantiles is drawn on top:

set.seed(12345)
x <- rnorm(1000)
bp <- boxplot(x, whisklty=0, staplelty=0, range=1.5, plot=FALSE)
bp$stats[c(2, 4), ] <- quantile(x = x, probs = c(0.05, 0.95))
bxp(bp, whisklty=1, staplelty=1, boxfill = "red")
# To add more boxes, draw from outer to inner so each narrower box stays visible, e.g.
bp$stats[c(2, 4), ] <- quantile(x = x, probs = c(0.125, 0.9))
bxp(bp, whisklty=1, staplelty=1, boxfill = "lightgray", add=TRUE)

The result looks like the original boxplot; only the box limits have changed.
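Note that `bp$stats` holds one column per group, so assigning a single pair of quantiles recycles the same two values into every column; that is only what you want when there is one group. For several groups, a sketch that computes the quantiles per group first (the data frame `df` and its columns `g` and `x` are made-up illustrative data):

```r
set.seed(12345)
df <- data.frame(g = rep(c("a", "b"), each = 500),
                 x = c(rnorm(500), rnorm(500, mean = 2)))

bp <- boxplot(x ~ g, data = df, plot = FALSE)
# bp$stats is a 5 x (number of groups) matrix; rows are
# lower whisker, lower hinge, median, upper hinge, upper whisker
new_box <- sapply(split(df$x, df$g), quantile, probs = c(0.05, 0.95))
bp$stats[c(2, 4), ] <- new_box        # column j gets group j's quantiles
bxp(bp, boxfill = "lightgray")
```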

Christoph