I want to draw boxplots for several categories, and I find it very handy to have a ribbon showing the mean ± one standard deviation of the whole dataset.

I can easily compute those values and add them to the plot with geom_hline(). However, I want the area between mean(x) - sd(x) and mean(x) + sd(x) to be coloured for better visibility. I was thinking of adding geom_ribbon() there, but I ran into a problem: my x values are categorical, and I want the band to cover the full width of the background along the x axis.

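library(tidyverse)
library(reshape2)

# Two normally distributed samples with different means
df <- data.frame(
  a = rnorm(1000, 20, 40),
  b = rnorm(1000, 100, 40)
)
# Reshape to long format: columns `variable` (a or b) and `value`
df <- melt(df)

df %>%
  ggplot(aes(variable, value)) +
  # Lines at mean - sd and mean + sd of the whole dataset
  geom_hline(yintercept = c(
    mean(df$value) - sd(df$value),
    mean(df$value) + sd(df$value)
  )) +
  # Overall mean
  geom_hline(yintercept = mean(df$value), size = 3, col = "red") +
  geom_boxplot()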

zx8754
Kryštof Chytrý
  • Indeed. I just did not find the right keywords. – Kryštof Chytrý Aug 21 '19 at 11:31
  • Or actually no, the difference is that I am working with discrete values on the x axis, while the question at the given link works with continuous ones. – Kryštof Chytrý Aug 21 '19 at 11:41
  • No worries, closed as duplicate. Duplicates are useful for future users with different keywords. – zx8754 Aug 21 '19 at 11:42
  • The idea is to use geom_rect, then it is a dupe. Feel free to vote re-open, of course. – zx8754 Aug 21 '19 at 11:43
  • Ok. But is `geom_rect()` applicable this way if the x axis is discrete? – Kryštof Chytrý Aug 21 '19 at 12:07
  • If you have a question where solutions depend on the data type, it's important to replicate that in your example. You also might be mixing up your x and y axes in describing them: you're saying but what if x is discrete, but the x you've shown here *is* discrete – camille Aug 21 '19 at 15:10

1 Answer

Try using geom_rect:

ggplot(df, aes(variable, value)) +
  geom_rect(aes(xmin = -Inf , 
                xmax = Inf ,
                ymin = mean(df$value) - sd(df$value) ,
                ymax = mean(df$value) + sd(df$value) ,
                fill = "blue")) +
  geom_hline(yintercept = mean(df$value), size = 3, col = "red") +
  geom_boxplot() +
  scale_fill_identity()

zx8754
  • Thank you, but somehow this script does not work for me. The error says "Discrete value supplied to continuous scale", which is true. Or are -Inf and Inf also recognized on discrete scales? – Kryštof Chytrý Aug 21 '19 at 11:45
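
Regarding the error in the last comment: whether it appears may depend on the ggplot2 version and on the order of the layers. The following is only a sketch of one possible workaround, not necessarily what the answer intended. It assumes that declaring the discrete x scale explicitly with scale_x_discrete() before adding the band, and drawing the band once with annotate("rect", ...) rather than per row with geom_rect(aes(...)), avoids the clash. The data frame is rebuilt here without reshape2, and the seed, the helper names m and s, and the alpha value are assumptions added for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Long-format data equivalent to melt(df) in the question
df <- data.frame(
  variable = factor(rep(c("a", "b"), each = 1000)),
  value    = c(rnorm(1000, 20, 40), rnorm(1000, 100, 40))
)

m <- mean(df$value)  # overall mean
s <- sd(df$value)    # overall standard deviation

p <- ggplot(df, aes(variable, value)) +
  # Declare the discrete x scale up front so the band does not decide its type
  scale_x_discrete() +
  # One rectangle spanning the full panel width, from mean - sd to mean + sd
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = m - s, ymax = m + s,
           fill = "blue", alpha = 0.2) +
  geom_hline(yintercept = m, colour = "red") +
  geom_boxplot()

p
```

Because the band is added before geom_boxplot(), the boxes are drawn on top of it; -Inf and Inf stretch the rectangle to the panel edges.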