I have a dataset with calculated quantiles for each department and country. It looks like this:

df <- structure(list(quantile = c("p5", "p25", "p50", "p75", "p95", 
"p5", "p25", "p50", "p75", "p95", "p5", "p25", "p50", "p75", 
"p95", "p5", "p25", "p50", "p75", "p95"), value = c(6, 12, 20, 
33, 61, 6, 14, 23, 38, 63, 7, 12, 17, 26, 50, 7, 12, 18, 26, 
51), country = c("A", "A", "A", "A", "A", "B", "B", "B", "B", 
"B", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B"), dep = c("D", 
"D", "D", "D", "D", "D", "D", "D", "D", "D", "I", "I", "I", "I", 
"I", "I", "I", "I", "I", "I"), kpi = c("F", "F", "F", "F", "F", 
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", 
"F", "F")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame"))

Now I would like to build a boxplot for each department that compares countries and uses p5/p95 instead of min/max. It should look similar to this plot, but without outliers (Train_number would correspond to countries): <https://i.stack.imgur.com/cq8Eo.png>

The code for that plot (from the question "ggplot2, geom_boxplot with custom quantiles and outliers") is:

ggplot(MyData, aes(factor(Stations), Arrival_Lateness, 
                   fill = factor(Train_number))) + 
  stat_summary(fun.data = f, geom="boxplot", 
               position=position_dodge(1))+
  stat_summary(aes(color=factor(Train_number)),fun.y = q, geom="point", 
               position=position_dodge(1))

I tried to derive a solution from the code above and the provided answers. Unfortunately, I do not know how to pass the necessary values from the variables quantile and value to ggplot(). Is there an argument in the stat_summary() function that I missed and could use? Or is there another simple solution?

klamsi
  • You do not need to calculate the quantiles for `boxplot` yourself; your dataset should contain the original values, and the function will calculate the quantiles. Could you please provide your original dataset? – UseR10085 Jun 10 '20 at 08:45
  • Unfortunately this is the original dataset I got due to GDPR. – klamsi Jun 10 '20 at 08:49

1 Answer

From the data you have provided, you can generate the following plot:

library(ggplot2)

f <- function(x) {
  r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

ggplot(df, aes(factor(dep), value)) + 
  stat_summary(fun.data = f, geom = "boxplot", 
               position = position_dodge(1)) +
  facet_grid(. ~ country, scales = "free")

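I am not sure whether this is correct, though. `f` treats the five precomputed quantiles in each group as if they were raw observations and re-estimates quantiles from them. With R's default quantile type, the box and median still coincide with p25, p50 and p75, but the whisker ends are interpolated and no longer match the supplied p5 and p95 (for country A, department D they come out as 7.2 and 55.4 instead of 6 and 61).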
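
Since the quantiles are already computed, one possible alternative is to skip `stat_summary()` and hand the five values directly to `geom_boxplot()` with `stat = "identity"`. The following is only a sketch under a few assumptions: it uses `tidyr::pivot_wider()` (not mentioned in the question) to get one row per country and department, it rebuilds `df` in a shorter form so the block runs on its own, and the names `df_wide` and `p` are made up for illustration.

```r
library(ggplot2)
library(tidyr)

# Same data as in the question, rebuilt compactly so the example is self-contained
df <- data.frame(
  quantile = rep(c("p5", "p25", "p50", "p75", "p95"), times = 4),
  value    = c(6, 12, 20, 33, 61,  6, 14, 23, 38, 63,
               7, 12, 17, 26, 50,  7, 12, 18, 26, 51),
  country  = rep(c("A", "B", "A", "B"), each = 5),
  dep      = rep(c("D", "I"), each = 10),
  kpi      = "F"
)

# One row per country/department, with columns p5, p25, p50, p75, p95
df_wide <- pivot_wider(df, names_from = quantile, values_from = value)

# stat = "identity" tells geom_boxplot() to draw the supplied values as they are
# instead of computing its own statistics; p5/p95 become the whisker ends
p <- ggplot(df_wide, aes(x = dep, fill = country)) +
  geom_boxplot(
    aes(ymin = p5, lower = p25, middle = p50, upper = p75, ymax = p95),
    stat = "identity"
  )

p
```

Here the departments are on the x axis and the countries are dodged side by side through `fill`, which mirrors the role of Stations and Train_number in the linked plot. No outlier points are drawn, because none are supplied.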

UseR10085