I am using boxplots to compare the distributions of 5 different data sets.

I know it is possible to arrange them based on their median values.

What I am looking for is a way to arrange them by the difference between the first and third quartiles.

Obviously I do not want to arrange them manually by reordering the levels.

I have solved this with tidyverse: I use group_by and summarise to calculate the difference between the two quartiles for each group, and then use that difference to order the boxes.

If anyone needs the code or has a better solution, please let me know.

Thank you.

m3hdad
  • Question is interesting, however it is unclear, please read and edit your question according to: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Add example data, output that you get and expected output. Also, what's *box sizes*? – pogibas Jan 16 '19 at 10:19
  • Maybe try this link, but change the function to something denoting range: https://rpubs.com/crazyhottommy/reorder-boxplot e.g. https://stat.ethz.ch/R-manual/R-devel/library/stats/html/IQR.html – Jonny Phelps Jan 16 '19 at 10:21
  • I would imagine that this is a duplicate of *many* similar questions you can find here on SO; e.g. [How to change order of boxplots when using ggplot2?](https://stackoverflow.com/questions/6867393/how-to-change-order-of-boxplots-when-using-ggplot2) and additional links therein. Answers provided in these posts demonstrate how to re-order boxes in a boxplot according to various metrics. Unless you can demonstrate with a reproducible code example (see @PoGibas comment) how these answers do *not* solve your issue I vote to close this question as a duplicate. – Maurits Evers Jan 16 '19 at 10:21
  • Possible duplicate of [How to change order of boxplots when using ggplot2?](https://stackoverflow.com/questions/6867393/how-to-change-order-of-boxplots-when-using-ggplot2) – pogibas Jan 16 '19 at 10:54
  • Thanks, I found a way to get what I want. I don't think the question is a duplicate as I could not find what I was looking for. Maybe I did not explain fully. – m3hdad Jan 16 '19 at 11:26
  • @m3hdad feel free to answer your own question, please share the solution. – zx8754 Jan 16 '19 at 11:45

2 Answers

Here is how I ordered my boxplots by the difference between the 1st and 3rd quartiles. "df" is your data.frame, "column1" is the column you want to group by, and "column2" contains the values whose distribution you want to show.

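library(dplyr)
library(ggplot2)

# Per-group quantiles and the Q3 - Q1 difference, widest group first
DisTable <- df %>%
        group_by(column1) %>%
        summarise(Min = quantile(column2, probs = 0),
                  Q1 = quantile(column2, probs = 0.25),
                  Median = quantile(column2, probs = 0.5),
                  Q3 = quantile(column2, probs = 0.75),
                  Max = quantile(column2, probs = 1),
                  DiffQ3Q1 = Q3 - Q1) %>%
        arrange(desc(DiffQ3Q1))

# Group names in the order the boxes should appear
bporder <- as.character(DisTable$column1)

# Refer to column1 directly inside aes() rather than df$column1
ggplot(df, aes(x = factor(column1, levels = bporder), y = column2, fill = column1)) +
        geom_boxplot()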
m3hdad

Do you mean the interquartile range (IQR())? If so, you can pass IQR as the summary function to reorder():

library(tidyverse)

diamonds %>%
  as_tibble() %>%   # as.tibble() is deprecated in favour of as_tibble()
  ggplot(aes(reorder(cut, price, IQR), price)) +
  geom_boxplot()
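
Note that reorder() sorts the levels in ascending order of the summary, so the narrowest box comes first. If you want the widest box first instead, one option is to negate the IQR. Below is a minimal base-R sketch of that idea. It assumes the built-in InsectSprays data set purely as an illustration, and the column name spray_by_iqr is made up for this example.

```r
# Sketch: order groups by IQR, widest box first, using base R only.
# InsectSprays ships with R: 'spray' is the grouping factor, 'count' the values.
data(InsectSprays)

# IQR of 'count' within each spray, for checking the result
iqr_by_spray <- tapply(InsectSprays$count, InsectSprays$spray, IQR)

# reorder() sorts levels ascending by the summary value,
# so negating the IQR gives a descending order.
# 'spray_by_iqr' is a hypothetical column name used only here.
InsectSprays$spray_by_iqr <- reorder(InsectSprays$spray, InsectSprays$count,
                                     FUN = function(v) -IQR(v))

# IQRs listed in the new level order should be non-increasing
print(iqr_by_spray[levels(InsectSprays$spray_by_iqr)])

boxplot(count ~ spray_by_iqr, data = InsectSprays,
        xlab = "spray (ordered by IQR, descending)", ylab = "count")
```

The same negation works inside aes() in the ggplot2 version, for example reorder(cut, price, function(v) -IQR(v)).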
Roman