boxplot: order groups by the mean of a subset of each group

Question

Let's consider this data:

df = data.frame('score'=round(runif(15, 1, 10)),
                'group'=paste0("a",rep(c(1,2,3),each=5)),
                'category'=rep(c("big", "big", "big", "big", "small"), 3))

I would like to plot boxplots of this data with ggplot2. What I want is the equivalent of boxplot(score~group), but with the boxplots ordered by the mean score of the "big" individuals in each group.

I can't figure out a simple way to do it without creating new variables. Using dplyr is fine. Thanks.

score 2 · Accepted Answer · answered Mar 23 '15 at 23:02

I don't know if this qualifies as a simple way, though I personally find it simple. I use dplyr to find the mean of the "big" scores in each group:

#find the means for each group
library(dplyr)
means <-
df %>%
  #filter out small since you only need category equal to 'big'
  filter(category=='big') %>%
  #use the same groups as in the ggplot
  group_by(group) %>%
  #calculate the means
  summarise(mean = mean(score))

#order the groups according to the order of the means
myorder <- means$group[order(means$mean)]

In this case the order is:

> myorder
[1] a1 a2 a3
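
Since the question's data comes from runif() without a seed, the order shown above will change from run to run. As a minimal sketch, here is the same ordering step in base R on fixed scores. The data frame df_fixed and its values are made up for illustration (they are not the question's data), chosen so that the "small" rows would change the order if they were included:

```r
# Hypothetical fixed data (not from the question), so the result is reproducible
df_fixed <- data.frame(
  score    = c(5, 6, 7, 6, 1,   2, 3, 2, 3, 10,   4, 5, 4, 5, 1),
  group    = paste0("a", rep(1:3, each = 5)),
  category = rep(c("big", "big", "big", "big", "small"), 3),
  stringsAsFactors = FALSE
)

# Keep only the "big" rows, then take the mean score per group
big <- df_fixed[df_fixed$category == "big", ]
big_means <- tapply(big$score, big$group, mean)

# Group names sorted by increasing mean of the "big" scores
order_big <- names(sort(big_means))

# For comparison: the order obtained from the mean of all rows in each group
order_all <- names(sort(tapply(df_fixed$score, df_fixed$group, mean)))

print(order_big)
print(order_all)
```

With these made-up values the two orders differ, which is why the filter on category has to come before the means are computed.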

To arrange the boxplots in that order, pass it to scale_x_discrete:

library(ggplot2)
ggplot(df, aes(group, score)) +
  geom_boxplot() +
  #use scale_x_discrete with the limits argument
  #to set the order in which the boxplots appear
  #in this case the order is the myorder vector
  scale_x_discrete(limits=myorder)

And that's it.
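
If the intermediate means and myorder objects are not wanted, the ordering can be computed directly inside the limits argument. The following is only a sketch under a few assumptions: big_order() is a hypothetical helper (not part of ggplot2 or dplyr), it uses base R's tapply instead of dplyr, and set.seed(42) is added here so the example gives the same plot on every run:

```r
library(ggplot2)

# Assumption: a seed is set here for reproducibility; the question has none
set.seed(42)
df <- data.frame(
  score    = round(runif(15, 1, 10)),
  group    = paste0("a", rep(c(1, 2, 3), each = 5)),
  category = rep(c("big", "big", "big", "big", "small"), 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group names sorted by the mean score of the "big" rows
big_order <- function(data) {
  big <- data[data$category == "big", ]
  names(sort(tapply(big$score, big$group, mean)))
}

# The order is computed inline, so no extra variable is created
p <- ggplot(df, aes(group, score)) +
  geom_boxplot() +
  scale_x_discrete(limits = big_order(df))

print(p)
```

The helper only keeps the call readable; its body could equally be written directly inside scale_x_discrete(limits = ...).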

LyzandeR

Unless I'm mistaken, order seems incorrect... I get "a1", "a3", "a2" using `myorder <- names(sort(with(df[df$category=="big",], by(data = score, group, mean))))` – Dominic Comtois Mar 23 '15 at 23:23
@DominicComtois Thanks for the comment. The creation of the data.frame uses `runif` which creates random numbers. Each rerunning yields different results and that is why you got a different order. – LyzandeR Mar 23 '15 at 23:27
Oh, right, sorry about that. I hadn't realized the data was randomized without a seed! – Dominic Comtois Mar 23 '15 at 23:30
@DominicComtois No worries :). It is always good to post comments for mistakes even if sometimes they prove not to be the case :) – LyzandeR Mar 23 '15 at 23:32
@LyzandeR Thanks a lot, it's the last part with `scale_x_discrete` that I needed most. It really makes my day to realize that `xlim()` can be used for categorical variables as well! It is indeed simple: you just decomposed all the steps, but we could perfectly well fit the whole code inside the scale_x argument! – agenis Mar 24 '15 at 09:07
Great! Really glad I could be of help :) – LyzandeR Mar 24 '15 at 11:38
