I'm trying to reorder my boxplots because ggplot is plotting them in alphabetical order. I'd like to specify the ordering myself.
I have a dataframe called stride with 10 columns. These include the subject_id, the age_group (young, middle or old) and stride_int.

I used the following code to create the boxplot:

stride %>%
  ggplot(aes(x = age_group, y = stride_int)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 21, outlier.fill = "red", outlier.size = 2) +
  theme_light() +
  labs(title = "Stride interval for different age groups",
       y = "stride interval",
       x = "age group")

This plots the boxplots in alphabetical order of age_group, i.e. 'middle', 'old' and 'young'.

I would like to order them as 'young', 'middle' and 'old'.

I tried the following:

stride %>%
  arrange(age_group) %>%
  mutate(age_group = factor(age_group, levels = c("young", "middle", "old"))) %>%
  ggplot(aes(x = age_group, y = stride_int)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 21, outlier.fill = "red", outlier.size = 2) +
  theme_light() +
  labs(title = "Stride interval for different age groups",
       y = "stride interval",
       x = "age group")

but it plots just one boxplot. There are no NAs in my dataframe, so I'm not sure what's going on. [Image: box plot that I tried to order]

I've added the output of dput(head(stride)) below. age_group is already a character column. I'm not sure what row.names is.

structure(list(time = c(4.0433, 5.1533, 6.1, 9.9633, 11.06, 12.04
), stride_int = c(0.85, 1.11, 0.9467, 1.11, 1.0967, 0.98), subject_id 
= c(1, 1, 1, 1, 1, 1), age_months = c(40, 40, 40, 40, 40, 40), gender 
= c("M", "M", "M", "M", "M", "M"), height_cm = c(102.87, 102.87, 
102.87, 102.87, 102.87, 102.87), weight_kg = c(19.5046720493514, 
19.5046720493514, 19.5046720493514, 19.5046720493514, 
19.5046720493514, 19.5046720493514), leg_length_cm = c(58.42, 58.42, 
58.42, 58.42, 58.42, 58.42), speed_ms = c(1.04289, 1.04289, 1.04289, 
1.04289, 1.04289, 1.04289), age_group = c("Young", "Young", "Young", 
"Young", "Young", "Young")), row.names = c(NA, -6L), class = 
c("tbl_df", "tbl", "data.frame"))

I've also replicated a minimal version of my dataframe below:

time      stride_int  subject_id  gender  leg_length_cm  speed_ms  age_group
<dbl>     <dbl>       <dbl>       <chr>   <dbl>          <dbl>     <chr>
4.04      0.85        1           M       58.4           1.04      Young
5.15      1.11        1           M       58.4           1.04      Young
184.60    0.9533      33          F       68.58          1.492     Middle
185.59    0.9900      33          F       68.58          1.492     Middle
186.56    0.970       33          F       68.58          1.492     Middle
64.3600   1.0400      39          F       83.82          1.079     Old
65.3933   1.0333      39          F       83.82          1.079     Old
66.4433   1.0500      39          F       83.82          1.079     Old
477.8933  0.9167      9           F       50.8           1.1377    Young
479.0200  1.1267      9           F       50.8           1.1377    Young
480.3135  1.0883      9           F       50.8           1.1377    Young
NLC
  • Search stackoverflow or google for "ggplot2 axis order" and all answers point to using `factor`s. See https://stackoverflow.com/q/3253641/3358272, https://stackoverflow.com/q/12774210/3358272, https://stackoverflow.com/q/18401931/3358272; ordering with groups https://stackoverflow.com/q/44350031/3358272. – r2evans Apr 17 '23 at 23:31
  • It would be helpful to provide a minimal, reproducible example. Without some example data to work with, in the format you used, we can't replicate the problem or propose a full solution. See here: https://stackoverflow.com/help/minimal-reproducible-example (Also, I have found that sometimes just creating a minimal, reproducible example helps me solve the problem I was having in the first place. I'm not saying that's true in your case!) – tgraybam Apr 18 '23 at 02:50
  • It would be useful to examine your data after your `mutate(age_group ...` step. I would expect that if you have just arranged character data, it'd appear in "middle" "old" "young" alphabetical order and wouldn't match your assigned labels. But in your case they're NA, which makes it sound like maybe `age_group` is numeric to start with? If that's the case, `mutate(age_group = factor(as.character(age_group), levels=c("young", "middle", "old")))` might work better. Can you share some example data by running `dput(head(stride))` and copying the output into your question? – Jon Spring Apr 18 '23 at 03:20
  • @JonSpring age_group is already a character. So not sure why it won't work – NLC Apr 18 '23 at 12:25
  • @tgraybam I've pasted in dput and a minimal data frame in my original post – NLC Apr 18 '23 at 12:26
  • @r2evans thanks for the links. I was looking up things like reordering and boxplot; I hadn't thought of trying to rescale. Will see if I can get that to work – NLC Apr 18 '23 at 12:28
  • I don't think rescaling is necessary. Just use `factor(age_group, levels=c("Young", "Middle", "Old"))`. In your attempt to use `factor`, I see you used a lower-case `"young"`, that's not going to work (which is why your plot shows `NA` in the axis), the spelling has to be exactly the same. – r2evans Apr 18 '23 at 12:38
  • @r2evans Yes you're right! That was the reason why it didn't work. Thank you so much! – NLC Apr 18 '23 at 12:50
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 18 '23 at 15:09
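
To illustrate the fix r2evans describes in the comments, here is a minimal sketch. It assumes a small toy data frame (the name `stride_demo` is made up for this example, with a few values copied from the question) rather than the full `stride` data. It first shows that factor levels which don't match the data exactly turn every value into NA, then uses levels spelled exactly as in the data. The plot is guarded so the snippet still runs if ggplot2 isn't installed.

```r
# Toy data (hypothetical name) using the capitalised labels shown in the dput() output.
stride_demo <- data.frame(
  stride_int = c(0.85, 1.11, 0.9533, 0.99, 1.04, 1.0333),
  age_group  = c("Young", "Young", "Middle", "Middle", "Old", "Old")
)

# Lower-case levels do not match "Young"/"Middle"/"Old", so every value becomes NA.
# That is why the attempted plot collapsed into a single NA box.
bad <- factor(stride_demo$age_group, levels = c("young", "middle", "old"))
sum(is.na(bad))

# Levels spelled exactly as in the data keep every value and define the axis order.
stride_demo$age_group <- factor(stride_demo$age_group,
                                levels = c("Young", "Middle", "Old"))
levels(stride_demo$age_group)

# Draw the plot only if ggplot2 is available; boxes follow the factor level order.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stride_demo, aes(x = age_group, y = stride_int)) +
    geom_boxplot() +
    theme_light() +
    labs(title = "Stride interval for different age groups",
         y = "stride interval", x = "age group")
  print(p)
}
```

In the original pipeline, the equivalent change would be to write `levels = c("Young", "Middle", "Old")` inside the `mutate()` call; the `arrange()` step shouldn't be needed, since ggplot2 takes the axis order from the factor levels rather than from the row order.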

0 Answers