
I hope someone can help me with the following problem: I would like to display the value (avg) of different laboratory parameters (parameter) for 2 different groups (gruppe). Additionally, I want to plot the change over time (performance) in 3 different facets. Here is a tibble of the dataset:

# A tibble: 402 x 4
# Groups:   gruppe, parameter [134]
   gruppe parameter                      performance     avg
   <chr>  <chr>                          <chr>         <dbl>
 1 DGE    ACPA(citrull. Prot.-Ak) EIA/Se change_t1t0 NaN    
 2 DGE    ACPA(citrull. Prot.-Ak) EIA/Se change_t2t0  37.6  
 3 DGE    ACPA(citrull. Prot.-Ak) EIA/Se change_t3t0 NaN    
 4 Fasten Apolipoprot. A1 HP             change_t1t0  41.2 
 5 DGE    Apolipoprot. A1 HP             change_t2t0 NaN    
 6 DGE    Apolipoprot. A1 HP             change_t3t0 NaN    
 7 DGE    Apolipoprotein B               change_t1t0 NaN    
 8 DGE    Apolipoprotein B               change_t2t0 NaN    
 9 Fasten Apolipoprotein B               change_t3t0 NaN    
10 DGE    aPTT Pathromtin SL             change_t1t0   0.571
# … with 392 more rows

This worked totally fine using this code:

#Create labels for 3 facets
lab_labels <- c("Change from Baseline to Day 7 [%]",
                "Change from Baseline to Week 6 [%]",
                "Change from Baseline to Week 12 [%]")

names(lab_labels) <- c("change_t1t0",
                       "change_t2t0",
                       "change_t3t0")

labor_summ_long %>%
  filter(parameter %in% c("Hämatokrit (l/l)","Hämoglobin", "Leukozyten","MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten","MPV")) %>%
  arrange(desc(avg))%>%
  group_by(gruppe, performance)%>%
  ggplot(aes(x=reorder(parameter,avg), y=avg, group=gruppe, fill = gruppe))+
  geom_col(position = position_dodge())+
  facet_wrap(~performance, 
             scales ="free_y", 
             dir="v",
             labeller = labeller(performance = lab_labels))+
  ylab("") + 
  xlab("") + 
  labs(color="", linetype="")+
  theme_pubclean()+
  theme(strip.background=element_rect(fill="lightgrey"),
        strip.text = element_text(face="bold"),
        legend.position = "bottom",
        legend.title=element_blank())+
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust = 1))+
  scale_x_discrete(labels = c("Hämoglobin"="Hemoglobin", "Leukozyten" = "Leucocytes",
                              "MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten"="Thrombocytes",
                              "MPV", "Hämatokrit (l/l)"="Hematocrite"))+
  scale_fill_discrete(labels=c('DGE', "Fasten"='Fasting'))

This is what the plot looks like

What I am missing, and cannot find a solution for: I would like to order the bars...

  • according to the avg value, from high to low
  • of the Fasting group (blue bars)
  • in the performance from baseline to day 7 (change_t1t0), i.e. the first facet.

I experimented with arrange, sort, etc. but couldn't satisfy all of the conditions above at once.

Do you have any ideas? Thanks a lot in advance!

Anika
  • To help us to help you would you mind making your issue reproducible by sharing a sample of your **data** as a `dput()`? See [how to make a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Simply type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. If your dataset has a lot of observations you could do `dput(head(NAME_OF_DATASET, 20))` for the first twenty rows of data. – stefan Jul 24 '21 at 09:09
  • Chapeau to @stefan for the answer below. He beat me by about 5 minutes :) ... @Anika: one annoying thing about plotting data frames with ggplot is that the order you see on your screen is not the inherent order of the data items. Thus, the way to go about it is to create this order yourself. `reorder()` can become cumbersome if you have multiple conditions. You can always create a new factor column that achieves the sorting you are after ... or use a function like the one proposed by stefan, which deals with the multiple conditions and creates this "factor" internally (in the example `byby`) inside ggplot(). – Ray Jul 24 '21 at 10:40
  • Thank you stefan and Ray for your quick and helpful response! I had not seen the other query stefan tagged before. I will try both ways (new factor column/function) and see which one I prefer :) Thank you so much!! @stefan: thank you for the tip concerning the dataset. Will do that next time! – Anika Jul 24 '21 at 12:49

1 Answer


The issue is that reorder() orders the categories by the mean of all avg values for each parameter, without taking any grouping (gruppe or performance) into account.
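A small illustration of this, using made-up numbers (the data frame `toy` is hypothetical, not the real lab data): because the summary runs over all rows of a parameter, one large value in the other group can decide the order.

```r
# Hypothetical toy data: two parameters, two groups
toy <- data.frame(
  parameter = c("A", "A", "B", "B"),
  gruppe    = c("DGE", "Fasten", "DGE", "Fasten"),
  avg       = c(100, 1, 2, 10)
)

# reorder() uses the mean over ALL rows of each parameter:
# A = (100 + 1) / 2 = 50.5, B = (2 + 10) / 2 = 6
overall <- levels(reorder(toy$parameter, toy$avg))
overall
# "B" "A"  (ascending by overall mean)

# Looking at the Fasten rows only, A (1) is lower than B (10)
fasten <- toy[toy$gruppe == "Fasten", ]
fasten_only <- levels(reorder(fasten$parameter, fasten$avg))
fasten_only
# "A" "B"
```

The two orders disagree, which is why the bars cannot be sorted by one group in one facet with a plain reorder() call.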

Adapting this answer to your case, and using some random example data to mimic your real data, this could be achieved like so:

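The helper function reorder_where lets you order the categories using only the rows that meet an additional condition, e.g. in your case the rows where gruppe == "Fasten" & performance == "change_t1t0" is TRUE. Passing -avg reverses the ascending order of reorder(), so the bars run from high to low.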

library(dplyr)
library(ggplot2)

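# Reorder the categories in x by a summary of `by`, computed only on rows where `where` is TRUE
reorder_where <- function (x, by, where, fun = mean, ...) {
  xx <- x[where]                               # categories in the selected rows
  byby <- by[where]                            # values in the selected rows
  byby <- tapply(byby, xx, FUN = fun, ...)[x]  # per-category summary, mapped back to every row
  reorder(x, byby)                             # factor with levels sorted by that summary
}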

labor_summ_long %>%
  filter(parameter %in% c("Hämatokrit (l/l)","Hämoglobin", "Leukozyten","MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten","MPV")) %>%
  ggplot(aes(x=reorder_where(parameter, -avg, gruppe == "Fasten" & performance == "change_t1t0"), y=avg, group=gruppe, fill = gruppe))+
  geom_col(position = position_dodge())+
  facet_wrap(~performance, 
             scales ="free_y", 
             dir="v",
             labeller = labeller(performance = lab_labels))+
  ylab("") + 
  xlab("") + 
  labs(color="", linetype="")+
  # theme_pubclean() +  # from the ggpubr package; uncomment if ggpubr is loaded
  theme(strip.background=element_rect(fill="lightgrey"),
        strip.text = element_text(face="bold"),
        legend.position = "bottom",
        legend.title=element_blank())+
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust = 1))+
  scale_x_discrete(labels = c("Hämoglobin"="Hemoglobin", "Leukozyten" = "Leucocytes",
                              "MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten"="Thrombocytes",
                              "MPV", "Hämatokrit (l/l)"="Hematocrite"))+
  scale_fill_discrete(labels=c('DGE', "Fasten"='Fasting'))
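
To check what the helper does without drawing the plot, here is a minimal sketch on made-up data (the data frame `toy` and its values are assumptions for illustration only; the helper is repeated so the block runs on its own).

```r
reorder_where <- function(x, by, where, fun = mean, ...) {
  xx <- x[where]
  byby <- by[where]
  byby <- tapply(byby, xx, FUN = fun, ...)[x]
  reorder(x, byby)
}

# Hypothetical data: 3 parameters x 2 groups x 2 time points
toy <- data.frame(
  parameter   = rep(c("A", "B", "C"), times = 4),
  gruppe      = rep(c("DGE", "Fasten"), each = 3, times = 2),
  performance = rep(c("change_t1t0", "change_t2t0"), each = 6),
  avg         = c(9, 8, 7,   1, 3, 2,   5, 5, 5,   30, 20, 10)
)

# Order by -avg, using only the Fasten rows of the first time point
ord <- with(toy, reorder_where(parameter, -avg,
                               gruppe == "Fasten" & performance == "change_t1t0"))
levels(ord)
# "B" "C" "A"
```

In the Fasten rows of change_t1t0 the values are A = 1, B = 3, C = 2, so the levels come out from high to low; the DGE rows and the second time point have no influence.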

DATA

set.seed(42)

labor_summ_long <- data.frame(
  parameter = sample(c("Hämatokrit (l/l)","Hämoglobin", "Leukozyten","MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten","MPV"), 100, replace = TRUE),
  gruppe = sample(c("DGE", "Fasten"), 100, replace = TRUE),
  performance = sample(c("change_t1t0",
                         "change_t2t0",
                         "change_t3t0"), 100, replace = TRUE),
  avg = runif(100, 0, 50)
)
labor_summ_long <- dplyr::distinct(labor_summ_long, parameter, gruppe, performance, .keep_all = TRUE)
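
As Ray notes in the comments, an alternative is to build the order explicitly as a factor column before plotting. A minimal sketch of that idea, assuming a small made-up data frame in place of labor_summ_long (the names `plot_data` and `fasten_t1_order` are invented here for illustration):

```r
library(dplyr)

# Hypothetical stand-in for labor_summ_long
plot_data <- data.frame(
  parameter   = rep(c("MCV", "MCH", "MPV"), times = 4),
  gruppe      = rep(c("DGE", "Fasten"), each = 3, times = 2),
  performance = rep(c("change_t1t0", "change_t2t0"), each = 6),
  avg         = c(9, 8, 7,   1, 3, 2,   5, 5, 5,   30, 20, 10)
)

# Parameters sorted high to low by avg, Fasten group, first time point only
fasten_t1_order <- plot_data %>%
  filter(gruppe == "Fasten", performance == "change_t1t0") %>%
  arrange(desc(avg)) %>%
  pull(parameter)

# Fix that order as factor levels; parameters without a matching row go last
plot_data <- plot_data %>%
  mutate(parameter = factor(parameter,
                            levels = union(fasten_t1_order, unique(parameter))))

levels(plot_data$parameter)
# "MCH" "MPV" "MCV"
```

With the levels fixed this way, aes(x = parameter) can be used directly in ggplot() and no reorder() call is needed inside the plot.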
stefan