Using the gtsummary package, how can I sort rows in a grouped frequency table by one of the columns?

Here is an example. I have a dataset from a survey with a multiple-choice question. Each answer option is stored as a dichotomous (0/1) variable. I also have a factor variable, and I want a breakdown of the frequencies by that grouping variable. I use tbl_summary() to summarise the data, and I added an Overall column of frequencies with add_overall(). I would like to sort the rows by the Overall column, or by the column of any one of the groups.

library(gtsummary)

some_made_up_data <- data.frame(
  Q1a = rep(c(1, 0, 1, 1, 0), 20),
  Q1b = rep(c(1, 1, 1, 1, 0), 20),
  Q1c = rep(c(1, 0, 0, 0, 1), 20),
  group = rep(c("g1", "g2", "g3", "g4", "g5"), 20)
)

some_made_up_data %>%
  tbl_summary(include = starts_with("Q1"), by = group) %>%
  add_overall()

I have tried the sort argument of tbl_summary(), but that doesn't work. I have also tried converting the result to a gt table with as_gt(), but I could not sort the table in gt either. The only post I could find on this is "How to sort or change rows order in a characteristic table column in the gtsummary package?", but it doesn't seem to help with my task. Is this even possible?

Werner Hertzog
Mauro r

1 Answer

One option would be to use modify_table_body(), which lets you reorder the rows of the table body. The table body is a data frame (a tibble) in which the Overall column is stored in a column named stat_0 and the statistics for the five groups are stored in stat_1 to stat_5:

library(gtsummary)

some_made_up_data %>%
  tbl_summary(include = starts_with("Q1"), by = group) %>%
  add_overall() %>%
  # stat_0 is a character column, so this sorts the cells as text
  modify_table_body(fun = ~ dplyr::arrange(.x, dplyr::desc(stat_0)))
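To see which columns are available for sorting, it can help to look at the table body directly. The following is a minimal sketch that rebuilds the question's toy data and stores the table in a variable named tbl (a name chosen here for illustration); it assumes the table body is reachable as the table_body element of the gtsummary object.

```r
library(gtsummary)

# Same toy data as in the question
some_made_up_data <- data.frame(
  Q1a = rep(c(1, 0, 1, 1, 0), 20),
  Q1b = rep(c(1, 1, 1, 1, 0), 20),
  Q1c = rep(c(1, 0, 0, 0, 1), 20),
  group = rep(c("g1", "g2", "g3", "g4", "g5"), 20)
)

tbl <- some_made_up_data |>
  tbl_summary(include = starts_with("Q1"), by = group) |>
  add_overall()

# Column names of the table body, including stat_0 (Overall)
# and one stat_* column per level of the grouping variable
names(tbl$table_body)

# The stat_* columns hold formatted "n (p%)" strings, not numbers
class(tbl$table_body$stat_0)
```

Because the statistics are stored as formatted text, any ordering applied to them directly is a text ordering.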

EDIT: The stat columns are character strings, so to order by the numeric value you can convert them to numbers first, e.g. with readr::parse_number(). In the example below I created some more realistic random example data:

library(gtsummary)

n <- 3000

set.seed(123)

some_made_up_data <- data.frame(
  Q1a = sample(c(0, 1), n, replace = TRUE, prob = c(.6, .4)),
  Q1b = sample(c(0, 1), n, replace = TRUE, prob = c(.3, .7)),
  Q1c = sample(c(0, 1), n, replace = TRUE, prob = c(.4, .6)),
  group = sample(c("g1", "g2", "g3", "g4", "g5"), n, replace = TRUE)
)

some_made_up_data %>%
  tbl_summary(include = starts_with("Q1"), by = group) %>%
  add_overall() %>%
  modify_table_body(
    # parse the leading count out of each "n (p%)" cell, then sort on that number
    fun = ~ dplyr::arrange(.x, dplyr::desc(readr::parse_number(stat_0)))
  )
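The difference between the two orderings can be checked on a plain character vector, without building a table. This small sketch uses the four cell values quoted in the comments below; the names x and counts are only illustrative.

```r
# Formatted cells as they appear in the Overall column
x <- c("4,348 (24%)", "4,075 (23%)", "3,868 (22%)", "10,840 (61%)")

# Text ordering compares the strings character by character,
# so a cell starting with "1" does not sort above one starting with "4"
sort(x, decreasing = TRUE)

# parse_number() extracts the first number in each string and
# drops the grouping comma, giving the count as a numeric value
counts <- readr::parse_number(x)

# Ordering by the parsed count puts the largest count first
x[order(counts, decreasing = TRUE)]
```

Sorting on the parsed count rather than on the formatted string is what makes the ordering follow the actual frequencies.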

stefan
  • Thank you so much for the solution, stefan. But I think desc() is treating the values as characters rather than numbers, as the sorting is not accurate. In my case (not the example here) the descending order from top to bottom is 4,348 (24%), 4,075 (23%), 3,868 (22%), 10,840 (61%). The last value should be at the top. – Mauro r May 30 '23 at 21:01
  • Yup. You are right. The columns are characters. Should have realized that. One fix would be to order based on the numeric value. See my edit for an approach using `readr::parse_number`. – stefan May 30 '23 at 21:35
  • Your edit works perfectly, thank you again. I also found another way of doing it, using `stringr::str_rank()`: `modify_table_body(fun = ~ dplyr::arrange(.x, desc(stringr::str_rank(stat_0, numeric = TRUE))))` – Mauro r May 30 '23 at 21:41
  • Oh. Nice. Wasn't aware of this function from `stringr`. Thx for sharing. – stefan May 30 '23 at 21:46