
I'm interested in calculating pairwise standardized mean differences (SMDs) across the levels of one stratifying variable. An SMD is usually calculated between two groups, but can it be calculated pairwise when there are three or more groups?

P.S. I'm a big fan of the gtsummary package, so I attempted this analysis by adapting example 2 of add_difference() from that package, as follows:

library(tidyverse)
library(gtsummary)
#> #BlackLivesMatter
add_difference_ex2 <-
  trial %>%
  mutate(trt=ifelse(age<40,"Drug C", trt)) %>% 
  select(trt, age, marker, grade, stage) %>%
  tbl_summary(
    by = trt,
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "no",
    include = c(age, marker, trt)
  ) %>%
  add_n() %>%
  add_difference(adj.vars = c(grade, stage))
#> 11 observations missing `trt` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `trt` column before passing to `tbl_summary()`.
#> Error: 'tbl_summary'/'tbl_svysummary' object must have a `by=` value with exactly two levels

Created on 2021-10-27 by the reprex package (v2.0.1)

aynber
Ahmed
  • Have you tried using the `cobalt` package? There is a [help page](https://ngreifer.github.io/cobalt/reference/class-bal.tab.multi.html) on this and a [section](https://ngreifer.github.io/cobalt/articles/cobalt.html#using-cobalt-with-multi-category-treatments) of the vignettes dedicated to it. – Noah Oct 28 '21 at 01:50

1 Answer

To add pairwise standardized mean differences (SMDs), first define a function that calculates and returns the pairwise SMD estimates. Then add it to the gtsummary table using the generic function add_stat(). Example below:

library(gtsummary)
library(tidyverse)

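# function to calculate pairwise smd
# (drops one level of `by` at a time and compares the levels that remain,
#  so each subset is a single pair only when `by` has exactly three levels)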
pairwise_smd <- function(data, variable, by, ...) {
  data <- 
    dplyr::select(data, all_of(c(variable, by))) %>%
    rlang::set_names(c("variable", "by")) %>%
    dplyr::filter(complete.cases(.)) %>%
    arrange(desc(.data$by))
  
  tibble(exclude = unique(data$by)) %>%
    mutate(
      include = map_chr(.data$exclude, ~unique(data$by) %>% setdiff(.x) %>% paste(collapse = " vs. ")),
      data_subset = 
        map(
          .data$exclude, 
          ~data %>%
            filter(!.data$by  %in% .x) %>%
            mutate(by = factor(.data$by))
        ),
      smd = map_dbl(.data$data_subset, ~smd::smd(.x$variable, .x$by)$estimate)
    ) %>%
    select(include, smd) %>%
    spread(include, smd)
}

tbl <-
  trial %>%
  select(age, grade, stage) %>%
  tbl_summary(
    by = grade,
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "no"
  ) %>%
  add_stat(fns = everything() ~ pairwise_smd)

Created on 2021-10-27 by the reprex package (v2.0.1)
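
To make explicit what each pairwise column holds, here is a minimal base-R sketch of the same idea without gtsummary or smd. The helper names `smd_two_groups()` and `pairwise_smd_base()` are hypothetical, the data are made up, and the denominator (square root of the average of the two group variances) is an assumed convention for a continuous variable; other definitions exist. Because it enumerates pairs with `combn()`, it also covers more than three groups.

```r
# SMD between two numeric samples.
# Assumption: denominator is sqrt of the mean of the two group variances.
smd_two_groups <- function(x1, x2) {
  (mean(x1) - mean(x2)) / sqrt((var(x1) + var(x2)) / 2)
}

# One SMD per unordered pair of levels in `g` (hypothetical helper).
pairwise_smd_base <- function(x, g) {
  keep <- !is.na(x) & !is.na(g)          # complete cases only
  x <- x[keep]
  g <- as.character(g[keep])
  pairs <- combn(sort(unique(g)), 2, simplify = FALSE)
  est <- vapply(
    pairs,
    function(p) smd_two_groups(x[g == p[1]], x[g == p[2]]),
    numeric(1)
  )
  names(est) <- vapply(pairs, paste, character(1), collapse = " vs. ")
  est
}

# Toy data, invented for illustration only
x <- c(1, 2, 3, 2, 3, 4, 5, 6, 7)
g <- rep(c("I", "II", "III"), each = 3)
pairwise_smd_base(x, g)
```

The result is a named numeric vector with one element per pair, which is the same shape of information that the `add_stat()` function above spreads into columns.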

Daniel D. Sjoberg
  • Hi Dan, this wonderful function doesn't work on tbl_svysummary. Today I needed this and I got this error: ```There was an error for variable 'x': Error in UseMethod("select"): no applicable method for 'select' applied to an object of class "c('survey.design2', 'survey.design')"``` – Ahmed Jul 12 '22 at 18:23
  • you need to update `pairwise_smd()` to handle your survey object. get that function working before you try to incorporate into gtsummary – Daniel D. Sjoberg Jul 12 '22 at 19:50
  • Thank you for your quick reply. I'm a little bit confused about how to do that. I tried to understand how "include()" works by looking at tbl_svysummary.R, but I'm not sure how to subset variables from a `survey` object correctly. I'd appreciate a helping hand! – Ahmed Jul 12 '22 at 22:17
  • I recommend you post another issue to get assistance – Daniel D. Sjoberg Jul 12 '22 at 23:41
  • ok, thank you https://stackoverflow.com/questions/72960442/how-to-write-a-function-for-pairwise-smd-calculation-between-3-groups-or-more-us – Ahmed Jul 13 '22 at 03:02
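
As a rough starting point for the survey case raised in these comments, the pairwise calculation can be written against plain vectors plus a weight vector, and tested before it goes anywhere near gtsummary. This is only a sketch under stated assumptions: `weighted_pairwise_smd()` and its helpers are hypothetical names, the weighted variance divides by the sum of the weights (one of several conventions), and nothing here accounts for the design beyond the weights.

```r
# Weighted mean and variance. Assumption: variance denominator is sum(w).
w_mean <- function(x, w) sum(w * x) / sum(w)
w_var  <- function(x, w) sum(w * (x - w_mean(x, w))^2) / sum(w)

# Weighted SMD for every unordered pair of levels in `g` (hypothetical helper).
weighted_pairwise_smd <- function(x, g, w) {
  keep <- !is.na(x) & !is.na(g) & !is.na(w)
  x <- x[keep]
  g <- as.character(g[keep])
  w <- w[keep]
  pairs <- combn(sort(unique(g)), 2, simplify = FALSE)
  est <- vapply(pairs, function(p) {
    a <- g == p[1]
    b <- g == p[2]
    (w_mean(x[a], w[a]) - w_mean(x[b], w[b])) /
      sqrt((w_var(x[a], w[a]) + w_var(x[b], w[b])) / 2)
  }, numeric(1))
  names(est) <- vapply(pairs, paste, character(1), collapse = " vs. ")
  est
}

# Toy check with equal weights (invented data)
xw <- c(1, 3, 2, 4, 5, 7)
gw <- rep(c("A", "B", "C"), each = 2)
weighted_pairwise_smd(xw, gw, w = rep(1, 6))
```

With a `survey` design object, the inputs could presumably be pulled from the design's `$variables` data frame and from `weights()` on the design; treat that wiring as an assumption to verify against the survey package documentation rather than as a tested recipe.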