Tidy data with frequency weights in rows

Question

I have collected non-tidy data from different studies. Assume each study reports the number of patients who received a given treatment, the treatment outcome (a percentage, stored as a proportion), and the number of recurrences of the treated disease.

library("tidyverse", "gtsummary")

data <- data.frame(
    study_id = c(1, 2, 3, 4, 5),
    no_patients = c(10, 15, 20, 23, 16),
    treatment_id = c("surgery", "radiotherapy", "surgery", "radiotherapy", "surgery"), 
    treatment_outcome = c(0.88, 0.50, 0.90, 0.23, 0.67),
    recurrence = c(0, 2, 4, 3, 6)
)

I want to report this data in a table and compare the different treatment methods:

data %>% 
    select(-study_id) %>%
    tbl_summary(by = treatment_id, 
    type = list(no_patients ~ "continuous", recurrence ~ "continuous", treatment_outcome ~ "continuous"),
    statistic = list(
      c("no_patients") ~ "{sum}"
    )) %>%
    add_p()


Because the studies report on different numbers of patients, they are not of equal importance: study 4, with 23 patients, should be weighted more heavily than the studies with fewer patients.

I could expand this table to a long format, with one row per patient:

data.long <- data[rep(row.names(data), data$no_patients), ]

Now the data has one row per patient, but each study's total number of recurrences is attributed to every individual patient in that study. I could divide the recurrence column by the number of patients, but my actual dataset is far more complicated and has many more variables.
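To make the weighting idea concrete without expanding rows, here is a minimal base-R sketch. It does not use gtsummary, and it assumes that a patient-weighted mean and a pooled recurrence rate are the summaries wanted; the names weighted_outcome and recurrence_rate are introduced only for this illustration.

```r
# Example data from the question
data <- data.frame(
  study_id = c(1, 2, 3, 4, 5),
  no_patients = c(10, 15, 20, 23, 16),
  treatment_id = c("surgery", "radiotherapy", "surgery", "radiotherapy", "surgery"),
  treatment_outcome = c(0.88, 0.50, 0.90, 0.23, 0.67),
  recurrence = c(0, 2, 4, 3, 6)
)

# Split the study-level rows by treatment
by_treatment <- split(data, data$treatment_id)

# Patient-weighted mean outcome: larger studies contribute more
weighted_outcome <- sapply(by_treatment, function(d) {
  weighted.mean(d$treatment_outcome, w = d$no_patients)
})

# Pooled recurrence rate: total recurrences divided by total patients
recurrence_rate <- sapply(by_treatment, function(d) {
  sum(d$recurrence) / sum(d$no_patients)
})

print(weighted_outcome)
print(recurrence_rate)
```

Because the weights are applied inside the summary function, the recurrence totals are never copied onto individual patient rows.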

My questions:

Is there an easier way to assign each row a weight based on no_patients, so that the weight is accounted for in the gtsummary table?
Has anyone used the new frequency_weights()? https://www.tidyverse.org/blog/2022/05/case-weights/

On a side note: `library("a", "b")` loads only the first package. See here how to load multiple packages in one go: https://stackoverflow.com/questions/8175912/load-multiple-packages-at-once – I_O, May 27 '23 at 15:52
I get the idea that this "tabulation" is based on longer data. If so, you should not collapse the table. If not, you will have a difficult time restoring the original correlations among variables that have been summed or averaged. – IRTFM, May 27 '23 at 16:51
@IRTFM: Unfortunately the data is not based on longer data. I guess the best I can do is use weighted outcomes/characteristics to try to find the correlations. Since they will regress to the mean, they will be less strong... – JLA, May 27 '23 at 17:49

I_O · Answer 1 · 2023-05-27T15:53:45.010


You could group_by treatment_id and use prop.table to obtain groupwise weights:

library(dplyr)

data |>
  group_by(treatment_id) |>
  mutate(
    # prop.table() gives each study's share of the patients within its treatment group
    treatment_outcome_weighted = treatment_outcome * prop.table(no_patients)
  )
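As a rough check of what the weighted column represents, the following base-R sketch (an equivalent written without dplyr, not the answer's own code) rebuilds the same groupwise weights with ave() and shows that summing the weighted column per treatment gives the patient-weighted mean. The names w and group_sum are assumptions made for this illustration.

```r
# Example data from the question
data <- data.frame(
  study_id = c(1, 2, 3, 4, 5),
  no_patients = c(10, 15, 20, 23, 16),
  treatment_id = c("surgery", "radiotherapy", "surgery", "radiotherapy", "surgery"),
  treatment_outcome = c(0.88, 0.50, 0.90, 0.23, 0.67),
  recurrence = c(0, 2, 4, 3, 6)
)

# Groupwise weights: each study's share of patients within its treatment
w <- ave(data$no_patients, data$treatment_id, FUN = prop.table)

# Same column as in the dplyr pipeline above
data$treatment_outcome_weighted <- data$treatment_outcome * w

# Summing the weighted column per group yields the weighted mean
group_sum <- tapply(data$treatment_outcome_weighted, data$treatment_id, sum)
print(group_sum)

# Cross-check against weighted.mean() for one group
surgery <- data[data$treatment_id == "surgery", ]
print(weighted.mean(surgery$treatment_outcome, w = surgery$no_patients))
```

The individual values of treatment_outcome_weighted are contributions to a group total, so they are meant to be summed per group rather than averaged.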

edited May 27 '23 at 15:53

answered May 27 '23 at 15:47

I_O


This gives me the weights as a column in the data. However, my problem is how to use those weights in a summary table. I can expand the data to one patient per row and convert the sums (like recurrence) to means: `data.long <- data[rep(row.names(data), data$no_patients), ] %>% mutate(recurrence_divided = recurrence/no_patients) %>% select(-study_id, -recurrence, -no_patients) %>% tbl_summary(by = treatment_id, type = list(recurrence_divided ~ "continuous", treatment_outcome ~ "continuous")) %>% add_p()` The p-values are incorrect, though. – JLA May 28 '23 at 05:37
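One likely reason the p-values look wrong is that the expanded data treats every replicated row as an independent observation. The base-R sketch below illustrates this with a Welch t-test, chosen here only for illustration (it is not necessarily the test add_p() applies); p_study and p_long are names introduced for this example.

```r
# Example data from the question
data <- data.frame(
  study_id = c(1, 2, 3, 4, 5),
  no_patients = c(10, 15, 20, 23, 16),
  treatment_id = c("surgery", "radiotherapy", "surgery", "radiotherapy", "surgery"),
  treatment_outcome = c(0.88, 0.50, 0.90, 0.23, 0.67),
  recurrence = c(0, 2, 4, 3, 6)
)

# Expand to one row per patient, as in the question
data.long <- data[rep(row.names(data), data$no_patients), ]

nrow(data)       # 5 study-level rows
nrow(data.long)  # 84 rows, one per patient

# The same test on both versions: the expanded data pretends there are
# 84 independent measurements, although only 5 studies were observed
p_study <- t.test(treatment_outcome ~ treatment_id, data = data)$p.value
p_long <- t.test(treatment_outcome ~ treatment_id, data = data.long)$p.value

print(c(study_level = p_study, expanded = p_long))
```

The expanded data yields a much smaller p-value purely because of the inflated sample size, which is why row replication is not a safe substitute for proper weights when testing.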
