I'm currently analyzing results from a study comparing two treatment arms. Each case has several variables in the data for the concept "complication": complication1, complication2, etc. A single case can have multiple complications. All of these complication variables have been converted to factors, and the factors have exactly the same levels.
For the results, I need a table that includes a chi-squared test comparing the number of patients with no complications between the treatment arms. The table should also include the total count of each specific complication for both treatment arms. As some patients have multiple complications, the total number of complications does not equal the number of patients.
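To make the first requirement concrete, here is a minimal sketch of the kind of test I mean, using base R only. The data frame `potilaat`, the helper column `jokin_kompl`, and the counts are made up for illustration, and I am assuming that code 0 means "no complication".

```r
# Toy data (hypothetical): one row per patient, 0 is assumed to mean "no complication"
potilaat <- data.frame(
  Arm = factor(rep(c(1, 2), each = 50)),
  Komplikaatio1 = c(rep(0L, 40), rep(3L, 10), rep(0L, 43), rep(5L, 7)),
  Komplikaatio2 = c(rep(0L, 46), rep(4L, 4), rep(0L, 48), rep(2L, 2))
)

# Pick up every complication column by name, however many there are
kompl_sarakkeet <- grep("^Komplikaatio", names(potilaat), value = TRUE)

# TRUE for patients with at least one non-zero complication code
potilaat$jokin_kompl <- rowSums(potilaat[kompl_sarakkeet] != 0) > 0

# 2 x 2 table: treatment arm by "any complication", then the chi-squared test
ristiin <- table(Arm = potilaat$Arm, Komplikaatio = potilaat$jokin_kompl)
chisq.test(ristiin)
```

Each patient is counted once here, regardless of how many complications they have, so the test compares patients and not complications.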
With the following simple code I get pretty close to what I want, and I could easily do the rest manually, but for the future's sake I would definitely like to have a chunk that does it for me.
First the original chunk.
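
    library(dplyr)       # %>%, group_by(), count(), mutate(), select()
    library(sjlabelled)  # get_labels() is assumed to come from sjlabelled

    koe <- sappiaineisto %>%
      dplyr::group_by(Arm, .drop = FALSE) %>%
      dplyr::count(Komplikaatio1, Komplikaatio2) %>%
      dplyr::mutate(Kompl_yht = Komplikaatio2 + n)
    # Restore the full set of levels (0 to 18), then attach the value labels
    koe$Komplikaatio1 <- factor(koe$Komplikaatio1, levels = 0:18)
    koe$Komplikaatio1 <- factor(
      koe$Komplikaatio1,
      labels = get_labels(sappiaineisto$Komplikaatio1, drop.unused = TRUE)
    )
    koe <- koe %>% dplyr::select(Arm, Komplikaatio1, Kompl_yht)
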
And a reproducible example.

    # A sample data frame with a roughly similar distribution
    library(dplyr)
    set.seed(14)
    testi <- data.frame(
      Arm = as.factor(c(rep(1, 50), rep(2, 50))),
      Komplikaatio1 = as.integer(c(rep(0, 40), rnorm(10, 6, 2), rep(0, 40), rnorm(10, 5, 2))),
      Komplikaatio2 = as.integer(c(rep(0, 44), rnorm(6, 6, 2), rep(0, 45), rnorm(5, 5, 2))))

    # A chunk to create the table
    koe <- testi %>%
      # First, cases are grouped by treatment arm
      dplyr::group_by(Arm, .drop = FALSE) %>%
      # count() creates a new column "n" with the number of cases for each
      # combination of "Komplikaatio1" and "Komplikaatio2" within an arm
      dplyr::count(Komplikaatio1, Komplikaatio2) %>%
      # "Komplikaatio2" is added to "n" to form the new column "Kompl_yht",
      # which is meant to hold the total count of cases at each level
      dplyr::mutate(Kompl_yht = Komplikaatio2 + n)

The result looks like this.
Arm | Komplikaatio1 | Kompl_yht |
---|---|---|
Treatment 1 | No complication | 41 |
Treatment 1 | Delirium | 1 |
Treatment 1 | Respiratory insufficiency | 1 |
Treatment 1 | Respiratory insufficiency | 4 |
Treatment 1 | Postoperative bleeding | 3 |
Treatment 1 | Removal of a corpus alienum | 1 |
Treatment 1 | Surgical site infection | 1 |
Treatment 1 | Insufficient clinical response | 1 |
Treatment 2 | No complication | 43 |
Treatment 2 | Respiratory insufficiency | 1 |
Treatment 2 | Acute cardiac insufficiency | 1 |
Treatment 2 | Insufficient clinical response | 5 |
The numbers are already correct, but I get two separate "Respiratory insufficiency" rows, because some cases have both "Komplikaatio1" and "Komplikaatio2" and some do not. Those would need to be combined into one number.
The intended result would have the labels of "Komplikaatio1" in column 1, and treatment arms 1 and 2 in columns 2 and 3, with the count from "Kompl_yht" as the value.
I'm sure there's a simple and neat solution for this, but I just can't figure it out. A general solution that does not depend on the number of "Komplikaatio" columns would be highly appreciated.
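One possible direction, as a rough sketch and not a verified solution: stack all the complication columns into a single long column, count per arm, and spread the arms back into columns. The toy data, the names `sarake`, `koodi` and `yhteenveto`, and the `Arm_` prefix are hypothetical. I am also assuming integer codes where 0 means "no complication"; with factor columns the filter would need to compare against the corresponding label instead.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): 0 is assumed to mean "no complication"
testi <- data.frame(
  Arm = factor(c(1, 1, 1, 2, 2, 2)),
  Komplikaatio1 = c(0L, 3L, 5L, 0L, 3L, 7L),
  Komplikaatio2 = c(0L, 5L, 0L, 0L, 0L, 3L)
)

yhteenveto <- testi %>%
  # Stack every Komplikaatio* column into one long column,
  # so the number of such columns does not matter
  pivot_longer(starts_with("Komplikaatio"),
               names_to = "sarake", values_to = "koodi") %>%
  # Keep only actual complications
  filter(koodi != 0) %>%
  # One row per arm and complication code, with the total count
  count(Arm, koodi, name = "Kompl_yht") %>%
  # One column per treatment arm; combinations that never occur become 0
  pivot_wider(names_from = Arm, values_from = Kompl_yht,
              names_prefix = "Arm_", values_fill = 0)

yhteenveto
```

Because every complication column ends up in the same `koodi` column before counting, a complication recorded in "Komplikaatio1" for one patient and in "Komplikaatio2" for another lands on the same row. Patients with no complications are excluded here and would have to be counted separately, one row per patient.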