Combining counts of several similar nominal variables and forming a tidy table

Question

I'm currently analyzing results from a study comparing two treatment arms. Each case has several variables in the data for the concept "complication": complication1, complication2, etc. A single case can have multiple complications. All of these complication variables are converted to factors, and the factors have exactly the same levels.

For the results, I need a table with a chi-squared test comparing the groups on cases with no complications. The table should then include the total count of each specific complication for both treatment arms. As some patients get multiple complications, the total number of complications does not equal the number of patients.

With the following simple code I get pretty close to where I want to be. I could easily do the rest manually, but for the future's sake I would definitely like a chunk that does it for me.

First the original chunk.

koe <- sappiaineisto %>% dplyr::group_by(Arm, .drop = FALSE) %>% 
  dplyr::count(Komplikaatio1, Komplikaatio2) %>%
  dplyr::mutate(Kompl_yht = (Komplikaatio2 + n))
koe$Komplikaatio1 <-
  factor(koe$Komplikaatio1, levels = (0:18))
koe$Komplikaatio1 <-
  factor(koe$Komplikaatio1, labels = get_labels(sappiaineisto$Komplikaatio1, drop.unused = TRUE))
koe <- koe %>% dplyr::select(Arm, Komplikaatio1, Kompl_yht)

And a reproducible example.

#A sample data frame with somewhat similar distribution
set.seed(14)
testi <- data.frame(
  Arm = as.factor(c(rep(1, 50), rep(2, 50))),
  Komplikaatio1 = as.integer(c(rep(0, 40), rnorm(10, 6, 2), rep(0, 40), rnorm(10, 5, 2))),
  Komplikaatio2 = as.integer(c(rep(0, 44), rnorm(6, 6, 2), rep(0, 45), rnorm(5, 5, 2))))

# A chunk to create the table
library(dplyr)  # provides the %>% pipe used below

koe <- testi %>% dplyr::group_by(Arm, .drop = FALSE) %>%
# First, cases are grouped by treatment arm
  dplyr::count(Komplikaatio1, Komplikaatio2) %>%

# count() adds a new column "n": the number of cases for each combination of
# "Komplikaatio1" and "Komplikaatio2" within an arm.
# "Komplikaatio1" provides the levels (rows) of the new table

  dplyr::mutate(Kompl_yht = (Komplikaatio2 + n))
# After the previous step "Komplikaatio2" holds information similar to "n", so a new column
# "Kompl_yht" is created. It is meant to hold the total count of cases at each particular
# level, i.e. the total number of separate complications

The result looks like this.

Arm	Komplikaatio1	Kompl_yht
Treatment 1	No complication	41
Treatment 1	Delirium	1
Treatment 1	Respiratory insufficiency	1
Treatment 1	Respiratory insufficiency	4
Treatment 1	Postoperative bleeding	3
Treatment 1	Removal of a corpus alienum	1
Treatment 1	Surgical site infection	1
Treatment 1	Insufficient clinical response	1
Treatment 2	No complication	43
Treatment 2	Respiratory insufficiency	1
Treatment 2	Acute cardiac insufficiency	1
Treatment 2	Insufficient clinical response	5

The numbers are already correct, but I get two separate "Respiratory insufficiency" rows, because some cases have both "Komplikaatio1" and "Komplikaatio2" and some do not. Those rows would need to be combined into one number.

The intended result would have the labels of "Komplikaatio1" in column 1 and treatment arms 1 and 2 in columns 2 and 3, with the count from "Kompl_yht" as the value.

I'm sure there's a simple and neat solution for this, but I just can't figure it out. A general solution that does not depend on the number of "Komplikaatio" columns would be highly appreciated.

Lauri, I think you need to abstract more what your problem really is. You talk about chi-square tests but they're not in your code, your code cannot be run, and even when using it to understand your question, it simply contains too much unexplained stuff (why `(Komplikaatio2 + n)`?). Have a look [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) at how to create a minimal reproducible example and you're more likely to get help. — Fons MA, Jan 27 '21 at 11:40
Thanks! Sorry about that. The chi-square test was not included because the table is not yet at the stage where that calculation would be made. I wouldn't worry too much about that. It is the easy part, I think. But the included chunk of code behaves exactly like my own chunk, so it should be reproducible now. — Lauri Pautola, Jan 27 '21 at 12:33

score 0 · Accepted Answer · answered Jan 27 '21 at 13:40

It could be that just an extra step is needed:

library(tidyverse)

tb %>% pivot_wider(names_from = Arm, values_from = Kompl_yht, values_fn = sum)

# A tibble: 8 x 3
  Komplikaatio1                  `Treatment 1` `Treatment 2`
  <chr>                                  <dbl>         <dbl>
1 No complication                           41            43
2 Delirium                                   1            NA
3 Respiratory insufficiency                  5             1
4 Postoperative bleeding                     3            NA
5 Removal of a corpus alienum                1            NA
6 Surgical site infection                    1            NA
7 Insufficient clinical response             1             5
8 Acute cardiac insufficiency               NA             1
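To make the step above easier to try out, here is a minimal sketch that types in `tb` by hand from the result table in the question (its printout is shown further down) and then applies the same `pivot_wider()` call. The `values_fill = 0` argument is an addition of mine, not part of the call above; it assumes that a missing arm/complication combination simply means zero cases, which turns the `NA` cells into `0`.

```r
library(dplyr)
library(tidyr)

# The question's intermediate table, typed in by hand
tb <- tribble(
  ~Arm,          ~Komplikaatio1,                   ~Kompl_yht,
  "Treatment 1", "No complication",                41,
  "Treatment 1", "Delirium",                        1,
  "Treatment 1", "Respiratory insufficiency",       1,
  "Treatment 1", "Respiratory insufficiency",       4,
  "Treatment 1", "Postoperative bleeding",          3,
  "Treatment 1", "Removal of a corpus alienum",     1,
  "Treatment 1", "Surgical site infection",         1,
  "Treatment 1", "Insufficient clinical response",  1,
  "Treatment 2", "No complication",                43,
  "Treatment 2", "Respiratory insufficiency",       1,
  "Treatment 2", "Acute cardiac insufficiency",     1,
  "Treatment 2", "Insufficient clinical response",  5
)

wide <- tb %>%
  pivot_wider(
    names_from  = Arm,        # one column per treatment arm
    values_from = Kompl_yht,
    values_fn   = sum,        # duplicate arm/complication rows are summed
    values_fill = 0           # assumption: an absent combination means zero cases
  )

wide
```

The key part is `values_fn = sum`: without it, the two "Respiratory insufficiency" rows for Treatment 1 would not be reduced to a single number, and `pivot_wider()` would return list-columns with a warning instead.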

tb is just the last table:

# A tibble: 12 x 3
   Arm         Komplikaatio1                  Kompl_yht
   <chr>       <chr>                              <dbl>
 1 Treatment 1 No complication                       41
 2 Treatment 1 Delirium                               1
 3 Treatment 1 Respiratory insufficiency              1
 4 Treatment 1 Respiratory insufficiency              4
 5 Treatment 1 Postoperative bleeding                 3
 6 Treatment 1 Removal of a corpus alienum            1
 7 Treatment 1 Surgical site infection                1
 8 Treatment 1 Insufficient clinical response         1
 9 Treatment 2 No complication                       43
10 Treatment 2 Respiratory insufficiency              1
11 Treatment 2 Acute cardiac insufficiency            1
12 Treatment 2 Insufficient clinical response         5
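For the more general case asked about in the question (any number of "Komplikaatio" columns), one possible approach is to reshape to long format first, so that every complication slot becomes its own row, and only then count. The sketch below uses the question's `testi` sample data and rests on a few assumptions of mine: code `0` means "no complication", the `id` column is a made-up patient identifier, and the names `kompl_long`, `kompl_wide` and `ei_kompl` are purely illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (0 is taken to mean "no complication")
set.seed(14)
testi <- data.frame(
  Arm = as.factor(c(rep(1, 50), rep(2, 50))),
  Komplikaatio1 = as.integer(c(rep(0, 40), rnorm(10, 6, 2), rep(0, 40), rnorm(10, 5, 2))),
  Komplikaatio2 = as.integer(c(rep(0, 44), rnorm(6, 6, 2), rep(0, 45), rnorm(5, 5, 2))))

# One row per patient and complication slot, however many slots there are
kompl_long <- testi %>%
  mutate(id = row_number()) %>%   # hypothetical patient id
  pivot_longer(starts_with("Komplikaatio"),
               names_to = "slot", values_to = "code")

# Total occurrences of each complication code, one column per arm
kompl_wide <- kompl_long %>%
  filter(code != 0) %>%
  count(Arm, code) %>%
  pivot_wider(names_from = Arm, values_from = n,
              values_fill = 0, names_prefix = "Arm_")

# Patients with and without any complication, per arm
ei_kompl <- kompl_long %>%
  group_by(Arm, id) %>%
  summarise(any_kompl = any(code != 0), .groups = "drop") %>%
  count(Arm, any_kompl)

kompl_wide
ei_kompl
```

Here `kompl_wide` counts complications, while `ei_kompl` counts patients, so the two totals are allowed to differ, as the question expects. With labelled factors instead of integer codes, the filter would compare against the "No complication" level rather than `0`.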

Yes! Many thanks, this solved it. I knew there was something simple, but I just couldn't come up with what to use. This is the first time I've encountered that functionality, but I will remember it. Thanks! — Lauri Pautola, Jan 27 '21 at 14:37
Glad to know it solved your problem. Please accept it by clicking the check mark beside the answer to toggle it from grey to green. Thanks. — nyk, Jan 27 '21 at 22:14
