How to divide values in another column across two different factors of a group?

Question

I'm working with {tidyverse} in R and I would like to do something that is somewhat complicated.

> col_vict %>% 
+     select(alcohol_involved, victim_degree_of_injury) %>%
+     mutate(alcohol_involved = as.factor(ifelse(is.na(alcohol_involved), "NO", "YES"))) %>%
+     table() %>% 
+     as.data.table() %>% 
+     group_by(victim_degree_of_injury)
# A tibble: 10 x 3
# Groups:   victim_degree_of_injury [5]
   alcohol_involved victim_degree_of_injury     N
   <chr>            <chr>                   <int>
 1 NO               complaint of pain       16516
 2 YES              complaint of pain        1331
 3 NO               killed                    168
 4 YES              killed                    122
 5 NO               no injury               22860
 6 YES              no injury                1905
 7 NO               other visible injury     4778
 8 YES              other visible injury     1102
 9 NO               severe injury             752
10 YES              severe injury             315

For each victim_degree_of_injury, I would like to compute the ratio of N where alcohol_involved == "YES" to N where alcohol_involved == "NO".

Here's the dput() of what I was working with:

structure(list(alcohol_involved = c("NO", "YES", "NO", "YES", 
"NO", "YES", "NO", "YES", "NO", "YES"), victim_degree_of_injury = c("complaint of pain", 
"complaint of pain", "killed", "killed", "no injury", "no injury", 
"other visible injury", "other visible injury", "severe injury", 
"severe injury"), N = c(16516L, 1331L, 168L, 122L, 22860L, 1905L, 
4778L, 1102L, 752L, 315L)), class = "data.frame", row.names = c(NA, 
-10L))

score 1 · Accepted Answer · answered Dec 11 '20 at 03:41


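library(dplyr)

# df is the data frame from the question's dput()
# Within each injury group, divide the YES count by the NO count
df %>% 
  group_by(victim_degree_of_injury) %>%
  summarize(ratio = N[alcohol_involved == "YES"] / N[alcohol_involved == "NO"])
# A tibble: 5 x 2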
#   victim_degree_of_injury  ratio
#   <chr>                    <dbl>
# 1 complaint of pain       0.0806
# 2 killed                  0.726 
# 3 no injury               0.0833
# 4 other visible injury    0.231 
# 5 severe injury           0.419
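
As an alternative sketch (an assumption on my part, not something the question requires), the same ratio can be computed by reshaping to one row per injury level with tidyr::pivot_wider(), which gives you the YES and NO counts side by side. The name `wide` is just illustrative, and `df` is rebuilt here from the question's dput() so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

# Data copied from the question's dput()
df <- data.frame(
  alcohol_involved = rep(c("NO", "YES"), times = 5),
  victim_degree_of_injury = rep(
    c("complaint of pain", "killed", "no injury",
      "other visible injury", "severe injury"),
    each = 2
  ),
  N = c(16516L, 1331L, 168L, 122L, 22860L, 1905L, 4778L, 1102L, 752L, 315L)
)

# One row per injury level, with the counts spread into NO and YES columns
wide <- df %>%
  pivot_wider(names_from = alcohol_involved, values_from = N) %>%
  mutate(ratio = YES / NO)

wide
```

If a level has no YES row or no NO row, pivot_wider() fills that cell with NA, so the ratio for that level comes out as NA.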

Gregor Thomas


Wow, that is almost exactly the same as the answer [here](https://stackoverflow.com/questions/37447977/how-to-divide-between-groups-of-rows-using-dplyr), but I guess I couldn't figure out how to adapt it to my particular circumstance. Thank you! – CelineDion Dec 11 '20 at 03:44

Good for you for doing research and connecting those dots! But yeah, it is exactly the same except for the names of the columns. – Gregor Thomas Dec 11 '20 at 03:47

score 1 · Answer 2 · answered Dec 11 '20 at 03:46

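In base R, if the structure is maintained such that every group always has exactly one YES row and one NO row, you could sort by alcohol_involved (so NO comes before YES within each group) and divide the second value by the first:

aggregate(N ~ victim_degree_of_injury, df[order(df$alcohol_involved), ], function(x) x[2] / x[1])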

  victim_degree_of_injury          N
1       complaint of pain 0.08058852
2                  killed 0.72619048
3               no injury 0.08333333
4    other visible injury 0.23064044
5           severe injury 0.41888298
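
If you cannot guarantee that every group has both rows, a minimal base R sketch that looks the counts up by label rather than by position might be safer. The helper name `ratio_by_label` is hypothetical, and `df` is rebuilt from the question's dput() so the snippet is self-contained:

```r
# Data copied from the question's dput()
df <- data.frame(
  alcohol_involved = rep(c("NO", "YES"), times = 5),
  victim_degree_of_injury = rep(
    c("complaint of pain", "killed", "no injury",
      "other visible injury", "severe injury"),
    each = 2
  ),
  N = c(16516L, 1331L, 168L, 122L, 22860L, 1905L, 4778L, 1102L, 752L, 315L)
)

# Hypothetical helper: YES / NO for one group, NA if either row is missing
# or duplicated
ratio_by_label <- function(d) {
  yes <- d$N[d$alcohol_involved == "YES"]
  no  <- d$N[d$alcohol_involved == "NO"]
  if (length(yes) == 1 && length(no) == 1) yes / no else NA_real_
}

# Split into one data frame per injury level and apply the helper to each
res <- sapply(split(df, df$victim_degree_of_injury), ratio_by_label)
res
```

This returns a named numeric vector, one element per injury level, and does not depend on the row order of df.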
