
I'm working with {tidyverse} in R and would like to compute a ratio between two rows within each group of a summary table.

> col_vict %>% 
+     select(alcohol_involved, victim_degree_of_injury) %>%
+     mutate(alcohol_involved = as.factor(ifelse(is.na(alcohol_involved), "NO", "YES"))) %>%
+     table() %>% 
+     as.data.table() %>% 
+     group_by(victim_degree_of_injury)
# A tibble: 10 x 3
# Groups:   victim_degree_of_injury [5]
   alcohol_involved victim_degree_of_injury     N
   <chr>            <chr>                   <int>
 1 NO               complaint of pain       16516
 2 YES              complaint of pain        1331
 3 NO               killed                    168
 4 YES              killed                    122
 5 NO               no injury               22860
 6 YES              no injury                1905
 7 NO               other visible injury     4778
 8 YES              other visible injury     1102
 9 NO               severe injury             752
10 YES              severe injury             315

For each victim_degree_of_injury, I would like to compute the ratio of N where alcohol_involved == "YES" to N where alcohol_involved == "NO".

Here's the dput() of what I was working with:

structure(list(alcohol_involved = c("NO", "YES", "NO", "YES", 
"NO", "YES", "NO", "YES", "NO", "YES"), victim_degree_of_injury = c("complaint of pain", 
"complaint of pain", "killed", "killed", "no injury", "no injury", 
"other visible injury", "other visible injury", "severe injury", 
"severe injury"), N = c(16516L, 1331L, 168L, 122L, 22860L, 1905L, 
4778L, 1102L, 752L, 315L)), class = "data.frame", row.names = c(NA, 
-10L))
CelineDion

2 Answers

library(dplyr)
df %>% 
  group_by(victim_degree_of_injury) %>%
  summarize(ratio = N[alcohol_involved == "YES"] / N[alcohol_involved == "NO"])
# # A tibble: 5 x 2
#   victim_degree_of_injury  ratio
#   <chr>                    <dbl>
# 1 complaint of pain       0.0806
# 2 killed                  0.726 
# 3 no injury               0.0833
# 4 other visible injury    0.231 
# 5 severe injury           0.419 
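
If some injury level might be missing its YES or NO row, a wide reshape is one possible alternative. This is only a sketch, not part of the approach above: it assumes {tidyr} is installed, rebuilds `df` from the `dput()` in the question, and the name `ratios` is made up for illustration. A missing combination should show up as `NA` in the `ratio` column instead of silently dropping or misaligning values.

```r
library(dplyr)
library(tidyr)

# Same data as the dput() in the question
df <- data.frame(
  alcohol_involved = rep(c("NO", "YES"), times = 5),
  victim_degree_of_injury = rep(
    c("complaint of pain", "killed", "no injury",
      "other visible injury", "severe injury"),
    each = 2
  ),
  N = c(16516L, 1331L, 168L, 122L, 22860L, 1905L, 4778L, 1102L, 752L, 315L)
)

# One row per injury level, with NO and YES as separate columns,
# then divide column by column
ratios <- df %>%
  pivot_wider(names_from = alcohol_involved, values_from = N) %>%
  mutate(ratio = YES / NO)

ratios
```

The wide table also keeps the raw NO and YES counts next to the ratio, which can be handy for checking the result.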
Gregor Thomas
    Wow, that is almost exactly the same as the answer [here](https://stackoverflow.com/questions/37447977/how-to-divide-between-groups-of-rows-using-dplyr), but I guess I couldn't figure out how to adapt it to my particular circumstance. Thank you! – CelineDion Dec 11 '20 at 03:44
    Good for you for doing research and connecting those dots! But yeah, it is exactly the same except for the names of the columns. – Gregor Thomas Dec 11 '20 at 03:47

In base R, if the structure is maintained such that every group always has exactly one YES row and one NO row, then you could do:

aggregate(N ~ victim_degree_of_injury, df[order(df$alcohol_involved), ], function(x) x[2] / x[1])

  victim_degree_of_injury          N
1       complaint of pain 0.08058852
2                  killed 0.72619048
3               no injury 0.08333333
4    other visible injury 0.23064044
5           severe injury 0.41888298
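
If you would rather not depend on row order, here is a rough base R sketch that indexes by the YES/NO labels instead of by position. It is an alternative to the call above, not a restatement of it: `df` is rebuilt from the `dput()` in the question, and `tab` and `ratio` are names chosen just for illustration. Note that `xtabs()` fills a missing combination with 0, so a group without a NO row would give `Inf` or `NaN` rather than an error.

```r
# Same data as the dput() in the question
df <- data.frame(
  alcohol_involved = rep(c("NO", "YES"), times = 5),
  victim_degree_of_injury = rep(
    c("complaint of pain", "killed", "no injury",
      "other visible injury", "severe injury"),
    each = 2
  ),
  N = c(16516L, 1331L, 168L, 122L, 22860L, 1905L, 4778L, 1102L, 752L, 315L)
)

# Cross-tabulate counts: rows are injury levels, columns are NO / YES
tab <- xtabs(N ~ victim_degree_of_injury + alcohol_involved, data = df)

# Select columns by name, so the ordering of rows in df does not matter
ratio <- tab[, "YES"] / tab[, "NO"]

ratio
```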
Onyambu