1

I have 2 datasets. One associated with Accidents and one with the vehicles involved in the same. Now 1 accident can have more than one vehicles involved. While merging these, I need a single row of accident -> vehicle and hence, I am trying to think of a strategy to do the same. For numerical variables in vehicles, I plan to take the average (like Age of driver, engine power, etc.), however, how exactly can I merge 3-4 rows of categorical variables into 1 row? Like let's say if there are two males and two females for sex of driver, which one to choose if I am looking at frequency-based?

Sunderam Dubey
  • 1
  • 11
  • 20
  • 40
addd
  • 77
  • 1
  • 6
  • 2
    Please see [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). This will help you get a good answer to your problem. You can provide your data via the output of `dput(df)` or to give a minimal subset use `dput(head(df))`. – AndrewGB Jun 10 '22 at 04:32

1 Answers1

1

Something like this?

library(tidyverse)

vehicles <- tribble(
  ~vehicle_id, ~owner, ~age, ~sex,
  1, "A", 18, "male",
  2, "B", 45, "male",
  3, "C", 38, "female"
)

accidents <- tribble(
  ~accident_id, ~vehicle_id,
  1, 1,
  1, 2,
  2, 1,
  3, 3
)

results <-
  accidents %>%
  left_join(vehicles) %>%
  group_by(accident_id) %>%
  summarise(
    vehicle_ids = list(vehicle_id),
    owners = list(owner),
    mean_age = mean(age),
    sexes = list(sex)
  ) %>%
  mutate(
    n_sexes = sexes %>% map(~ .x %>%
      table() %>%
      enframe() %>%
      mutate(value = value %>% as.numeric()))
  ) %>%
  unnest(n_sexes) %>%
  pivot_wider(values_fill = list(value = 0))
#> Joining, by = "vehicle_id"

results
#> # A tibble: 3 × 7
#>   accident_id vehicle_ids owners    mean_age sexes      male female
#>         <dbl> <list>      <list>       <dbl> <list>    <dbl>  <dbl>
#> 1           1 <dbl [2]>   <chr [2]>     31.5 <chr [2]>     2      0
#> 2           2 <dbl [1]>   <chr [1]>     18   <chr [1]>     1      0
#> 3           3 <dbl [1]>   <chr [1]>     38   <chr [1]>     0      1

results$owners[[1]]
#> [1] "A" "B"

Created on 2022-06-10 by the reprex package (v2.0.0)

danlooo
  • 10,067
  • 2
  • 8
  • 22