I have 2 datasets. One associated with Accidents and one with the vehicles involved in the same. Now 1 accident can have more than one vehicles involved. While merging these, I need a single row of accident -> vehicle and hence, I am trying to think of a strategy to do the same. For numerical variables in vehicles, I plan to take the average (like Age of driver, engine power, etc.), however, how exactly can I merge 3-4 rows of categorical variables into 1 row? Like let's say if there are two males and two females for sex of driver, which one to choose if I am looking at frequency-based?
Asked
Active
Viewed 86 times
1
-
2Please see [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). This will help you get a good answer to your problem. You can provide your data via the output of `dput(df)` or to give a minimal subset use `dput(head(df))`. – AndrewGB Jun 10 '22 at 04:32
1 Answers
1
Something like this?
library(tidyverse)
vehicles <- tribble(
~vehicle_id, ~owner, ~age, ~sex,
1, "A", 18, "male",
2, "B", 45, "male",
3, "C", 38, "female"
)
accidents <- tribble(
~accident_id, ~vehicle_id,
1, 1,
1, 2,
2, 1,
3, 3
)
results <-
accidents %>%
left_join(vehicles) %>%
group_by(accident_id) %>%
summarise(
vehicle_ids = list(vehicle_id),
owners = list(owner),
mean_age = mean(age),
sexes = list(sex)
) %>%
mutate(
n_sexes = sexes %>% map(~ .x %>%
table() %>%
enframe() %>%
mutate(value = value %>% as.numeric()))
) %>%
unnest(n_sexes) %>%
pivot_wider(values_fill = list(value = 0))
#> Joining, by = "vehicle_id"
results
#> # A tibble: 3 × 7
#> accident_id vehicle_ids owners mean_age sexes male female
#> <dbl> <list> <list> <dbl> <list> <dbl> <dbl>
#> 1 1 <dbl [2]> <chr [2]> 31.5 <chr [2]> 2 0
#> 2 2 <dbl [1]> <chr [1]> 18 <chr [1]> 1 0
#> 3 3 <dbl [1]> <chr [1]> 38 <chr [1]> 0 1
results$owners[[1]]
#> [1] "A" "B"
Created on 2022-06-10 by the reprex package (v2.0.0)

danlooo
- 10,067
- 2
- 8
- 22