I have a dataset with all natural disasters that occurred over a certain time period, and I would like to summarize them by year and state. While summarizing, I would like to create a variable (u_disastertype in the result dataset below) that lists only the unique types of natural disasters. For example, for Texas I would expect it to show just Hurricane, not Hurricane, Hurricane.
I am currently using dplyr::group_by() and dplyr::summarize() to summarize my data by year and state, and dplyr::mutate() with purrr::map_int() to create two new variables: the total number of natural disasters per year and state (n_disasters, using length()) and the number of unique disaster types (n_distinct, using n_distinct()).
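The per-group quantities described above can be illustrated in base R on a single group's vector before wiring them into summarize(). This is a minimal sketch; texas_types is a hypothetical name for the disaster types of the Texas 1998 group in the starting dataset.

```r
# Hypothetical vector: disaster types recorded for Texas in 1998
texas_types <- c("Hurricane", "Hurricane")

n_disasters      <- length(texas_types)                          # total events: 2
n_distinct_types <- length(unique(texas_types))                  # unique types: 1
u_disastertype   <- paste(unique(texas_types), collapse = ", ")  # "Hurricane"
all_types        <- paste(texas_types, collapse = ", ")          # "Hurricane, Hurricane"
```

The key difference is calling unique() before paste(..., collapse = ", "), which drops repeated types within the group.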
Starting dataset:
structure(list(year = c(1998, 1998, 1998, 1998, 1998), country = c("US",
"US", "US", "US", "US"), state = c("Texas", "Texas", "California",
"New York", "New York"), deaths = c(12, 5, 9, 10, 18), injured = c(3,
1, 3, 5, 9), disastertype = c("Hurricane", "Hurricane", "Wild fire",
"Flood", "Epidemic")), class = "data.frame", row.names = c(NA,
-5L))
Desired result dataset:
structure(list(year = c(1998, 1998, 1998), state = c("California",
"New York", "Texas"), u_disastertype = c("Wild fire", "Flood, Epidemic",
"Hurricane"), disastertype = c("Wild fire", "Flood, Epidemic",
"Hurricane, Hurricane"), deaths = c(9, 28, 17), injured = c(3,
14, 4), n_distinct = c(1L, 2L, 1L), n_disasters = c(1L, 2L, 2L
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L), groups = structure(list(year = 1998, .rows = structure(list(
1:3), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L), .drop = TRUE))
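One way to get this result is sketched below, assuming dplyr 1.0 or later (for the .groups argument and relocate()). Unlike the mutate() plus purrr::map_int() approach described above, it computes all counts directly inside summarise(); df is a hypothetical name for the starting dataset. Note that disastertype is overwritten last, so the earlier summaries still see the original per-row values.

```r
library(dplyr)

# Starting dataset (same values as the dput() output above)
df <- data.frame(
  year         = c(1998, 1998, 1998, 1998, 1998),
  country      = rep("US", 5),
  state        = c("Texas", "Texas", "California", "New York", "New York"),
  deaths       = c(12, 5, 9, 10, 18),
  injured      = c(3, 1, 3, 5, 9),
  disastertype = c("Hurricane", "Hurricane", "Wild fire", "Flood", "Epidemic")
)

result <- df %>%
  group_by(year, state) %>%
  summarise(
    # unique types, collapsed into one string per group
    u_disastertype = paste(unique(disastertype), collapse = ", "),
    n_distinct     = n_distinct(disastertype),  # number of unique types
    n_disasters    = n(),                       # total number of events
    # overwrite with all types (including repeats); must come after the lines above
    disastertype   = paste(disastertype, collapse = ", "),
    deaths         = sum(deaths),
    injured        = sum(injured),
    .groups        = "drop_last"                # stay grouped by year, as in the result above
  ) %>%
  relocate(n_distinct, n_disasters, .after = last_col())
```

Rows come out sorted by state (California, New York, Texas) because group_by() orders the groups.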
EDIT: Edited for clarification.