I have the following dataset:
my_data = structure(list(state = c("State A", "State A", "State A", "State A",
"State B", "State B", "State B", "State B", "State A", "State A",
"State A", "State A", "State B", "State B", "State B", "State B"
), city = c("city 1", "city 1", "city 2", "city 2", "city 3",
"city 3", "city 4", "city 4", "city 1", "city 1", "city 2", "city 2",
"city 3", "city 3", "city 4", "city 4"), vaccine = c("yes", "no",
"yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes", "no",
"yes", "no", "yes", "no"), counts = c(1221, 2233, 1344, 887,
9862, 2122, 8772, 2341, 1221, 2233, 1344, 887, 9862, 2122, 8772,
2341), year = c(2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022)), row.names = c(NA,
-16L), class = "data.frame")
My question: for each city in each year, I want to find the percentage of people who did and did not take a vaccine.
The final result would look something like this (I just made up some numbers):
    state   city vaccine Relative_Percentage year
1 State A city 1     yes                 0.6 2021
2 State A city 1      no                 0.4 2021
3 State A city 2     yes                 0.3 2021
4 State A city 2      no                 0.7 2021
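Here is a minimal dplyr sketch of that calculation. It assumes "percent" means each row's share of the total `counts` within its `year`/`state`/`city` group, so the `yes` and `no` shares of a city-year add up to 1. It differs from a row-counting approach in two ways. First, `vaccine` is not a grouping variable, because grouping by it would put every row in a group of its own. Second, the numbers come from the `counts` column rather than from `n()`. The data frame is rebuilt by hand with the same values as the `dput()` output above. The column name `Relative_Percentage` is taken from the expected result.

```r
library(dplyr)

# Same values as the dput() output in the question
my_data <- data.frame(
  state   = rep(rep(c("State A", "State B"), each = 4), times = 2),
  city    = rep(rep(c("city 1", "city 2", "city 3", "city 4"), each = 2), times = 2),
  vaccine = rep(c("yes", "no"), times = 8),
  counts  = rep(c(1221, 2233, 1344, 887, 9862, 2122, 8772, 2341), times = 2),
  year    = rep(c(2021, 2022), each = 8)
)

result <- my_data %>%
  # One group per city and year; vaccine is deliberately NOT a grouping
  # variable, so the "yes" and "no" rows share the same denominator
  group_by(year, state, city) %>%
  # Weight by the counts column instead of counting rows with n()
  mutate(Relative_Percentage = counts / sum(counts)) %>%
  ungroup() %>%
  # Column order of the expected output
  select(state, city, vaccine, Relative_Percentage, year)

print(result)
```

For State A, city 1 in 2021, this gives 1221 / (1221 + 2233) ≈ 0.354 for `yes` and ≈ 0.646 for `no`. Multiply by 100 to get a percentage rather than a proportion.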
Using this post as an example (Relative frequencies / proportions with dplyr), I tried the following code:
library(dplyr)
my_data %>%
  group_by(year, state, city, vaccine) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
But I don't think my code is correct: every percentage comes out as exactly 0.5:
`summarise()` has grouped output by 'year', 'state', 'city'. You can override using the `.groups` argument.
# A tibble: 16 x 6
# Groups:   year, state, city [8]
   year state   city   vaccine     n  freq
  <dbl> <chr>   <chr>  <chr>   <int> <dbl>
1  2021 State A city 1 no          1   0.5
2  2021 State A city 1 yes         1   0.5
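The 0.5 values follow from the data layout. Each `year`/`state`/`city`/`vaccine` combination appears on exactly one row of `my_data`, so `n()` is 1 for every group. `summarise()` then drops the last grouping variable, as its message says. That leaves `year`/`state`/`city` groups with two rows each (`yes` and `no`), so every value is 1 / (1 + 1) = 0.5, and the `counts` column is never used. The base R sketch below uses `ave()` to apply a function within groups. It compares this row-based share with a share weighted by `counts`. It illustrates the cause and is not the only possible fix.

```r
# Same values as the dput() output in the question
my_data <- data.frame(
  state   = rep(rep(c("State A", "State B"), each = 4), times = 2),
  city    = rep(rep(c("city 1", "city 2", "city 3", "city 4"), each = 2), times = 2),
  vaccine = rep(c("yes", "no"), times = 8),
  counts  = rep(c(1221, 2233, 1344, 887, 9862, 2122, 8772, 2341), times = 2),
  year    = rep(c(2021, 2022), each = 8)
)

# Divide each value by the total of its group
share <- function(v) v / sum(v)

# Mimics n(): every row is worth 1, so the two rows (yes/no) of each
# city-year each get 1 / 2
row_based <- ave(rep(1, nrow(my_data)),
                 my_data$year, my_data$state, my_data$city, FUN = share)

# Weighted by the counts column, which is what the question asks for
count_based <- ave(my_data$counts,
                   my_data$year, my_data$state, my_data$city, FUN = share)

print(data.frame(my_data[, c("year", "city", "vaccine")], row_based, count_based))
```

To keep the structure of the original pipeline, replace `summarise(n = n())` with `summarise(n = sum(counts))`. This should give the same weighted proportions, because the following `mutate()` still runs within the `year`/`state`/`city` groups that `summarise()` leaves behind.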
Can someone please show me how to fix this problem?
Thanks!