Say I have a large dataset on the populations of multiple preschools, and I want to calculate summary statistics such as the mean age within each school. The data frame is structured so that each school has a male and a female population for each age from 3 to 5. Here's an example data set:
library(dplyr)
school <- c("Alpha", "Alpha", "Alpha", "Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta", "Beta")
age <- c(3, 3, 4, 4, 5, 5, 3, 3, 4, 4, 5, 5)
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
test_df <- data.frame(School = school,
                      Age = age,
                      Gender = gender,
                      Population = as.integer(rnorm(n = 12, mean = 30, sd = 5)))
I've gotten as far as totaling the M and F populations for each age value with the group_by() and summarise() functions:
test_df2 <- test_df %>% group_by(School, Age) %>% summarise(Population = sum(Population))
Note: I get a warning message here:
`summarise()` ungrouping output (override with `.groups` argument)
but the resulting table is what I wanted, so not sure if this is important.
But then I can't seem to get from here to calculating the mean age for each school. I tried
test_df2 %>% group_by(School) %>% summarise(Mean_Age = (Age*Population/sum(Population)))
But the result isn't what I expected: it applies the calculation to each age-population row rather than to the entire school. I'm trying to make a table with one mean age for each school.
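To make the target concrete, here is a rough sketch of the number I'm after for a single school, worked out by hand in base R. The populations below are made-up illustrative values, not my real data, and I'm assuming that "mean age" here means the population-weighted mean:

```r
# Hypothetical totals per age for one school (illustrative values only)
ages <- c(3, 4, 5)
pops <- c(50, 60, 70)

# Population-weighted mean age: sum(age * pop) / sum(pop)
manual <- sum(ages * pops) / sum(pops)

# stats::weighted.mean() should give the same quantity
builtin <- weighted.mean(ages, w = pops)

print(manual)
print(builtin)
```

What I want is one value like this per school, in a single table.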
Sorry if I'm missing something really basic; I'm still new to R. Thanks for your help!