
Say I have a large dataset on the populations of multiple preschools, and I want to calculate some summary data on things like mean ages within each school. The data frame is structured such that each school has a male and female population for each age from 3-5. Here's an example data set:

library(dplyr)
school <- c("Alpha", "Alpha", "Alpha", "Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta", "Beta")
age <- c(3, 3, 4, 4, 5, 5, 3, 3, 4, 4, 5, 5)
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
test_df <- data.frame(School = school,
           Age = age,
           Gender = gender,
           Population = as.integer(rnorm(n = 12, mean = 30, sd = 5)))

I've gotten as far as totaling the M and F populations for each age value with the group_by() and summarise() functions:

test_df2 <- test_df %>% group_by(School, Age) %>% summarise(Population = sum(Population))

Note: I get a warning message here:

summarise() ungrouping output (override with .groups argument)

but the resulting table is what I wanted, so I'm not sure whether this is important.

But then I can't seem to get from here to calculating the mean age for each school. I tried:

test_df2 %>% group_by(School) %>% summarise(Mean_Age = (Age*Population/sum(Population)))

But the result isn't what I expected: it applies the calculation to each age-population row separately, not to the School as a whole. I'm trying to make a table with one mean age for each school.

Sorry if I'm missing something really basic; I'm still new to R. Thanks for your help!

lschoen

1 Answer


I think what you are looking for is a weighted mean. R has a built-in function for it, weighted.mean().

library(dplyr)
test_df2 %>% 
  group_by(School) %>% 
  summarise(Mean_Age = weighted.mean(Age, Population))

#  School Mean_Age
#  <chr>     <dbl>
#1 Alpha      3.97
#2 Beta       4.08
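The intermediate totals should not be strictly necessary: the M and F rows for a given age share the same Age value, so weighting each row of the original data by its own Population ought to give the same per-school mean. Here is a minimal sketch of that check. The toy_df name and its populations are made up for illustration (fixed numbers rather than the random draw in the question), so the results differ from the output above.

```r
library(dplyr)

# Made-up, fixed populations so the result is reproducible (not the asker's data)
toy_df <- data.frame(
  School = rep(c("Alpha", "Beta"), each = 6),
  Age = rep(c(3, 3, 4, 4, 5, 5), times = 2),
  Gender = rep(c("M", "F"), times = 6),
  Population = c(10, 10, 20, 20, 10, 10, 5, 5, 10, 10, 25, 25)
)

# Weighted mean taken straight from the per-gender rows
direct <- toy_df %>%
  group_by(School) %>%
  summarise(Mean_Age = weighted.mean(Age, Population))

# Same calculation after totalling M and F for each age first
via_totals <- toy_df %>%
  group_by(School, Age) %>%
  summarise(Population = sum(Population), .groups = "drop") %>%
  group_by(School) %>%
  summarise(Mean_Age = weighted.mean(Age, Population))

# Both routes should agree
all.equal(direct$Mean_Age, via_totals$Mean_Age)
```

With these toy numbers both routes give 4 for Alpha and 4.5 for Beta.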

Without the function, the formula is the sum of Age * Population divided by the sum of Population.

test_df2 %>% 
  group_by(School) %>% 
  summarise(Mean_Age = sum(Age * Population)/sum(Population))
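As for why the attempt in the question returned several rows per school: Age * Population / sum(Population) is a vector with one element per row, and summarise() only collapses a group when the expression reduces to a single value. The sketch below makes those per-row terms visible with mutate() and then sums them. The totals data frame and the Term column are hypothetical names with made-up numbers. The line printed by summarise() in the question is an informational message rather than an error, and setting .groups explicitly, as done here, should silence it.

```r
library(dplyr)

# Made-up totals per school and age (illustrative only)
totals <- data.frame(
  School = rep(c("Alpha", "Beta"), each = 3),
  Age = rep(3:5, times = 2),
  Population = c(20, 40, 20, 10, 20, 50)
)

# One term per row: each age's contribution to its school's mean age
terms <- totals %>%
  group_by(School) %>%
  mutate(Term = Age * Population / sum(Population)) %>%
  ungroup()

# Summing the per-row terms collapses each school to a single mean age;
# .groups = "drop" returns an ungrouped result and states the grouping explicitly
means <- terms %>%
  group_by(School) %>%
  summarise(Mean_Age = sum(Term), .groups = "drop")

means
```

The Term column holds 0.75, 2, and 1.25 for Alpha, which add up to a mean age of 4; the terms for Beta add up to 4.5.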
Ronak Shah