I have a complex problem and I will be grateful if someone can help me out. I have a dataframe made up of appended survey data for different countries in different years. In the said dataframe, I also have air quality measures for the neighbourhoods where respondents were selected. The air quality data is from 1998 to 2016.
My problem is I want to compute the row mean (or cumulative mean exposures) for each person base on the respondents' age and the air quality data years. My data frame looks like this
dat <- data.frame(ID=c(1:2000), dob = sample(1990:2020, size=2000, replace=TRUE),
survey_year=rep(c(1998, 2006, 2008, 2014, 2019), times=80, each=5),
CNT = rep(c('AO', 'GH', 'NG', 'SL', 'UG'), times=80, each=5),
Ozone_1998=runif(2000), Ozone_1999=runif(2000), Ozone_2000=runif(2000),
Ozone_2001=runif(2000), Ozone_2002=runif(2000), Ozone_2003=runif(2000),
Ozone_2004=runif(2000), Ozone_2005=runif(2000), Ozone_2006=runif(2000),
Ozone_2007=runif(2000), Ozone_2008=runif(2000), Ozone_2009=runif(2000),
Ozone_2010=runif(2000), Ozone_2011=runif(2000), Ozone_2012=runif(2000),
Ozone_2013=runif(2000), Ozone_2014=runif(2000), Ozone_2015=runif(2000),
Ozone_2016=runif(2000))
In the example data frame above, all respondents in country Ao will have their cumulative mean air quality exposures restricted to the Ozone_1998 while respondents in country SL will have their mean calculated based on Ozone_1998 to Ozone_2014.
The next thing is for a person in country SL aged 15 years I want to their cumulative exposure to be from Ozone_2000 to Ozone_2014 (the 15 year period of their life include their birth year). A person aged 16 will have their mean from Ozone_1999 to Ozone_2014 etc.
Is their a way to do this complex task in R?
NB: Although my question is similar to another I posted (see link below), this task is much complex. I tried adapting the solution for my previous question but my attempts did not work. For instance, I tried
dat$mean_exposure = dat %>% pivot_longer(starts_with("Ozone"), names_pattern = "(.*)_(.*)", names_to = c("type", "year")) %>%
mutate(year = as.integer(year)) %>% group_by(ID) %>%
summarize(mean_under5_ozone = mean(value[ between(year, survey_year,survey_year + 0) ]), .groups = "drop")
but got an error
*Error: Problem with `summarise()` input `mean_under5_ozone`.
x `left` must be length 1
i Input `mean_under5_ozone` is `mean(value[between(year, survey_year, survey_year + 0)])`.
i The error occurred in group 1: ID = 1.*
Link to the previous question
How to compute a custom mean for each row over multiple columns, based on a row-specific criterion?
Thank you