------ Short story--------
I would like to run svymean on all variables in the dataset (assuming they are all numeric). The example below is adapted from this guide: https://stylizeddata.com/how-to-use-survey-weights-in-r/
I know I can run svymean on all the variables by listing them out like this:
svymean(~age+gender, ageDesign, na.rm = TRUE)
However, my real dataset has 500 variables (all numeric), and I need a more efficient way to get all the means at once. I tried the following, but it does not work:
svymean(~., ageDesign, na.rm = TRUE)
Any ideas?
--------- Long explanation with real data-----
library(haven)
library(survey)
library(dplyr)
Import NHANES demographic data
nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))
Copy and rename variables so they are more intuitive. "fpl" is the ratio of family income to the federal poverty level. It ranges from 0 to 5.
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA
Since there are 47 variables, we will select only the variables we will use in this analysis.
nhanesAnalysis <- nhanesDemo %>%
  select(fpl,
         age,
         gender,
         persWeight,
         psu,
         strata)
Survey Weights
Here we use "svydesign" to declare the survey design (PSUs, strata, and weights). We will use the resulting design object "nhanesDesign" when running our analyses.
nhanesDesign <- svydesign(id = ~psu,
                          strata = ~strata,
                          weights = ~persWeight,
                          nest = TRUE,
                          data = nhanesAnalysis)
Here we use "subset" to restrict "nhanesDesign" to a specific subpopulation (i.e., those aged 18 to 79 years). This step is important: if you instead filter the data frame before calling "svydesign", the standard errors of your estimates may be incorrect, because the design information carried by the excluded rows is lost.
ageDesign <- subset(nhanesDesign, age > 17 &
                      age < 80)
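To illustrate the difference between subsetting the design and filtering the data first, here is a minimal sketch. It uses the "apistrat" example data that ships with the survey package rather than the NHANES data, and the object names ("dstrat", "est_subset", "est_filtered") are only illustrative.

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data above)
data(api)

# Stratified design on the bundled apistrat sample
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Subset the design object: the full design is kept for variance estimation
est_subset <- svymean(~api00, subset(dstrat, comp.imp == "Yes"))

# For comparison only: filter the data frame first, then declare the design
dfiltered <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                       fpc = ~fpc,
                       data = apistrat[apistrat$comp.imp == "Yes", ])
est_filtered <- svymean(~api00, dfiltered)

# The point estimates agree; the standard errors need not
print(est_subset)
print(est_filtered)
```

The two calls give the same weighted mean, but only the first treats the subpopulation as a domain of the full sample when computing the standard error.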
Statistics
We will use "svymean" to calculate the population mean for age. Setting na.rm = TRUE excludes missing values from the calculation. We see that the mean age is 45.648 and the standard error is 0.5131.
svymean(~age, ageDesign, na.rm = TRUE)
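For reference, the estimate and its standard error can be pulled out of the svymean result with coef() and SE(). The sketch below again uses the bundled "apistrat" data, not NHANES, so the numbers will not match the ones quoted above; "est" and "results" are illustrative names.

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data above)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

est <- svymean(~api00 + api99, dstrat, na.rm = TRUE)

# Collect means and standard errors into a plain data frame
results <- data.frame(variable = names(coef(est)),
                      mean     = as.numeric(coef(est)),
                      se       = as.numeric(SE(est)))
print(results)

# Confidence intervals (95% by default)
print(confint(est))
```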
I know I can run svymean on all the variables by listing them out like this:
svymean(~age+gender, ageDesign, na.rm = TRUE)
However, my real dataset is 500 variables long, and I need to get the means all at once more efficiently. I tried the following but it does not work.
svymean(~., ageDesign, na.rm = TRUE)
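One possible approach, sketched under assumptions: since the "." shorthand does not appear to be expanded here, the formula can be built from the column names with reformulate(). The sketch uses the bundled "apistrat" data and a hypothetical handful of columns. For the NHANES example, the design variables to exclude would presumably be "persWeight", "psu", and "strata".

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data above)
data(api)
dat <- apistrat[, c("api00", "api99", "meals", "ell", "enroll",
                    "stype", "pw", "fpc")]
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = dat)

# Keep numeric columns, dropping the design variables (weights, fpc)
is_num <- vapply(dat, is.numeric, logical(1))
vars <- setdiff(names(dat)[is_num], c("pw", "fpc"))

# reformulate() builds ~api00 + api99 + ... from the character vector
all_means <- svymean(reformulate(vars), dstrat, na.rm = TRUE)
print(all_means)

# Alternative: one call per variable, so missing values are dropped
# per variable rather than across all variables at once
per_var <- lapply(vars, function(v) svymean(reformulate(v), dstrat, na.rm = TRUE))
names(per_var) <- vars
```

One thing to watch with many variables: with na.rm = TRUE, a single svymean call on several variables drops any row that is missing on any of them, so the per-variable version may be the safer choice when missingness differs across columns.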