------ Short story--------

I would like to run svymean on all variables in the dataset (assuming they are all numeric). I've pulled this narrative from this guide over here: https://stylizeddata.com/how-to-use-survey-weights-in-r/

I know I can run svymean on all the variables by listing them out like this:

svymean(~age+gender, ageDesign, na.rm = TRUE)

However, my real dataset is 500 variables long (they are all numeric), and I need to get the means all at once more efficiently. I tried the following but it does not work.

svymean(~., ageDesign, na.rm = TRUE)

Any ideas?

--------- Long explanation with real data-----

library(haven)
library(survey)
library(dplyr)
 

Import NHANES demographic data

nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))

Copy and rename variables so they are more intuitive. "fpl" is family income as a ratio of the federal poverty level. It ranges from 0 to 5.

nhanesDemo$fpl        <- nhanesDemo$INDFMPIR
 
nhanesDemo$age        <- nhanesDemo$RIDAGEYR
 
nhanesDemo$gender     <- nhanesDemo$RIAGENDR
 
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
 
nhanesDemo$psu        <- nhanesDemo$SDMVPSU
 
nhanesDemo$strata     <- nhanesDemo$SDMVSTRA

Since there are 47 variables, we will select only the variables we will use in this analysis.

nhanesAnalysis <- nhanesDemo %>%
                    select(fpl,
                           age,
                           gender,
                           persWeight,
                           psu,
                           strata)
 

Survey Weights

Here we use "svydesign" to assign the weights. We will use this new design variable "nhanesDesign" when running our analyses.

nhanesDesign <- svydesign(id      = ~psu,
                          strata  = ~strata,
                          weights = ~persWeight,
                          nest    = TRUE,
                          data    = nhanesAnalysis)

Here we use "subset" to tell "nhanesDesign" that we only want to look at a specific subpopulation (i.e., those aged 18 to 79 years). This step is important: if you instead drop rows from the data frame before calling "svydesign", the standard errors of your estimates can be wrong, because the design information carried by the excluded rows is lost.

ageDesign <- subset(nhanesDesign, age > 17 &
                                  age < 80)
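
To make the point about subsetting concrete, here is a minimal sketch on made-up data (the data frame `toy`, its columns, and the object names are all hypothetical, not part of NHANES). It compares a domain defined with `subset()` on the design against a design built from pre-filtered rows. The point estimates agree, while the standard errors generally do not.

```r
library(survey)

# Synthetic data: 10 strata, 2 PSUs per stratum, 10 rows per PSU.
# Every name below is an illustrative assumption.
set.seed(4)
n <- 200
toy <- data.frame(
  y      = rnorm(n, mean = 50, sd = 10),
  age    = rep(seq(5, 85, by = 10), length.out = n),
  wt     = runif(n, 1, 5),
  psu    = rep(1:2, length.out = n),
  strata = rep(1:10, each = n / 10)
)

fullDesign <- svydesign(id = ~psu, strata = ~strata, weights = ~wt,
                        nest = TRUE, data = toy)

# Domain defined on the design object, as in the guide.
domainDesign <- subset(fullDesign, age > 17 & age < 80)

# For comparison only: filter the rows first, then declare the design.
adults <- toy[toy$age > 17 & toy$age < 80, ]
filteredDesign <- svydesign(id = ~psu, strata = ~strata, weights = ~wt,
                            nest = TRUE, data = adults)

domainEst   <- svymean(~y, domainDesign)
filteredEst <- svymean(~y, filteredDesign)

# Side-by-side comparison of the two approaches.
comparison <- rbind(
  domain   = c(mean = as.numeric(coef(domainEst)),   SE = as.numeric(SE(domainEst))),
  filtered = c(mean = as.numeric(coef(filteredEst)), SE = as.numeric(SE(filteredEst)))
)
comparison
```
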

Statistics

We will use "svymean" to calculate the population mean for age. The na.rm argument "TRUE" excludes missing values from the calculation. We see that the mean age is 45.648 and the standard error is 0.5131.

svymean(~age, ageDesign, na.rm = TRUE)

NewBee
  • just noting that `svymean( ~ var1 + var2 , design , na.rm = TRUE )` behaves like `svymean( ~ var1 + var2 , subset( design , !is.na( var1 ) & !is.na( var2 ) ) )` which might be different from `svymean( ~ var1 , design , na.rm = TRUE )` depending on the missing values in `var2` – Anthony Damico Oct 22 '20 at 16:10
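
The difference described in the comment above can be seen in a small sketch on made-up data (the data frame `toy` and all of its columns are hypothetical). It assumes you want each variable's mean computed from every row where that variable is observed, so it calls `svymean` once per variable rather than once for all of them.

```r
library(survey)

# Synthetic data; every name here is an illustrative assumption.
set.seed(2)
n <- 200
toy <- data.frame(
  q1     = rnorm(n),
  q2     = rnorm(n),
  wt     = runif(n, 1, 5),
  psu    = rep(1:2, length.out = n),
  strata = rep(1:10, each = n / 10)
)
toy$q2[seq(1, n, by = 4)] <- NA  # q2 is missing in a quarter of rows; q1 is complete

toyDesign <- svydesign(id = ~psu, strata = ~strata, weights = ~wt,
                       nest = TRUE, data = toy)

vars <- c("q1", "q2")

# Joint call: rows with a missing value in ANY listed variable are dropped,
# so the q1 mean here ignores the rows where q2 is missing.
joint <- svymean(reformulate(vars), toyDesign, na.rm = TRUE)

# One call per variable: each mean uses all rows where that variable is observed.
separate <- do.call(rbind, lapply(vars, function(v) {
  est <- svymean(reformulate(v), toyDesign, na.rm = TRUE)
  data.frame(variable = v,
             mean     = as.numeric(coef(est)),
             SE       = as.numeric(SE(est)),
             stringsAsFactors = FALSE)
}))

joint
separate
```

With 500 variables the per-variable loop makes 500 calls, which is slower than one joint call but avoids discarding rows because of missing values in unrelated columns.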

1 Answer

You can use reformulate to construct the formula dynamically.

library(survey)
svymean(reformulate(names(nhanesAnalysis)), ageDesign, na.rm = TRUE)

#                 mean        SE
#fpl            3.0134    0.1036
#age           45.4919    0.5273
#gender         1.5153    0.0065
#persWeight 80773.3847 5049.1504
#psu            1.5102    0.1330
#strata       126.1877    0.1506

This gives the same output as specifying each column individually in the function.

svymean(~age + fpl + gender + persWeight + psu + strata, ageDesign, na.rm = TRUE)
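
As a variation, you may not want means for the weight, PSU, and strata columns themselves. The sketch below uses made-up data (the data frame `toy`, its column names, and the `designVars` vector are assumptions for illustration) and drops the design columns with `setdiff` before building the formula.

```r
library(survey)

# Synthetic stand-in for the real data: three analysis variables plus
# design columns. All names here are hypothetical.
set.seed(1)
n <- 200
toy <- data.frame(
  q1     = rnorm(n),
  q2     = runif(n),
  q3     = rbinom(n, 1, 0.4),
  wt     = runif(n, 1, 5),
  psu    = rep(1:2, length.out = n),
  strata = rep(1:10, each = n / 10)
)

toyDesign <- svydesign(id = ~psu, strata = ~strata, weights = ~wt,
                       nest = TRUE, data = toy)

# Keep only the analysis variables; means of the design columns are not meaningful.
designVars   <- c("wt", "psu", "strata")
analysisVars <- setdiff(names(toy), designVars)

# reformulate() turns the character vector into the formula ~q1 + q2 + q3.
allMeans <- svymean(reformulate(analysisVars), toyDesign, na.rm = TRUE)

# Same estimate with the formula typed out, for comparison.
byHand <- svymean(~q1 + q2 + q3, toyDesign, na.rm = TRUE)

allMeans
```
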
Ronak Shah
  • Is it possible to display the variable labels at the same time? I have haven-labelled variables (the real variables are titled Q50, Q20, etc.), so I would like to avoid attaching the names after calling svymean if possible... – NewBee Oct 22 '20 at 03:44
  • Do you mean to change the rownames from the output of `svymean` ? You can change it using `rownames(result) <- labels` – Ronak Shah Oct 22 '20 at 03:48
  • Not really. What I mean is that each variable has a label attribute, such as `num [1:1199] NA NA 0 NA 1 NA 0 0 NA NA ... - attr(*, "label")= chr "Q10_1.Dance assessed and reported via report card"`, and I want this to show in the row name... – NewBee Oct 22 '20 at 04:41
  • How do you extract the label attribute from it? Can you provide an example? – Ronak Shah Oct 22 '20 at 04:44
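
One possible way to approach the label question in the comments above, sketched on made-up data: read each column's "label" attribute (which is where haven stores variable labels) and attach it to the `svymean` results in a data frame. The data frame `toy`, the helper `getLabel`, and the second label text are hypothetical, and the sketch assumes all analysis variables are numeric so that the names of the estimates match the column names.

```r
library(survey)

# Synthetic data; every name here is an illustrative assumption.
set.seed(3)
n <- 200
toy <- data.frame(
  Q10    = rbinom(n, 1, 0.3),
  Q20    = rnorm(n),
  wt     = runif(n, 1, 5),
  psu    = rep(1:2, length.out = n),
  strata = rep(1:10, each = n / 10)
)

# Set the "label" attribute by hand to mimic haven-labelled columns.
attr(toy$Q10, "label") <- "Q10_1.Dance assessed and reported via report card"
attr(toy$Q20, "label") <- "Q20. Hypothetical second item"

toyDesign <- svydesign(id = ~psu, strata = ~strata, weights = ~wt,
                       nest = TRUE, data = toy)

vars <- c("Q10", "Q20")
est  <- svymean(reformulate(vars), toyDesign, na.rm = TRUE)

# Look up a column's label, falling back to the column name when there is none.
getLabel <- function(v) {
  lab <- attr(toy[[v]], "label", exact = TRUE)
  if (is.null(lab)) v else lab
}

# Results table with one row per variable and its label alongside.
labelled <- data.frame(
  variable = names(coef(est)),
  label    = vapply(names(coef(est)), getLabel, character(1)),
  mean     = as.numeric(coef(est)),
  SE       = as.numeric(SE(est)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
labelled
```
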