1

I am roughly reproducing this code:

library(haven)
library(survey)
library(dplyr)

nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))

# Rename variables into something more readable
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA

# Select the necessary columns
nhanesAnalysis <- nhanesDemo %>%
  select(fpl, age, gender, persWeight, psu, strata)

# Set up the design
nhanesDesign <- svydesign(id      = ~psu,
                          strata  = ~strata,
                          weights = ~persWeight,
                          nest    = TRUE,
                          data    = nhanesAnalysis)

# Select those between the agest of 18 and 79
ageDesign <- subset(nhanesDesign, age > 17 & age < 80)

They calculate a simple arithmetic mean as follows:

# Arithmetic mean
svymean(~age, ageDesign, na.rm = TRUE)

I would like to calculate (1) the geometric mean using svymean or some related method (2) with a 95% confidence interval of the geometric mean without having to manually use the SE, if possible. I tried

svymean(~log(age), log(ageDesign), na.rm = TRUE)

but that throws an error. How do I do this?

Forklift17
  • 2,245
  • 3
  • 20
  • 32
  • 1
    this code is a bit dated and you already have a reply from Dr. Lumley - the authority on this sort of question - but readers might benefit from the detailed NHANES geometric mean example here: https://github.com/ajdamico/asdfree/blob/archive/National%20Health%20and%20Nutrition%20Examination%20Survey/replicate%20human%20exposure%20to%20chemicals%20report.R – Anthony Damico Feb 03 '23 at 07:03

1 Answers1

3

You want to take the logarithm of the variable, not of the whole survey design (which doesn't make any sense)

Here's how to compute the mean of log(age), get a confidence interval, and then exponentiate it to the geometric mean scale

> svymean(~log(age), ageDesign, na.rm = TRUE)
           mean     SE
log(age) 3.7489 0.0115
> meanlog<-svymean(~log(age), ageDesign, na.rm = TRUE)
> confint(meanlog)
            2.5 %   97.5 %
log(age) 3.726351 3.771372
> exp(confint(meanlog))
            2.5 %   97.5 %
log(age) 41.52729 43.43963

Or, you can exponentiate first and then construct the confidence interval:

> (geomean<-svycontrast(meanlog, quote(exp(`log(age)`))))
          nlcon     SE
contrast 42.473 0.4878
> confint(geomean)
            2.5 %   97.5 %
contrast 41.51661 43.42879

I would expect the first approach to give better confidence intervals in general, but it makes almost no difference here.

Thomas Lumley
  • 1,893
  • 5
  • 8