I am working with secondary data within the survey package in R. I have defined the weight, strata, and cluster using the svydesign function.

mydesign <- svydesign(id=~C17SCPSU, weights=~C1_7SC0,strata=~C17SCSTR, 
                      nest=TRUE, survey.lonely.psu = "adjust", 
                      na.rm=TRUE, data=ECLSK)

There is a very small amount of missing data, and I have recoded those values as NA. However, there is NO missing data in the weight, strata, or cluster variables.

ECLSK[ECLSK == "#NULL!"] <- NA 

When I compute means on variables that have no missing values, the estimates are produced without problems.

> svymean(~SEX_MALE, mydesign)
            mean     SE
SEX_MALE 0.51317 0.0196

However, when I compute means for variables with any missing values, I get the following (only a snippet is shown).

> svymean(~C1R4MSCL, mydesign)
              mean  SE
C1R4MSCL10.63   NA NaN
C1R4MSCL11.07   NA NaN
C1R4MSCL11.36   NA NaN
C1R4MSCL11.44   NA NaN
C1R4MSCL11.65   NA NaN
C1R4MSCL11.90   NA NaN
C1R4MSCL12.00   NA NaN
C1R4MSCL12.01   NA NaN
C1R4MSCL12.04   NA NaN
C1R4MSCL12.14   NA NaN
C1R4MSCL12.18   NA NaN
C1R4MSCL12.20   NA NaN

When I delete every row with missing values from the data frame itself and re-run, the estimates are computed fine. I have quite a few variables and want to generate estimates using complete-case analysis by variable (rather than creating a new data frame that drops all rows that have any missing value). Suggestions on how to deal with this are greatly appreciated.

Below is the dput() output for a small sample of the data frame.

structure(list(CHILDID = c("0015001C", "0015014C", "0015019C", "0015020C", "0015023C", "0015025C", "0015026C", "0021001C", "0021002C", "0022002C", "0022003C", "0022006C", "0022007C", "0022008C", "0022009C", "0022012C", "0022013C", "0022014C", "0022016C", "0022017C", "0022018C", "0022019C", "0022023C", "0022024C", "0023005C", "0023011C", "0023012C", "0023015C", "0023016C", "0023017C", "0023018C", "0023019C", "0023020C", "0023021C", "0023024C", "0025001C", "0025003C", "0025005C", "0025014C", "0025016C", "0025020C", "0025021C", "0025024C", "0028002C", "0028003C", "0028005C", "0028006C", "0028007C", "0028008C", "0028009C", "0028010C", "0028011C", "0028012C", "0028013C", "0028014C", "0037001C", "0037004C", "0037008C", "0037014C", "0037016C", "0037018C", "0037021C", "0040005C", "0040007C", "0040010C", "0040011C", "0040014C", "0040016C", "0040017C", "0040018C", "0040019C", "0040020C", "0040022C", "0040023C", "0044002C", "0044006C", "0044007C", "0044008C", "0044010C", "0044011C", "0044013C", "0044016C", "0044017C", "0044022C", "0045001C", "0045003C", "0045004C", "0045005C", "0045006C", "0045007C", "0045008C", "0045010C", "0045011C", "0045014C", "0045015C", "0045017C", "0045018C", "0045020C", "0045022C", "0049002C", "0049008C", "0049010C", "0049015C", "0049017C", "0049018C", "0049020C", "0052002C", "0052003C", "0052005C", "0052006C", "0052007C", "0052008C", "0052011C", "0052012C", "0052013C", "0052014C", "0052018C", "0052019C", "0053001C", "0053002C", "0053003C", "0053005C", "0053006C", "0053007C", "0053008C", "0053009C", "0053011C", "0053012C", "0053013C", "0053014C", "0053017C", "0053018C", "0053019C", "0053021C", "0053023C", "0053024C", "0055004C", "0055010C", "0055011C", "0055012C", "0055013C", "0055014C", "0055015C", "0055016C", "0055018C", "0055022C", "0055023C", "0055024C", "0056002C", "0056003C"), C1_7SC0 = c(3159.9, 522.86, 622.73, 622.73, 714, 714, 825.03, 645.48, 634.63, 827.54, 721.76, 679.68, 827.54, 721.76, 2527.03, 827.54, 721.76, 679.68, 721.76, 709.63, 
679.68, 709.63, 616.36, 679.68, 651.75, 651.75, 747.26, 747.26, 640.79, 640.79, 613.74, 640.79, 747.26, 640.79, 4613.25, 600.95, 598.77, 579.16, 609.01, 609.01, 609.01, 698.26, 598.77, 198.74, 231.77, 231.77, 1502.45, 231.77, 202.14, 202.14, 172.62, 231.77, 198.74, 198.74, 202.14, 176.04, 691.37, 592.86, 420.67, 484.91, 4611.6, 1537.83, 5579.29, 1693.28, 327.12, 5579.29, 454.63, 1357.73, 454.63, 5455.65, 454.63, 446.99, 521.26, 1357.73, 986.2, 986.2, 860.14, 860.14, 1401.07, 2318.41, 860.14, 860.14, 845.68, 845.68, 262.06, 388.14, 388.14, 445.02, 521.06, 2828.12, 445.02, 388.14, 388.14, 388.14, 445.02, 445.02, 388.14, 445.02, 365.51, 802.84, 917.88, 732.35, 12590.7, 917.88, 9657.9, 961.47, 5411.24, 205.27, 235.35, 235.35, 205.27, 235.35, 205.27, 235.35, 6588.46, 235.35, 1749.27, 205.27, 702.04, 836.42, 1018.39, 1018.39, 6441.84, 836.42, 888.21, 873.28, 888.21, 1018.39, 869.67, 1018.39, 888.21, 888.21, 1018.39, 873.28, 873.28, 873.28, 1131.98, 987.29, 987.29, 3456.7, 987.29, 929.72, 1131.98, 987.29, 1131.98, 1131.98, 6111.32, 1131.98, 436.98, 2948.91), C17SCSTR = c(26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L), C17SCPSU = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), C1R4MSCL = c("18.21", "31.20", "28.63", "37.81", "13.29", "28.78", "36.24", "24.19", "25.04", "24.67", "19.36", "25.84", "22.22", "26.56", "17.51", "16.13", "15.77", "20.98", "21.98", "15.40", "29.02", "20.65", "26.36", "28.00", "27.99", "28.61", "33.02", "31.74", "28.73", "26.32", "31.50", "30.39", "22.81", "22.07", NA, "34.27", "31.70", "25.64", "27.47", "35.99", "22.84", "21.26", "13.59", "41.16", "24.84", "52.82", "30.27", "33.97", "19.80", "28.08", "32.18", "25.98", "42.62", "29.43", "31.02", "29.53", "26.52", "18.42", "18.27", "12.57", "26.74", "32.63", "35.42", "34.76", NA, "27.98", "30.21", "20.35", "20.52", "27.34", "29.86", "26.75", "18.64", "25.80", "34.74", "93.23", "22.43", "35.76", "28.51", "21.79", "32.10", "47.15", "27.68", "35.73", "32.84", "40.46", "29.92", "32.36", "30.08", "37.57", "31.81", "35.81", "24.62", "26.17", "54.37", "52.18", "30.58", "44.87", "23.13", "16.42", "69.82", NA, "15.87", "32.53", "19.69", "14.63", "20.28", "38.89", "30.28", "38.08", "28.89", "26.27", "24.78", "27.95", "33.45", "20.43", "25.59", "24.11", "27.50", "31.27", "68.49", "39.22", "19.24", "48.78", "42.34", "49.87", "28.21", "31.25", "43.68", "19.19", "26.96", "38.70", "24.19", "30.78", "26.66", "30.28", "18.24", "32.13", "22.93", "31.89", "17.57", "28.53", "23.48", "20.57", "26.60", "68.44", "19.62", "41.77", "24.73", "29.18"), SEX_MALE = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0), C3BMI = c("16.35", "19.06", "15.28", "11.96", "17.09", "15.12", "15.23", "22.76", "16.65", "16.26", "15.10", "22.64", NA, "14.71", "15.12", "18.20", "15.60", "17.82", "17.52", "16.65", "16.45", "15.12", "21.83", "15.99", "18.58", "15.97", "19.07", "16.93", "15.12", "18.87", "21.81", "15.09", "24.40", "16.16", "15.91", "16.74", "20.35", "15.73", "15.75", "17.56", "21.50", "15.33", "19.83", "16.83", "15.62", "19.43", "15.45", "15.89", "16.97", "14.47", "14.96", "18.38", "15.87", "17.95", "14.93", "15.99", "16.34", "15.28", "21.78", "14.73", "13.87", "26.63", NA, "15.79", "15.20", NA, "15.43", "18.12", "15.64", "16.21", "13.76", "16.92", "16.25", "14.95", "17.42", "15.69", "19.37", "14.16", "15.28", "18.50", "16.46", "18.15", "16.02", "18.62", "15.94", "15.03", "17.97", "18.92", "15.94", "17.98", "15.12", "14.93", "15.47", NA, "17.86", "14.94", "16.85", "15.79", NA, "16.81", "18.23", "16.67", "23.55", "19.05", "14.60", "15.20", "16.20", "13.82", "15.92", "16.06", "16.61", "18.37", "15.69", "15.08", "16.41", "14.23", "17.72", "20.54", "19.83", "16.71", "16.58", "16.64", "14.28", "17.84", "11.36", "14.79", "15.67", "16.34", "19.43", "19.88", "18.03", "15.73", "15.48", "14.08", "15.10", "16.63", "15.77", "14.27", "15.35", "17.72", "13.79", "15.03", "15.15", "14.48", "17.23", "15.11", "16.65", "14.33", "16.48", "18.27")), row.names = c(NA, 150L), class = "data.frame")

  • [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. Is the issue just that functions based on `mean`, `sum`, etc are using `na.rm = FALSE` by default? – camille Sep 02 '21 at 03:19
  • Do you realize that your C1R4MSCL and C3BMI columns are character mode rather than numeric? – IRTFM Sep 03 '21 at 23:40

1 Answer

Try the following. It worked for me, although I get different results from yours for the first example. The "ultimate cluster" setting is the default, so that cannot explain the difference between our results.

Basically, I set the lonely-PSU adjustment through options() instead of inside svydesign(), and I pass na.rm = TRUE as an argument to svymean().

library(survey)

mydesign <- svydesign(id=~C17SCPSU,strata=~C17SCSTR,weights=~C1_7SC0,nest=TRUE, data=ECLSK)

options(survey.lonely.psu="adjust", survey.ultimate.cluster = TRUE)

svymean(~C1R4MSCL, mydesign, na.rm = TRUE)

svymean(~SEX_MALE, mydesign, na.rm = TRUE)
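As a comment on the question points out, C1R4MSCL and C3BMI are stored as character, which is probably why svymean() prints one row per distinct value rather than a single mean. Below is a minimal sketch of per-variable complete-case estimation, assuming the scores should be numeric. The data frame `toy` and its values are hypothetical stand-ins for ECLSK; only the column names come from the question.

```r
library(survey)

# Hypothetical toy data reusing the question's column names (values are made up)
set.seed(1)
toy <- data.frame(
  C17SCSTR = rep(c(2L, 26L), each = 20),            # 2 strata
  C17SCPSU = rep(rep(1:2, each = 10), times = 2),   # 2 PSUs nested in each stratum
  C1_7SC0  = runif(40, 200, 1000),                  # sampling weights
  C1R4MSCL = sprintf("%.2f", rnorm(40, 28, 8)),     # scores stored as character
  C3BMI    = sprintf("%.2f", rnorm(40, 17, 2)),
  SEX_MALE = rbinom(40, 1, 0.5),
  stringsAsFactors = FALSE
)
toy$C1R4MSCL[c(3, 25)]  <- "#NULL!"
toy$C3BMI[c(7, 30, 31)] <- "#NULL!"

# Recode the missing-value marker first, then convert character columns to numeric
toy[toy == "#NULL!"] <- NA
num_vars <- c("C1R4MSCL", "C3BMI")
toy[num_vars] <- lapply(toy[num_vars], as.numeric)

# The lonely-PSU handling is a global option, not an argument of svydesign()
options(survey.lonely.psu = "adjust")
des <- svydesign(id = ~C17SCPSU, strata = ~C17SCSTR, weights = ~C1_7SC0,
                 nest = TRUE, data = toy)

# One svymean() call per variable, so each estimate drops only that variable's NAs
vars <- c(num_vars, "SEX_MALE")
est <- sapply(vars, function(v) svymean(reformulate(v), des, na.rm = TRUE),
              simplify = FALSE)
est$C1R4MSCL
```

Calling svymean() once per variable keeps the full design object and removes only the rows that are missing for that variable, so nothing is deleted from the data frame itself.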
Vinícius Félix
malvarado