
I am running multiple imputation (MI) on a categorical variable with MICE to obtain descriptive statistics (the count and proportion in each level).

How can I get the pooled standard error for the proportion in each level? Could this be done with pool.scalar?

What I have done:

library(mice)

data1 <- nhanes2

## MI with mice
imp.data <- mice(data = data1, m = 5, maxit = 10, seed = 12345, method = "cart")

## stack all the imputed data sets into one long data frame
data2 <- complete(imp.data, "long")

## get the counts for each level, summed over all imputed data sets
counts <- count(data2$hyp)

## average count per level across the m = 5 imputed data sets
counts$n <- counts$freq / 5
frikki94
  • Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? Can you explain what you mean by standard error for the counts? – jrcalabrese Feb 10 '23 at 19:50
  • Also for your reference, [Heymans and Eekhout, 2019](https://bookdown.org/mwheymans/bookmi/data-analysis-after-multiple-imputation.html#:~:text=5.2.2%20Pooling%20Means%20and%20Standard%20Deviations%20in%20R) have provided R code on how to obtain descriptive statistics for continuous data from a multiply-imputed object. The same structure of this code could likely be applied to categorical data. – jrcalabrese Feb 11 '23 at 15:44
  • @jrcalabrese I mean the pooled standard error for the proportions, not the counts – frikki94 Feb 13 '23 at 17:41
  • Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? We can't access your object called `imp`. You can provide the data used to create `imp` using `dput()` or you can use another dataset like `nhanes` from the `mice` package. – jrcalabrese Feb 13 '23 at 18:22
  • @jrcalabrese it's reproducible now, sorry about that – frikki94 Feb 15 '23 at 11:09
  • What package is the function `count()` from? – jrcalabrese Feb 15 '23 at 14:26

1 Answer


First, I converted `hyp` to 0s and 1s instead of "yes" and "no". Then I calculated the proportion per group using `prop.table` and `prop.test` from this other SO answer, and then I used this RStudio thread to calculate the standard error. Finally, I followed the pooling rules from Heymans and Eekhout (2019).

library(mice)
library(dplyr)
set.seed(12345)

data1 <- nhanes2 %>% mutate(hyp = ifelse(hyp == "no", 0, 1))
imp.data <- mice (data = data1, m = 5, maxit = 10, seed = 12345, method = "cart", printFlag = FALSE)
data2 <- complete(imp.data, "long")

pooled_vals <- by(data2, data2$.imp, function(x) {
  n <- nrow(x)                            # completed data sets have no missing hyp
  props <- prop.table(table(x$hyp == 1))  # proportions: FALSE (hyp = 0), TRUE (hyp = 1)
  p_yes <- unname(prop.test(sum(x$hyp == 1), n)$estimate)  # proportion with hyp = 1
  se <- sqrt(p_yes * (1 - p_yes) / n)     # SE of a proportion; the same for both levels
  c(props, se = se)
})

# Average each quantity over the m imputed data sets
Reduce("+", pooled_vals) / length(pooled_vals)
# Pooled proportions reported for this seed: FALSE 0.784, TRUE 0.216
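
Averaging the per-imputation standard errors does not account for how much the estimate varies between imputed data sets. Rubin's rules combine the within-imputation variance with the between-imputation variance instead. The following is a minimal sketch of that calculation in base R, not a definitive implementation: `pool_proportion` is a hypothetical helper (not a mice function), and the five proportions are made-up illustrative values, not results from `nhanes2`.

```r
# Hypothetical helper (not part of mice): pool one proportion with Rubin's rules.
# q: per-imputation proportions; u: per-imputation squared standard errors.
pool_proportion <- function(q, u) {
  m     <- length(q)
  qbar  <- mean(q)                    # pooled estimate
  ubar  <- mean(u)                    # within-imputation variance
  b     <- var(q)                     # between-imputation variance
  total <- ubar + (1 + 1 / m) * b     # total variance
  c(estimate = qbar, se = sqrt(total))
}

# Illustrative values only (assumed, not taken from the nhanes2 run): m = 5, n = 25
n <- 25
q <- c(0.20, 0.24, 0.20, 0.16, 0.20)
u <- q * (1 - q) / n                  # squared SE of each proportion
res <- pool_proportion(q, u)
print(res)
```

With mice loaded, `pool.scalar(Q = q, U = u, n = n)` applies the same rules; its `qbar` component is the pooled estimate and the square root of its `t` component is the pooled standard error.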
jrcalabrese