How to get type 3 F values and marginal effects using a multiple imputation data set (generated from MICE)

Question

I have created an MI data set using the MICE package with 7 imputed data sets

imputeddata <- mice(distress_tibmi, m=7)

the structure of my data is now:

  ..$ id            : num [1:342] 4 8 10 11 23 32 40 47 48 56 ...
  ..$ diagnosis     : Factor w/ 2 levels "psychosis","bpd": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ gender        : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 1 1 1 ...
  ..$ distress.time : Factor w/ 2 levels "baseline","post": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ distress.score: num [1:342] -2.436 -1.242 0.251 -1.54 0.549 ...
  ..$ depression    : num [1:342] 0.332 0.542 1.172 -0.298 1.172 ...
  ..$ anxiety       : num [1:342] -1.898 -0.687 0.87 -0.687 1.043 ...
  ..$ choice        : num [1:342] 6.73 2.18 2 6.45 3.55 ...
 $ imp            :List of 8
  ..$ id            :'data.frame':  0 obs. of  7 variables:
  .. ..$ 1: logi(0) 
  .. ..$ 2: logi(0) 
  .. ..$ 3: logi(0) 
  .. ..$ 4: logi(0) 
  .. ..$ 5: logi(0) 
  .. ..$ 6: logi(0) 
  .. ..$ 7: logi(0) 
  ..$ diagnosis     :'data.frame':  0 obs. of  7 variables:
  .. ..$ 1: logi(0) 
  .. ..$ 2: logi(0) 
  .. ..$ 3: logi(0) 
  .. ..$ 4: logi(0) 
  .. ..$ 5: logi(0) 
  .. ..$ 6: logi(0) 
  .. ..$ 7: logi(0) 
  ..$ gender        :'data.frame':  0 obs. of  7 variables:
  .. ..$ 1: logi(0) 
  .. ..$ 2: logi(0) 
  .. ..$ 3: logi(0) 
  .. ..$ 4: logi(0) 
  .. ..$ 5: logi(0) 
  .. ..$ 6: logi(0) 
  .. ..$ 7: logi(0) 
  ..$ distress.time :'data.frame':  0 obs. of  7 variables:
  .. ..$ 1: logi(0) 
  .. ..$ 2: logi(0) 
  .. ..$ 3: logi(0) 
  .. ..$ 4: logi(0) 
  .. ..$ 5: logi(0) 
  .. ..$ 6: logi(0) 
  .. ..$ 7: logi(0) 
  ..$ distress.score:'data.frame':  59 obs. of  7 variables:
  .. ..$ 1: num [1:59] -0.6808 -0.6448 -1.658 -0.0293 -0.3463 ...
  .. ..$ 2: num [1:59] 1.2736 0.2507 -0.0478 -0.6448 1.2736 ...
  .. ..$ 3: num [1:59] -0.681 0.848 -1.658 1.274 0.251 ...
  .. ..$ 4: num [1:59] -1.3322 -0.0478 -0.6808 -0.355 -2.4358 ...
  .. ..$ 5: num [1:59] -1.3322 -0.355 -4.8239 -0.6448 -0.0293 ...
  .. ..$ 6: num [1:59] -1.3322 0.5493 -0.0293 -2.6352 0.8478 ...
  .. ..$ 7: num [1:59] 0.5493 0.2507 1.1463 -0.0478 1.2736 ...
  ..$ depression    :'data.frame':  24 obs. of  7 variables:
  .. ..$ 1: num [1:24] -0.0882 -0.5084 -1.2966 0.542 -2.1891 ...
  .. ..$ 2: num [1:24] 0.332 0.255 1.592 0.752 0.945 ...
  .. ..$ 3: num [1:24] -2.159 0.332 -0.262 0.962 1.382 ...
  .. ..$ 4: num [1:24] -0.2621 -0.0897 -1.7689 1.1172 0.7724 ...
  .. ..$ 5: num [1:24] 0.122 -2.159 -2.399 1.462 -2.189 ...
  .. ..$ 6: num [1:24] -0.298 -0.434 -0.607 1.172 0.962 ...
  .. ..$ 7: num [1:24] 0.6 1.29 1.635 0.542 0.428 ...
  ..$ anxiety       :'data.frame':  10 obs. of  7 variables:
  .. ..$ 1: num [1:10] 0.909 -1.379 1.389 -1.268 -0.598 ...
  .. ..$ 2: num [1:10] 1.0433 -1.3789 -0.0955 -0.7655 -0.598 ...
  .. ..$ 3: num [1:10] 1.0771 -1.8979 -0.0955 -0.5138 0.0052 ...
  .. ..$ 4: num [1:10] -0.598 -1.603 0.9095 -2.608 -0.0955 ...
  .. ..$ 5: num [1:10] 0.742 0.2395 -1.7249 -2.1055 -0.0955 ...
  .. ..$ 6: num [1:10] 1.412 -0.86 1.389 -2.608 0.575 ...
  .. ..$ 7: num [1:10] 1.245 -1.033 0.909 0.909 -1.033 ...
  ..$ choice        :'data.frame':  22 obs. of  7 variables:
  .. ..$ 1: num [1:22] 4.55 3.91 7.09 4.27 3.55 ...
  .. ..$ 2: num [1:22] 8.09 5.09 5.36 4.91 4.45 ...
  .. ..$ 3: num [1:22] 4.27 7.09 3.91 3.91 7.09 ...
  .. ..$ 4: num [1:22] 5.82 6.27 7 6.82 4.73 ...
  .. ..$ 5: num [1:22] 6.18 5.36 5.36 3.18 3.18 ...
  .. ..$ 6: num [1:22] 6.18 6.73 4.73 4.73 5 ...
  .. ..$ 7: num [1:22] 5.45 7.09 7.45 3.18 4.91 ...
 $ m              : num 7
 $ where          : logi [1:342, 1:8] FALSE FALSE FALSE FALSE FALSE FALSE ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:342] "1" "2" "3" "4" ...
  .. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
 $ blocks         :List of 8
  ..$ id            : chr "id"
  ..$ diagnosis     : chr "diagnosis"
  ..$ gender        : chr "gender"
  ..$ distress.time : chr "distress.time"
  ..$ distress.score: chr "distress.score"
  ..$ depression    : chr "depression"
  ..$ anxiety       : chr "anxiety"
  ..$ choice        : chr "choice"
  ..- attr(*, "calltype")= Named chr [1:8] "type" "type" "type" "type" ...
  .. ..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
 $ call           : language mice(data = distress_tibmi, m = 7)
 $ nmis           : Named int [1:8] 0 0 0 0 59 24 10 22
  ..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
 $ method         : Named chr [1:8] "" "" "" "" ...
  ..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
 $ predictorMatrix: num [1:8, 1:8] 0 1 1 1 1 1 1 1 1 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
  .. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
 $ visitSequence  : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
 $ formulas       :List of 8
  ..$ id            :Class 'formula'  language id ~ 0 + diagnosis + gender + distress.time +      distress.score + depression +  ...
  .. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00> 
  ..$ diagnosis     :Class 'formula'  language diagnosis ~ 0 + id + gender + distress.time +      distress.score + depression +  ...
  .. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00> 
  ..$ gender        :Class 'formula'  language gender ~ 0 + id + diagnosis + distress.time +      distress.score + depression +  ...
  .. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00> 
  ..$ distress.time :Class 'formula'  language distress.time ~ 0 + id + diagnosis +      gender + distress.score + depression +  ...
  .. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00> 
  ..$ distress.score:Class 'formula'  language distress.score ~ 0 + id + diagnosis +      gender + distress.time + depression +  ...
  .. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00> 
  ..$ depression    :Class 'formula'  language depression ~ 0 + id + diagnosis +      gender + distress.time + distress.score +  ...
  .. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00> 
  ..$ anxiety       :Class 'formula'  language anxiety ~ 0 + id + diagnosis + gender +      distress.time + distress.score +  ...
  .. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00> 
  ..$ choice        :Class 'formula'  language choice ~ 0 + id + diagnosis + gender +      distress.time + distress.score +  ...
  .. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00> 
 $ post           : Named chr [1:8] "" "" "" "" ...
  ..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
 $ blots          :List of 8
  ..$ id            : list()
  ..$ diagnosis     : list()
  ..$ gender        : list()
  ..$ distress.time : list()
  ..$ distress.score: list()
  ..$ depression    : list()
  ..$ anxiety       : list()
  ..$ choice        : list()
 $ seed           : logi NA
 $ iteration      : num 5
 $ lastSeedValue  : int [1:626] 10403 331 -1243825859 461242975 2057104913 -837414599 -54045022 1529270132 -105270003 -1459771035 ...
 $ chainMean      : num [1:8, 1:5, 1:7] NaN NaN NaN NaN -0.727 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
  .. ..$ : chr [1:5] "1" "2" "3" "4" ...
  .. ..$ : chr [1:7] "Chain 1" "Chain 2" "Chain 3" "Chain 4" ...
 $ chainVar       : num [1:8, 1:5, 1:7] NA NA NA NA 2.26 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
  .. ..$ : chr [1:5] "1" "2" "3" "4" ...
  .. ..$ : chr [1:7] "Chain 1" "Chain 2" "Chain 3" "Chain 4" ...
 $ loggedEvents   : NULL
 $ version        :Classes 'package_version', 'numeric_version'  hidden list of 1
  ..$ : int [1:3] 3 9 0
 $ date           : Date[1:1], format:  ...
 - attr(*, "class")= chr "mids"
Show in New WindowClear OutputExpand/Collapse Output
       id             diagnosis      gender   
 Min.   :  1.00   psychosis:250   female:196  
 1st Qu.: 76.75   bpd      : 92   male  :146  
 Median :198.00                               
 Mean   :215.66                               
 3rd Qu.:337.00                               
 Max.   :514.00                               

  distress.time distress.score      depression      
 baseline:171   Min.   :-4.8239   Min.   :-2.39920  
 post    :171   1st Qu.:-0.6808   1st Qu.:-0.76410  
                Median :-0.0293   Median : 0.08280  
                Mean   :-0.3083   Mean   :-0.06085  
                3rd Qu.: 0.6221   3rd Qu.: 0.77240  
                Max.   : 1.2736   Max.   : 1.80690  
                NA's   :59        NA's   :24        
    anxiety            choice      
 Min.   :-2.6080   Min.   :0.0909  
 1st Qu.:-0.9330   1st Qu.:2.4545  
 Median :-0.0955   Median :4.0454  
 Mean   :-0.1397   Mean   :3.8903  
 3rd Qu.: 0.8702   3rd Qu.:5.1136  
 Max.   : 1.7471   Max.   :8.0909  
 NA's   :10        NA's   :22      
Show in New WindowClear OutputExpand/Collapse Output


1
<dbl>
2
<dbl>
3
<dbl>
4
<dbl>
5
<dbl>
6
<dbl>
7
<dbl>
21  -0.6808 1.2736  -0.6808 -1.3322 -1.3322 -1.3322 0.5493
34  -0.6448 0.2507  0.8478  -0.0478 -0.3550 0.5493  0.2507
48  -1.6580 -0.0478 -1.6580 -0.6808 -4.8239 -0.0293 1.1463
141 -0.0293 -0.6448 1.2736  -0.3550 -0.6448 -2.6352 -0.0478
143 -0.3463 1.2736  0.2507  -2.4358 -0.0293 0.8478  1.2736
180 1.1463  -1.0065 -2.3094 -3.6124 -0.6448 -1.5403 -1.0065
181 -0.0293 -0.6808 -0.6808 -3.9381 -0.3463 -1.3322 0.2964
182 1.2736  -0.3463 0.9479  -0.0478 0.9479  -0.3463 1.1463
197 -0.3550 -0.0293 -0.6808 -0.3550 -1.3322 -4.8239 -0.6448
208 0.6221  0.2507  -0.6808 -0.3550 -0.6448 0.6221  -0.6448
1-10 of 59 rows

I created a lm with the imputed data set and summarised it using pool()

distressmodel <- with(data = imputeddata, exp = lm(distress.score ~ distress.time * diagnosis))
summary(mice::pool(distressmodel), conf.int = TRUE, conf.level = 0.95 )

however now I want to get the type 3 F values for the model, but this code is not working

car::Anova(mice::pool(distressmodel), type = 3)

it produces this error message:

Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "c('mipo', 'data.frame')"

I also want to get the marginal effects of the model (eg see effects from only one level of the grouping variable which is diagnosis) which I have done successfully in my complete case analysis, but this code:

summary(margins(distressmodel, data = subset(imputeddata, diagnosis == "bpd", type = "response")))

produces this error

Error in subset_datlist(datlist = x, subset = subset, select = select, : object 'diagnosis' not found

Does anyone have any advice on alterations to the code or way to get the car::anova or margins () packages to work with an MI data set? (preferably being able to pool the results

benimwolfspelz · Answer 1 · 2021-11-02T08:12:35.770

The with(data, exp) procedure can be used to apply statistical test/models to multiple imputation outputs (mipo) only if they allow extracting the estimates with the coef method and a variance-covariance matrix with vcov. The latter seems not to work for the function car::Anova that you used.

Fortunately, there is the miceadds package, which offers procedures to conduct and pool additional statistical tests. miceadds::mi.anova seems to do exactly what you want:

miceadds::mi.anova(imputeddata, distress.score ~ distress.time * diagnosis, type=3)

I am not sure, however, about the marginal effects. In general, you can do a bit more coding and apply any statistical procedure to each imputed sample separately. Then you can pool it using the pool.scalar function. This method also gives you within-imputation, between-imputation, and total variance estimates for your pooled statistic. (And with that you can conduct a basic t-test for difference from 0, if you want.)

This approach relies on normal distribution of statistics – or on them being transformable to a normally distributed metric. (Stef van Buuren gives a list of statistics that can easily be transformed, pooled, and back-transformed here, see Table 5.2) So it should be possible for the marginal means you want, right?

I do not know the margins function you use (what package is it from?). But, if you want to get the marginal means and pool them yourself, this is the approach:

# transform your mids into a long-format data frame
imputed_l <- mice::complete(imputeddata, action="long")
nimp <- imputed_l$m #number of imputations for convenience

# create vectors to contain the marginal effects and their SEs from all seven imputations
mm_all <- vector("numeric", nimp)
mmse_all <- mm_all

# get marginal means and SEs for all imputations
for (i in 1:nimp) {
  mm_all[i] <- Expression_producing_marginal_mean(..., data = subset(imputed_l, .imp=i) )
  mmse_all[i] <- Expression_producing_SE(..., data = subset(imputed_l, .imp=i) )
}

# pool them (the U argument should be variances, so square the SEs)
mm_pool <- pool.scalar(Q=mm_all, U=mmse_all^2, n=nrow(imputed_l)/nimp)

mm_pool$qbar #marginal mean aggregated across imputations
sqrt(mm_pool$t) #SE of marginal mean (based on within- and between-imputations variance)

How to get type 3 F values and marginal effects using a multiple imputation data set (generated from MICE)

1 Answers1