Using R MICE to impute missing data and calculate sums

Question

This is my first time trying to use mice in R. I have a cross-sectional dataset with responses for several scales. I would like to calculate the sums of several scales after imputing missing data.

I would like to adapt the code from the question "R - mice - adding a column that sums columns with imputed values", which is itself based on answers to "Perform operation on each imputed dataset in R's MICE". However, I am not certain what each line in this section of code does:

pred1 <- ini$pred  
#change the predictor matrix so only bdd$EE1-9 predicts EE (necessary?)
pred1[ "EE", ] <- 0 
pred1[ "EE", c("EE1", "EE2", "EE3", "EE4", "EE5", "EE6", "EE7", "EE8", "EE9")] <- 1
#change the predictor matrix so that EE isn't used to predict other variables
pred1[ , "EE" ] <- 0

I think that the first line accesses the predictor matrix stored in the mids object generated when executing mice(). Next, lines 2 and 3 change the matrix so that only the variables included in the ~I() statement are used to impute/calculate the sums. I think this is not required? Please confirm. Finally, line 4 changes the matrix so that the sum is not used to impute other variables. Is this required? Please confirm.

Can you either confirm that I understand the function of this code or provide a line-by-line explanation of its function? Thanks in advance!

I want to understand this section of code before using it in my project. After I finish my analyses, I will share my code :) Hopefully, others will find it helpful! — Cassondra Lyman, Aug 11 '22 at 16:31

score 0 · Answer 1 · answered Aug 23 '22 at 18:00

Your understanding is correct.

pred1 <- ini$pred extracts the initial predictor matrix created by mice(). The predictor matrix is a P by P matrix, where a 1 in the ith row and jth column indicates that the jth variable is used as a predictor for the ith variable.
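To make the row/column convention concrete, here is a small base-R sketch. It builds a toy predictor matrix by hand instead of calling mice(), and the variable names (EE1, EE2, EE, age) and the helper predictors_of() are made up for illustration. The only thing it assumes is the layout described above: rows are the variables being imputed, columns are the predictors.

```r
# Toy predictor matrix built by hand (not produced by mice()).
# Hypothetical variables: two scale items, their sum EE, and a covariate age.
vars <- c("EE1", "EE2", "EE", "age")
pred <- matrix(1, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
diag(pred) <- 0  # a variable is not used to predict itself

# Hypothetical helper: row = variable being imputed, column = predictor.
predictors_of <- function(pred, target) {
  names(which(pred[target, ] == 1))
}

predictors_of(pred, "EE1")  # the columns set to 1 in the EE1 row
```

Reading the matrix one row at a time like this is a quick way to check which variables feed into the imputation of a given target.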

pred1[ "EE", ] <- 0 removes all predictors from the EE variable. pred1[ "EE", c("EE1", "EE2", "EE3", "EE4", "EE5", "EE6", "EE7", "EE8", "EE9")] <- 1 adds EE1-9 as predictors for the EE variable. Neither of these steps is necessary, since passive imputation is being used: EE is computed from the ~I() formula rather than from its row of the predictor matrix. But there's no harm in leaving them in.

pred1[ , "EE" ] <- 0 removes EE as a predictor of every other variable in the data. Whether this is "good" is a statistical question, and depends on the relation between EE and the other variables with missing values. It is definitely a good idea to remove EE as a predictor of EE1-9 (although I don't think it's strictly necessary). However, EE could be useful for imputing some other variables in the data. In that case, you would want to ensure that EE and EE1-9 are not both supplied as predictors for those variables.
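As a rough sketch of that last case, the base-R snippet below applies the three steps from the question to a hand-built toy matrix and then lets EE (but not EE1-9) predict one other variable. The variable age is hypothetical and the matrix is not produced by mice(); with real data you would start from ini$pred instead.

```r
# Toy predictor matrix built by hand; "age" is a hypothetical extra variable.
items <- paste0("EE", 1:9)
vars <- c(items, "EE", "age")
pred1 <- matrix(1, nrow = length(vars), ncol = length(vars),
                dimnames = list(vars, vars))
diag(pred1) <- 0

# The three steps from the question.
pred1["EE", ] <- 0        # clear all predictors of EE
pred1["EE", items] <- 1   # EE1-9 predict EE
pred1[, "EE"] <- 0        # EE predicts nothing

# Variation: let the sum score EE predict age, and drop the individual
# items for age so that EE and EE1-9 are not both used.
pred1["age", "EE"] <- 1
pred1["age", items] <- 0

pred1["age", ]       # only the EE column is 1
pred1[items, "EE"]   # all 0: EE still does not predict its own items
```

Whether the sum score or the individual items make better predictors for a given variable remains a modelling decision; the snippet only shows how to encode either choice.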

Thank you! I really appreciate the confirmation and explanation :) — Cassondra Lyman, Aug 24 '22 at 19:17

score 0 · Answer 2 · answered Aug 22 '23 at 12:57

That is indeed correct. FWIW, the ggmice package now has a visualization function, plot_pred(), that you can use to check your interpretation of the predictor matrix: https://amices.org/ggmice/reference/plot_pred

answered by hanne

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Ram Chander Aug 25 '23 at 14:47
