Imputation MICE in R still NA left in dataset

Question

After running the mice package, the number of missing values shrank from 147428 to 46093 in each of the 5 completed imputation sets. But isn't it supposed to be 0 NAs instead?

Thanks!

Here is my MICE code:

library(mice)

imp = mice(newdata)

imputationSet1 = complete(imp)
imputationSet2 = complete(imp,2)
imputationSet3 = complete(imp,3)
imputationSet4 = complete(imp,4)
imputationSet5 = complete(imp,5)

I have a similar question at http://stackoverflow.com/questions/25472640/leftover-nas-after-imputing-using-mice, but mine has a working example. — Jameson Quinn, Aug 24 '14 at 14:33
You should provide some information on your dataset. How many variables? How many cases? What kind of variables are they? It is likely that mice cannot fit the imputation model properly. Some cases may not have sufficient data to be imputed at all. Finally, it could be a combination of the two. — SimonG, Aug 24 '14 at 17:09
If the number of missing values is huge relative to the number of known values, the method might not converge at all, and you end up with NAs anyway! — Ehsan M. Kermani, Apr 25 '15 at 04:18

lara · Answer 1 · 2016-12-17T23:35:06.347

Ben, the mice() function detects multicollinearity, and solves the problem by removing one or more predictors for the matrix. Each removal is noted in the loggedEvents element of the mids object. For example,

imp <- mice(cbind(nhanes, chl2 = 2 * nhanes$chl), print = FALSE)

imp$loggedEvents

informs us that the duplicate variable chl2 was removed before iteration. The algorithm also detects multicollinearity during iterations.
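To connect this to the leftover NAs in the question, one possible diagnostic is to count the NAs per column in a completed dataset and compare the result with the logged events and the methods that mice actually used. This is only a sketch built on the nhanes example above; the names completed and na_per_column, and the seed, are illustrative and not from the original post.

```r
library(mice)

# Same toy setup as above: chl2 is an exact multiple of chl
imp <- mice(cbind(nhanes, chl2 = 2 * nhanes$chl), printFlag = FALSE, seed = 123)

# Count the NAs that remain in each column of the first completed dataset
completed <- complete(imp, 1)
na_per_column <- colSums(is.na(completed))
na_per_column[na_per_column > 0]

# Compare with what mice logged and with the method assigned to each column
imp$loggedEvents
imp$method
```

Columns that still contain NAs and also appear in loggedEvents are the ones to look at first.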

Another measure to control the algorithm is the ridge parameter. The ridge parameter is specified as an argument to mice(). Setting ridge=0.001 or ridge=0.01 makes the algorithm more robust at the expense of bias.
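As a rough illustration, the call might look like the sketch below. It assumes that ridge is forwarded through mice()'s ... argument to the default imputation functions, as the book passage describes; the object name imp_ridge and the seed are illustrative.

```r
library(mice)

# Sketch: a larger ridge value than the default, traded against some bias
imp_ridge <- mice(nhanes, ridge = 0.001, printFlag = FALSE, seed = 123)

# Number of NAs remaining in the first completed dataset
sum(is.na(complete(imp_ridge, 1)))
```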

At the terminal node, we can apply a simple method like mice.impute.sample() that does not need any predictors for itself.
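For reference, mice.impute.sample() is selected with method = "sample". The sketch below applies it to every incomplete column of nhanes just to show the call; in practice it would more likely be set for a single problematic variable. The object name imp_sample and the seed are illustrative.

```r
library(mice)

# Sketch: impute each incomplete column by random draws from its observed values
imp_sample <- mice(nhanes, method = "sample", printFlag = FALSE, seed = 123)

# Number of NAs remaining in the first completed dataset
sum(is.na(complete(imp_sample, 1)))
```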

This information is taken from the book Flexible Imputation of Missing Data by Stef van Buuren, p. 129

score 5 · Answer 2 · answered Dec 17 '16 at 23:34

What helped me was converting character variables to factor variables; after that, the NAs disappeared from the imputed dataset.
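A minimal sketch of that conversion is shown below. It uses the nhanes2 data shipped with mice and first turns one column into character to mimic the situation; the names dat and chr_cols, and the seed, are illustrative and not from the original answer.

```r
library(mice)

# Mimic a dataset where a categorical variable is stored as character
dat <- nhanes2
dat$hyp <- as.character(dat$hyp)

# Convert every character column to a factor before calling mice()
chr_cols <- vapply(dat, is.character, logical(1))
dat[chr_cols] <- lapply(dat[chr_cols], factor)

imp <- mice(dat, printFlag = FALSE, seed = 123)

# Check per column whether any NAs remain in the first completed dataset
colSums(is.na(complete(imp, 1)))
```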

lara

score 2 · Answer 3 · edited Jul 03 '17 at 12:56

Try passing an additional parameter called threshold, whose default is 0.999. If you set it to something closer to 1, or even larger than 1, your problem should disappear.

Be aware though that this issue arises only if the collinearity in the data is high.

edited Jul 03 '17 at 12:56 by Davor Josipovic · answered Jun 10 '15 at 12:45 by Phil

What is the name of the additional parameter? It shows as a blank space in your answer. – Paul de Barros Apr 09 '16 at 00:28

score -1 · Answer 4 · answered Jan 06 '14 at 06:55

Yeah there should be no missing values left.

I bet there are some rows in your data set that are so badly mangled with missingness that mice's imputation models break down. Is it possible that there are rows in your dataset where every value is missing? That would do it.
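A quick way to check for such rows in base R is sketched below. The small data frame is a hypothetical stand-in for newdata from the question, and the names empty_rows, row_missing_share and newdata_nonempty are illustrative.

```r
# Hypothetical stand-in for `newdata`; row 2 has every value missing
newdata <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", NA, NA, "y"),
  c = c(NA, NA, 2.5, 1.0)
)

# TRUE for rows in which no value at all is observed
empty_rows <- rowSums(!is.na(newdata)) == 0
which(empty_rows)

# Share of missing values per row, to spot rows that are nearly empty
row_missing_share <- rowMeans(is.na(newdata))

# Drop the completely empty rows before imputing
newdata_nonempty <- newdata[!empty_rows, , drop = FALSE]
```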

Another thing to try on a whim - crank up the number of iterations to 15: imp = mice(newdata, maxit = 15). Does that change anything?

Ben Ogorek

Raising the number of iterations will only work for issues with autocorrelation and non-convergence. Even 15 iterations per imputation is not much considering what other packages do. The algorithm in `mice` is quite efficient and is surprisingly free of these issues because the (filled) missing values in the target variable aren't used to fit the model in each iteration (only those in the covariates). The number of iterations is therefore not such an issue in `mice`. I find your other point very plausible, but I think it's not the only explanation (see comment on Q). – SimonG Aug 25 '14 at 16:04