
I need to add variables to the imputed datasets produced by mice() and then use as.mids() to reassemble them into a mids object for later analysis. However, when I call complete() on the rebuilt mids object, many of the values in the newly added variable have become NA.

library(mice)
d1 <- as.data.frame(matrix(rnorm(100), nrow = 10))
missingness <- matrix(as.logical(rbinom(100, 1, .2)), ncol = 10)
d1[which(missingness, arr.ind = TRUE)] <- NA         # replace some values with NA
d.mids <- mice(d1, printFlag = FALSE)                # make the imputations
d.long <- complete(d.mids, "long", TRUE)             # extract the original dataset and the imputed ones
added <- data.frame(added = rowSums(d.long[, 3:12])) # make a new column
d.long.aug <- cbind(d.long, added)                   # add it to the data.frame
d.remids <- as.mids(d.long.aug)                      # turn it back into a mids object
d.relong <- complete(d.remids, "long", TRUE)         # extract it from the mids object
sum(is.na(d.long.aug[11:30, 13]))                    # 0, unless a variable failed to impute due to collinearity
sum(is.na(d.relong[11:30, 13]))                      # should be the same as the previous value, but almost never is

In the above example, I created a new long data.frame and applied as.mids() to it, but I get the same results if I use cbind to add the new variable to d.long, or if I assign the new variable to d.long$added.

How can I make sure that the values in the new variable stay there after I reassemble the mids object?

Paul de Barros
  • Did you find a solution to this? I'm having the same problem. I also tried the alternative method suggested in the answers here: http://stackoverflow.com/questions/26667162/perform-operation-on-each-imputed-dataset-in-rs-mice - but that created other problems. – user2498193 Jul 13 '16 at 23:02
  • Actually I fixed it. In my case the operation I was carrying out to make the new column had a silent bug that was occasionally producing NAs! – user2498193 Jul 14 '16 at 09:57
  • I'm glad you found a solution. I believe that my solution was to add the variables before imputation, and use passive imputation to calculate them. I also had to make sure that the imputation of other variables was not based on the added ones. If I had to go back and do it again, I'd pick a different package like mi or Amelia. – Paul de Barros Jul 14 '16 at 10:02
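
For anyone landing here, the passive-imputation approach described in the last comment might look roughly like the sketch below. This is not the original poster's code: the toy data (50 rows, three variables V1 to V3), the seed, and the formula are assumptions made for illustration, and make.method() and make.predictorMatrix() assume mice 3.0 or later. The derived variable is added before imputation, given a "~ I(...)" method so that mice recomputes it from the imputed components, and removed as a predictor for the other variables.

```r
library(mice)

set.seed(123)  # arbitrary seed, only so this sketch is reproducible

# Toy data (an assumption for illustration): 50 rows, 3 variables, about 20% missing
d1 <- as.data.frame(matrix(rnorm(150), nrow = 50))
missingness <- matrix(as.logical(rbinom(150, 1, 0.2)), ncol = 3)
d1[missingness] <- NA

# Add the derived variable BEFORE imputation; it is NA wherever a component is NA
d1$added <- rowSums(d1[, c("V1", "V2", "V3")])

# Start from the default imputation methods and predictor matrix
meth <- make.method(d1)
pred <- make.predictorMatrix(d1)

# Passive imputation: fill in 'added' from the current imputed values of its components
meth["added"] <- "~ I(V1 + V2 + V3)"

# Do not let 'added' be used as a predictor when imputing the other variables
pred[, "added"] <- 0

imp <- mice(d1, method = meth, predictorMatrix = pred, printFlag = FALSE)

# Inspect the first completed dataset and check the derived column for NAs
completed <- complete(imp, 1)
anyNA(completed$added)
```

Because the result is already a mids object, no round trip through as.mids() is needed, and it can be passed to with() and pool() as usual.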

0 Answers