
I'm not sure what the reason behind this is.

I have a data set with 107 variables (a mix of numeric and factor data types), and some of them contain missing values. I use mice to impute the data.

MICE imputed most of the variables. However, some variables were not imputed at all.

It is very strange that some variables are imputed successfully while others are not. I also tried running MICE on only the variables that were not imputed, and this time it was successful.

What is the reason behind this? Does it have anything to do with the number of variables in my data set? How can I fix this, or do I need to run mice separately for each variable?

Many thanks,

Edit: here is code that replicates what I mean.

> library(missForest)  # provides prodNA()
> library(VIM)         # provides aggr()
> library(mice)
> #create data set with NAs
> iris.fake = prodNA(iris, noNA = 0.9)
> iris.fake.miss <- aggr(iris.fake)
> iris.fake.miss$missings
             Variable Count
Sepal.Length Sepal.Length   138
Sepal.Width   Sepal.Width   137
Petal.Length Petal.Length   138
Petal.Width   Petal.Width   131
Species           Species   131
> 
> #run mice
> imp = mice(iris.fake, m = 5, maxit = 5)
iter imp variable
1   1  Sepal.Width  Petal.Length  Petal.Width  Species
1   2  Sepal.Width  Petal.Length  Petal.Width  Species
1   3  Sepal.Width  Petal.Length  Petal.Width  Species
1   4  Sepal.Width  Petal.Length  Petal.Width  Species
1   5  Sepal.Width  Petal.Length  Petal.Width  Species
2   1  Sepal.Width  Petal.Length  Petal.Width  Species
2   2  Sepal.Width  Petal.Length  Petal.Width  Species
2   3  Sepal.Width  Petal.Length  Petal.Width  Species
2   4  Sepal.Width  Petal.Length  Petal.Width  Species
2   5  Sepal.Width  Petal.Length  Petal.Width  Species
3   1  Sepal.Width  Petal.Length  Petal.Width  Species
3   2  Sepal.Width  Petal.Length  Petal.Width  Species
3   3  Sepal.Width  Petal.Length  Petal.Width  Species
3   4  Sepal.Width  Petal.Length  Petal.Width  Species
3   5  Sepal.Width  Petal.Length  Petal.Width  Species
4   1  Sepal.Width  Petal.Length  Petal.Width  Species
4   2  Sepal.Width  Petal.Length  Petal.Width  Species
4   3  Sepal.Width  Petal.Length  Petal.Width  Species
4   4  Sepal.Width  Petal.Length  Petal.Width  Species
4   5  Sepal.Width  Petal.Length  Petal.Width  Species
5   1  Sepal.Width  Petal.Length  Petal.Width  Species
5   2  Sepal.Width  Petal.Length  Petal.Width  Species
5   3  Sepal.Width  Petal.Length  Petal.Width  Species
5   4  Sepal.Width  Petal.Length  Petal.Width  Species
5   5  Sepal.Width  Petal.Length  Petal.Width  Species
> summary(imp)
Multiply imputed data set
Call:
mice(data = iris.fake, m = 5, maxit = 5)
Number of multiple imputations:  5
Missing cells per column:
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
         138          137          138          131          131 
Imputation methods:
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
       "pmm"        "pmm"        "pmm"        "pmm"    "polyreg" 
VisitSequence:
 Sepal.Width Petal.Length  Petal.Width      Species 
           2            3            4            5 
PredictorMatrix:
             Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Sepal.Length            0           0            0           0       0
Sepal.Width             0           0            1           1       1
Petal.Length            0           1            0           1       1
Petal.Width             0           1            1           0       1
Species                 0           1            1           1       0
Random generator seed value:  NA 
> 
> com = complete(imp,2)
> iris.imp.miss <- aggr(com)
> iris.imp.miss$missings
             Variable Count
Sepal.Length Sepal.Length   138
Sepal.Width   Sepal.Width     0
Petal.Length Petal.Length     0
Petal.Width   Petal.Width     0
Species           Species     0
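
In the PredictorMatrix above, both the row and the column for Sepal.Length are all zero, and Sepal.Length is missing from the VisitSequence, so it appears to have been set aside before the iterations started. A minimal sketch of how this could be checked without a full run, assuming a mice version whose returned object exposes `predictorMatrix` and `loggedEvents`. The data set `iris.demo` and the names `unused` and `no.predictors` are hypothetical and only stand in for `iris.fake`:

```r
library(mice)

set.seed(1)
# Hypothetical stand-in for iris.fake: blank out 135 of 150 cells in every column
iris.demo <- iris
for (v in names(iris.demo)) {
  iris.demo[sample(nrow(iris.demo), 135), v] <- NA
}

# Dry run: maxit = 0 sets up the imputation model without iterating
ini <- mice(iris.demo, maxit = 0, printFlag = FALSE)
pred <- ini$predictorMatrix

# Variables never used as a predictor (all-zero column)
unused <- colnames(pred)[colSums(pred) == 0]

# Incomplete variables that have no predictors at all (all-zero row)
no.predictors <- rownames(pred)[rowSums(pred) == 0 &
                                colSums(is.na(iris.demo)) > 0]

print(unused)
print(no.predictors)

# Variables dropped during setup are reported here (NULL if nothing was logged)
print(ini$loggedEvents)
```

Which variables end up listed depends on the random pattern of missing values, so the result will differ between seeds and between mice versions.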
user1480478
  • "do I need to run mice separately for each variable" doesn't make sense because how should the algorithm impute the values then? You should include all variables in the imputation. However, you could try a different imputation method. – Roland Jul 01 '16 at 13:29
  • Can you provide some sample data that replicates this problem? Also, you may want to look at the prediction matrix that is used for the imputation, as there may very well be an empty column in there that causes this problem. I answered a similar question [here](http://stackoverflow.com/questions/36330570/mice-imputation-failiure/36331710#36331710), although that problem seems slightly different from yours. (Also, why would you specify `method = pmm`? The function automatically determines the best methods to use, which is often not pmm.) – slamballais Jul 01 '16 at 13:33
  • I just edited my post to show example data. I have tried method = "mean", the result is even worse than pmm. More variables are not imputed. If I don't specify the method, I got an error "Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE, softmax = TRUE, : too many (6513) weights". I don't have any problem if I use MissForest though. @Laterow. Roland. Yes, I would have thought it should work just fine with any number of variables. – user1480478 Jul 01 '16 at 14:36
  • The output you have shared is a bit confusing. You show some of dataset `Raw`, impute on `raw`, and don't seem to look at the output of `imputed_Data` – user20650 Jul 01 '16 at 14:40
  • @user20650 That's just a typo. I changed my variable names when I posted here. I edited to show the imputed data. – user1480478 Jul 01 '16 at 14:47
  • @user1480478 I meant, add sufficient data (and code) so that we can also get the same results (i.e. lacking imputation). Also, never use mean as method; in that post, OP wanted to use mean, so that's why I gave that example. Also, the lack of weights can be solved easily by specifying MaxNWts = 10000. Either way, the point of my comment was, can you check the specified prediction matrix for empty columns? If you don't know how to get the prediction matrix, check the manual. I think it just requires you to run mice with m=0, but I am not sure and can't check right now. – slamballais Jul 01 '16 at 14:48
  • If you are certain that you have used the correct dataset (Raw vs raw), and that MUAC is present in it, then you can get this behaviour if another variable is perfectly correlated with it. Perhaps also with other linear dependencies in the data (given you are using 107 variables). BTW how many observations do you have? – user20650 Jul 01 '16 at 14:59
  • I think it has something to do with the amount of missing data. I was testing with the iris data, and if less than 90% is missing this won't happen. I have 388 obs. – user1480478 Jul 01 '16 at 15:16
  • Thanks for the update. Okay, that is quite a small data set. What does `dim(na.omit(raw))` give you? – user20650 Jul 01 '16 at 15:56
  • 0 107. Basically, all variables contain some NAs. – user1480478 Jul 01 '16 at 16:02
  • Same thing is happening to me. I wonder if it might occur if there is not enough information to impute that feature. – JoseOrtiz3 Aug 01 '16 at 22:22
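
The comments above suggest the cause may be too little jointly observed information, for example when `dim(na.omit(raw))` reports zero rows. A small sketch in base R of how the overlap between variables could be counted. The helper `pairwise.observed` and the toy data frame `d` are hypothetical names, not part of mice, and a small count only hints at a problem rather than proving one:

```r
# Hypothetical helper: for every pair of variables, count the rows
# in which both are observed
pairwise.observed <- function(data) {
  obs <- (!is.na(data)) + 0   # 1 where a cell is observed, 0 where it is NA
  crossprod(obs)              # entry [i, j] = rows with both i and j observed
}

# Toy data with the same kind of problem: no row is fully observed
d <- data.frame(a = c(1, NA, 3, NA),
                b = c(NA, 2, 3, NA),
                c = c(1, 2, NA, 4))

n.complete <- nrow(na.omit(d))   # number of complete cases
overlap <- pairwise.observed(d)

print(n.complete)
print(overlap)
```

Pairs with only two or three shared rows can show a near-perfect correlation by chance, which would be consistent with the perfect-correlation explanation offered in the comments.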

0 Answers