
I am trying to impute missing values in a medium-sized data frame (~100,000 rows) where 5 of the 30 columns contain NAs (a large proportion, around 60%).

I tried mice with the following code:

library(mice)
data_3 = complete(mice(data_2))

After the first iteration I got the following exception:

iter imp variable
  1   1  Existing_EMI  Loan_Amount  Loan_Period

Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 1.08007e-16

Is there another package that is more robust in this kind of situation? How can I deal with this problem?

halfer
user8270077

1 Answer


Your 5 columns might contain a number of unbalanced factors. When these are turned into dummy variables, there is a high probability that one column will be a linear combination of the others. The default imputation methods of mice involve linear regression, so this produces an X matrix that cannot be inverted, which triggers your error.
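To make the linear-dependence point concrete, here is a small base R sketch. The columns region and zone are made up for illustration and are not from your data; the idea is only to show how redundant dummy columns make X'X non-invertible.

```r
# Toy data: two factors that carry the same information under different labels
# (hypothetical columns, not taken from the question's data).
d <- data.frame(
  region = factor(c("a", "a", "b", "b", "c", "c")),
  zone   = factor(c("x", "x", "y", "y", "z", "z"))
)

# Dummy-code both factors, as a regression-based model would.
X <- model.matrix(~ region + zone, data = d)

# The rank is lower than the number of columns, so some columns are redundant.
rank_X <- qr(X)$rank
n_cols <- ncol(X)

# Inverting X'X fails; capture the error message instead of stopping.
res <- tryCatch(solve(crossprod(X)), error = function(e) conditionMessage(e))

print(c(rank = rank_X, columns = n_cols))
print(res)
```

Comparing qr(X)$rank with ncol(X) on the dummy-coded version of your own predictors is one way to check whether this is what is happening in your case.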

Change the method to something else, such as cart: mice(data_2, method = "cart"). Also set a seed before or during imputation (for example via the seed argument of mice) so that the results are reproducible.
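A minimal sketch of that call, assuming the mice package is installed. The nhanes data set that ships with mice is used here only as a stand-in for your data_2, and the seed value is arbitrary.

```r
library(mice)

# Stand-in for the question's data_2 (assumption: nhanes ships with mice).
data_2 <- nhanes

# Tree-based imputation instead of the regression-based defaults;
# seed makes the run reproducible, printFlag = FALSE silences the iteration log.
imp <- mice(data_2, method = "cart", seed = 123, printFlag = FALSE)

# Extract the first completed data set, as in the question.
data_3 <- complete(imp)
print(colSums(is.na(data_3)))
```

On 100,000 rows, cart will likely be slower than the default methods, so it may be worth trying it on a sample of rows first.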

My advice is to go through the 7 vignettes of mice. You can find out how to change the method of imputation being used for separate columns instead of for the whole dataset.
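As a rough illustration of per-column methods, the sketch below starts from the default method vector and overrides a single column. It assumes a recent mice version that provides make.method(); nhanes2 and its chl column are stand-ins, not your data.

```r
library(mice)

# Stand-in for the question's data_2 (assumption: nhanes2 ships with mice).
data_2 <- nhanes2

# Default method per column, chosen by mice from each column's type.
meth <- make.method(data_2)

# Override one column only; the other columns keep their defaults.
meth["chl"] <- "cart"

imp <- mice(data_2, method = meth, seed = 123, printFlag = FALSE)

# Show which method was actually used for each column.
print(imp$method)
```

In your case you would set "cart" (or another method) only for the columns that trigger the singularity error.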

phiver