
I have been trying to impute a data set with the mice package, using the following code:

library(mice)
my_imp <- mice(train, m = 5, method = "pmm", maxit = 50)

and I got this error:

 iter imp variable
  1   1  existence.expectancy.index
Error in solve.default(xtx + diag(pen)) : 
  system is computationally singular: reciprocal condition number = 3.96306e-17

Here is a sample of my data frame (output of `dput`). The error probably comes from the existence.expectancy.index column.

structure(list(galactic.year = c(990025L, 990025L, 990025L, 990025L, 
990025L), galaxy = c("Large Magellanic Cloud (LMC)", "Camelopardalis B", 
"Virgo I", "UGC 8651 (DDO 181)", "Tucana Dwarf"), existence.expectancy.index = c(0.628656922579983, 
0.818082166933375, 0.659443179243005, 0.555861648365899, 0.991196351622249
)), class = "data.frame", row.names = c(NA, -5L))
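One way to see what mice intends to do with each column before running the full 50 iterations is a dry run with `maxit = 0`. The sketch below assumes mice 3.x and uses mice's bundled `nhanes` data as a stand-in for `train`; `ini` and `pred` are illustrative names, and the `mincor = 0.3` threshold is an arbitrary choice, not a recommendation.

```r
library(mice)

# nhanes ships with mice; it stands in for the question's `train` here
data(nhanes, package = "mice")

# maxit = 0: only the setup step and initial fill-in, no Gibbs iterations
ini <- mice(nhanes, maxit = 0, printFlag = FALSE)

ini$method           # method chosen per column ("" = nothing to impute)
ini$predictorMatrix  # rows = imputed column, 1s = columns used as predictors
ini$loggedEvents     # columns mice dropped during setup (e.g. constant/collinear), if any

# quickpred() keeps only predictors correlated at least `mincor` with the target
pred <- quickpred(nhanes, mincor = 0.3)
```

A trimmed `pred` can then be passed back via `mice(..., predictorMatrix = pred)` to reduce the number of (possibly collinear) predictors per column.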

Please give me ideas on how to solve the error.

  • Hello and welcome to SO, could you share a sample of your data? Without that it will be very hard to find out where the problem lies. You can use `dput()`, or `dput(head())` if the data set is large. Please help us help you. – Jan Jun 09 '20 at 06:41
  • Hi, please read related Q/A: https://stackoverflow.com/a/58832614/6574038 Possible duplicate. – jay.sf Jun 09 '20 at 06:42
  • @Afrikan_patriot What is different in your case that the provided error isolation approach there won't work? – jay.sf Jun 09 '20 at 07:25
  • @Afrikan_patriot Thanks for updating. However, when you use `dput`, it is better not to change its output before posting it. I tried to fix that in an edit to your question. If you want to `dput` a subset of your data, use e.g. `dput(dtrain[1:30, ])`. Anyway, I tried out your code and data and wasn't able to reproduce your error. Also, the question in my last comment might still be open. – jay.sf Jun 09 '20 at 07:45
  • @jay.sf thank you for your advice and edit – Afrikan_patriot Jun 09 '20 at 07:52
  • @jay.sf I've found the solution. The problem with using mice for imputation here is the large number of unbalanced factor variables in this data set. When these are turned into dummy variables, it is quite likely that one column ends up being a linear combination of others. Since the default imputation methods involve linear regression, this results in a matrix that cannot be inverted. One solution is to change the default imputation method to one that does not rely on linear regression. – Afrikan_patriot Jun 09 '20 at 10:38
  • @Afrikan_patriot You could post this as an answer to your own question. – jay.sf Jun 09 '20 at 10:43

1 Answer

The problem with using mice for imputation here is the large number of unbalanced factor variables in this data set. When these are expanded into dummy variables, it is quite likely that one column ends up being a linear combination of others. Since the default imputation method for numeric columns (pmm) is built on a linear regression, the resulting X'X matrix is singular and cannot be inverted, which is what `solve.default(xtx + diag(pen))` complains about.
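To illustrate the mechanism, here is a contrived base-R sketch. The data frame, its `galaxy_type` factor and the `flag` column are hypothetical, not taken from the question's data: `flag` exactly duplicates one of the factor's dummy columns, so the design matrix loses rank and inverting X'X fails with a singularity error of the same kind as in the question.

```r
set.seed(1)

# Hypothetical toy data: an unbalanced 3-level factor plus a numeric column
df <- data.frame(
  galaxy_type = factor(c(rep("spiral", 8), "irregular", "elliptical")),
  x = rnorm(10)
)
# `flag` is an exact copy of the "irregular" dummy column
df$flag <- as.numeric(df$galaxy_type == "irregular")

# Dummy-code the factor (treatment contrasts) and build the design matrix X
X <- model.matrix(~ galaxy_type + x + flag, data = df)

ncol(X)     # intercept + 2 dummies + x + flag
qr(X)$rank  # numerical rank; less than ncol(X) means linear dependence

# Inverting X'X should then stop with a "singular" error
res <- tryCatch(solve(crossprod(X)),
                error = function(e) conditionMessage(e))
res
```

Checking `qr(X)$rank` against `ncol(X)` on a real design matrix is a quick way to confirm that dependence among the dummy columns is the culprit.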

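One solution is to switch from the default method to one that does not rely on linear regression (and therefore never has to invert X'X), for example tree-based imputation with `method = "cart"`.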
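A minimal sketch of that switch, assuming mice 3.x and again using the bundled `nhanes` data in place of `train`. A single string for `method` is applied to every column that has missing values. As an alternative that keeps pmm, the sketch raises the `ridge` penalty, which `mice()` passes through `...` to `mice.impute.pmm()` (default `1e-05`); the value `1e-3` is an arbitrary illustration, and a larger ridge biases the regression step.

```r
library(mice)

# nhanes ships with mice; it stands in for the question's `train` here
data(nhanes, package = "mice")

# Tree-based (CART) imputation: no linear regression, so no X'X to invert
imp_cart <- mice(nhanes, m = 5, method = "cart", maxit = 10,
                 seed = 123, printFlag = FALSE)

# First of the m completed data sets
completed <- complete(imp_cart, 1)

# Alternative: keep pmm but add a larger ridge penalty to the diagonal of X'X
imp_pmm <- mice(nhanes, m = 5, method = "pmm", maxit = 10,
                ridge = 1e-3, seed = 123, printFlag = FALSE)
```

For the question's data the first call would be `mice(train, m = 5, method = "cart", maxit = 50)`.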