
I have a dataset with about 12 categorical variables, each with 2 to 10 levels, plus some numerical variables, and about 280 records. I'm using the mice package in R to impute the missing data with all default settings. However, when I run the imputation like this:

```r
imp <- mice(df)
```

I keep getting this warning:

```
glm.fit: algorithm did not converge
```

The solutions I found online here and here only address calling the glm function directly, but in my case glm.fit() is called from within mice. I've tried setting maxit = 50, like this:

```r
imp <- mice(df, maxit = 50)
```

but that only produced many more instances of the same warning. Any idea what could be causing this?

ayePete
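On the `maxit = 50` attempt: in `mice()`, `maxit` is the number of iterations of the chained-equations sampler, not the iteration limit of `glm.fit()`. Every sampler iteration refits each univariate model, which is why raising it multiplied the warning instead of removing it. The limit inside `glm.fit()` comes from `control = glm.control(maxit = ...)`, which defaults to 25. The base-R sketch below uses simulated data (an illustrative assumption, unrelated to the question's `df`) to reproduce the warning by capping that limit at one iteration, and shows the same model converging with a higher cap.

```r
# Simulated binary outcome with one numeric predictor (illustrative only)
set.seed(1)
n <- 200
X <- cbind(1, rnorm(n))                         # intercept + predictor
y <- rbinom(n, 1, plogis(0.5 + 1.5 * X[, 2]))

# Collect warnings instead of printing them
warnings_seen <- character(0)
collect <- function(w) {
  warnings_seen <<- c(warnings_seen, conditionMessage(w))
  invokeRestart("muffleWarning")
}

# IRLS capped at a single iteration: glm.fit() stops early and warns
fit_short <- withCallingHandlers(
  glm.fit(X, y, family = binomial(), control = glm.control(maxit = 1)),
  warning = collect
)

# Same model with a generous iteration limit converges normally
fit_long <- glm.fit(X, y, family = binomial(),
                    control = glm.control(maxit = 100))

print(warnings_seen)   # contains "glm.fit: algorithm did not converge"
print(c(short = fit_short$converged, long = fit_long$converged))
```

Note that when the non-convergence is caused by something structural in the data, raising the IRLS limit alone may not be enough.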

3 Answers


I'm posting the answer to my own question to show how I solved it, since none of the solutions I could find online worked in my case.

I realized that the warning actually comes from the logreg method (used for categorical variables with only 2 levels) rather than from polyreg. Since glm.fit() is called not just from within mice but from within logreg, I found the mice code on GitHub, copied the logreg function, edited its glm.fit() call to pass a control argument with a larger maxit, renamed the function as specified in the 'Details' section of ?mice (mice.impute.<name>), and used that. It worked fine (after some more debugging, lol), and the algorithm now converges.

ayePete
  • Having this same issue. Do you have a link to your fix? I don't know which branch on GitHub is yours. – user2498193 Feb 18 '20 at 19:51
  • Here you go: https://github.com/ayePete/missing_data. You would need to download the code into your working directory, and then add this argument to your `mice` function call: `defaultMethod = c("pmm", "logreg_2", "polyreg")`. – ayePete Feb 19 '20 at 10:40
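A minimal sketch of the idea in this answer, written from scratch rather than copied from the linked repository or from mice's source: a custom univariate method that `mice()` would pick up for `method = "logreg_2"`, following the `mice.impute.<name>` naming convention in `?mice`. It fits a logistic regression on the observed cases with `glm.fit()` and an explicit `control = glm.control(maxit = glm.maxit)`, draws coefficients from their approximate normal posterior, and then draws binary imputations. The argument name `glm.maxit` and its default of 100 are assumptions; avoiding the name `maxit` keeps it from colliding with `mice()`'s own argument. The sketch is simplified and omits safeguards the built-in `logreg` may include, such as handling of perfect prediction.

```r
# Hypothetical custom method, used by mice() when method = "logreg_2".
# y:  binary target (factor or 0/1); ry: TRUE where y is observed
# x:  numeric predictor matrix;      wy: TRUE where imputations are needed
# glm.maxit: assumed argument name for the IRLS iteration limit
mice.impute.logreg_2 <- function(y, ry, x, wy = NULL, glm.maxit = 100, ...) {
  if (is.null(wy)) wy <- !ry
  X <- cbind(1, as.matrix(x))                   # add intercept column

  # Logistic fit on the observed cases with a raised iteration limit
  fit <- glm.fit(x = X[ry, , drop = FALSE], y = y[ry],
                 family = binomial(link = "logit"),
                 control = glm.control(maxit = glm.maxit))

  # Draw coefficients from their approximate normal posterior
  V <- summary.glm(fit)$cov.unscaled
  beta_star <- coef(fit) + t(chol((V + t(V)) / 2)) %*% rnorm(ncol(X))

  # Predicted probabilities for the missing cases, then Bernoulli draws
  p <- plogis(X[wy, , drop = FALSE] %*% beta_star)
  draws <- as.integer(runif(nrow(p)) <= p)

  if (is.factor(y)) factor(draws, levels = c(0, 1), labels = levels(y)) else draws
}
```

Assuming this function is defined in the session before calling `mice()`, it can be requested for specific variables through `method`, or for every binary factor with `defaultMethod = c("pmm", "logreg_2", "polyreg", "polr")`, which keeps mice's default choices for the other variable types.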

mice() internally fits a regression model for each variable to be imputed and, by default, chooses the method automatically according to the data type. So, for your categorical variables it selects a logistic method (logreg for binary factors, polyreg for unordered factors with more levels); logreg calls glm.fit(), and that fit did not converge.

To simply silence the warning, you could set method = "pmm" (predictive mean matching) for all variables, or convert the categorical variables to numeric beforehand. However, this could lead to wrong results, and I strongly recommend reconsidering your imputation approach and examining why the algorithms do not converge.

jay.sf
  • Thanks for the reply. I know that the warning originates from the polytomous logistic method. I also read somewhere that one way to deal with this warning is to increase the number of iterations for `glm.fit()`. Is there any way to do this from the `mice` interface? I've read through Stef van Buuren's [paper](https://stat.ethz.ch/education/semesters/ss2012/ams/paper/mice.pdf) describing the package, but can't find anything along those lines. – ayePete Nov 13 '19 at 10:53
  • @ayePete You could try to provide your own imputation method for your categorical variables, as described in the first paragraph of the "Details" section of `?mice`, under the list of methods. In it, call something like `glm.fit(..., control = glm.control(maxit = ...))`, replacing the `...` with your own values; see `?glm.fit` and `?glm.control`. – jay.sf Nov 13 '19 at 11:05
  • This is a very good hint from jay.sf. If you think you have better parameter settings for the underlying algorithms, many packages let you forward parameters to the underlying functions. You can tell whether this is possible from the `...` parameter in the documentation. E.g., for mice the `...` parameter is documented as: "Named arguments that are passed down to the univariate imputation functions." As jay.sf describes, you just add the parameter you would specify in glm.fit as an additional argument to your mice() call. – Steffen Moritz Nov 15 '19 at 16:57
  • Thanks, @stats0007. I actually wasn't sure how to go about adding a new parameter (i.e., I still didn't understand the hints, lol), and I'm also not sure whether this would override the value of the current one. – ayePete Feb 04 '20 at 11:27
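A sketch of the two workarounds from jay.sf's answer, assuming mice 3.x and using the `nhanes2` example data that ships with mice in place of the question's `df`. `make.method()` shows the method mice would choose for each column, which makes it easy to spot the binary factors that get `logreg` (and therefore `glm.fit()`). The second part integer-codes the factors and imputes every incomplete column with `pmm`; as the answer warns, this treats category codes as numbers and may give misleading results.

```r
library(mice)

# nhanes2 ships with mice: age and hyp are factors, bmi and chl numeric.
# It stands in here for the question's df.
df <- nhanes2

# Method mice would choose for each column ("" = no missing values)
meth <- make.method(df)
print(meth)

# Workaround: integer-code the factors, then use pmm for every
# incomplete column, so no logistic model is fitted
df_num <- df
is_fac <- vapply(df_num, is.factor, logical(1))
df_num[is_fac] <- lapply(df_num[is_fac], as.integer)

imp <- mice(df_num, method = "pmm", m = 5, seed = 123, printFlag = FALSE)
print(imp$method)
completed <- complete(imp, 1)   # first completed data set
```

Looking at `meth` before changing anything also tells you which of your own variables to inspect when the warning appears.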

I experienced a similar error, and the issue came from the fact that some variables were perfectly collinear in the predictorMatrix, so the model built by mice was unidentifiable. Posting here for Googlers' sake: double-check the predictor matrix, for example that dummy variables are not collinear. Removing one of the levels let the logreg method work fine.
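A base-R sketch of the check this answer describes, on made-up data with hypothetical column names: coding a three-level factor as one dummy per level alongside an intercept makes the columns exactly collinear (the dummies sum to the intercept), which shows up as a rank deficit in `qr()`. Dropping one level's dummy (the reference level) restores full rank. In mice, the same fix can be made by removing the redundant column from the data or by setting its column to 0 in `predictorMatrix`.

```r
set.seed(7)
n <- 50
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

# One dummy per level plus an intercept: the dummies sum to the intercept
X_full <- cbind(intercept = 1,
                grp_a = as.numeric(grp == "a"),
                grp_b = as.numeric(grp == "b"),
                grp_c = as.numeric(grp == "c"))

# Drop the reference level's dummy to remove the exact linear dependency
X_ref <- X_full[, colnames(X_full) != "grp_a"]

rank_full <- qr(X_full)$rank   # 3 < 4 columns: rank-deficient
rank_ref  <- qr(X_ref)$rank    # 3 = 3 columns: full rank
print(c(rank_full = rank_full, cols_full = ncol(X_full),
        rank_ref = rank_ref, cols_ref = ncol(X_ref)))
```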