
I want to handle missing values with multiple imputation using the mice package in R. I am relatively new to this and have had trouble fully understanding mice. I understand that one usually does not obtain one "final" imputed dataset, but rather pooled estimates computed from the m imputed datasets. For my use case, however, I want exactly that: one final dataset whose values are derived in some way from the mice imputations. I would then use this final dataset to assess the imputation quality and compare it with other imputation techniques.

Previously, I simply averaged the values across the m imputed datasets, which, as I now know, defeats the purpose of multiple imputation. Is there a way to do this while still respecting the principles behind mice?
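For reference, mice can return completed datasets directly through `complete()`. The sketch below uses the `nhanes` data shipped with mice and the default `m = 5` (both illustrative choices) to show the difference between one completed dataset and all m of them. Instead of averaging values across imputations, which throws away the between-imputation variability that Rubin's rules rely on, one option is to keep all m completed datasets and carry each one through the downstream comparison.

```r
library(mice)

# Impute the nhanes example data that ships with mice (m = 5 is the default)
imp <- mice(nhanes, m = 5, seed = 123, print = FALSE)

# One completed dataset: observed values plus the draws from imputation 1
first <- complete(imp, action = 1)

# All m completed datasets as a list of data frames
all_sets <- complete(imp, action = "all")

# The same m datasets stacked in long format; .imp and .id identify each copy
long <- complete(imp, action = "long")
```

Picking a single `action = i` gives a valid draw, but it is only one of m equally plausible datasets, so any quality metric computed on it is itself one draw.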

Consider this example from https://stefvanbuuren.name/fimd/workflow.html:

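```r
# mids workflow using pipes
library(mice)      # provides mice(), pool() and the nhanes example data
library(magrittr)  # provides the %>% pipe
est2 <- nhanes %>%
  mice(seed = 123, print = FALSE) %>%
  with(lm(chl ~ age + bmi + hyp)) %>%
  pool()
```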

This code produces pooled estimates computed from the imputed datasets. Is it possible to use these estimates to obtain an imputed dataset, or is there another way? I would like to obtain values for age, bmi and hyp that I can use to fill in the missing values in the dataset.
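The pooled object holds regression coefficients combined with Rubin's rules, not imputed values, so there is no direct way back from it to a dataset. The imputed draws themselves are stored in the `imp` component of the `mids` object returned by `mice()`. A minimal sketch, assuming the same `nhanes` setup as in the example above:

```r
library(mice)

imp <- mice(nhanes, seed = 123, print = FALSE)

# The imputed values: one data frame per variable,
# rows = cases that were missing, columns = the m imputations
imp$imp$bmi

# pool() instead returns the coefficients of chl ~ age + bmi + hyp,
# combined across the m fits with Rubin's rules
fit <- with(imp, lm(chl ~ age + bmi + hyp))
est <- pool(fit)
summary(est)
```

Note that `age` has no missing values in `nhanes`, so `imp$imp$age` is empty; only incomplete variables receive imputations.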

  • You might want to get statistical advice from Cross Validated first. You don't ever really get one "final" dataset from imputation. You are drawing from a random distribution, and every time you do that you will get different values. – MrFlick May 31 '23 at 14:53
  • Why do you need a "final" imputed dataset? – jrcalabrese May 31 '23 at 16:52
  • Basically, I create synthetic datasets by introducing missing values into datasets which did not have missing values before. I then want to compare the original values to the values I imputed with various strategies (not just multiple imputation but also mean imputation, knn and so on) to assess the imputation quality – user13028630 Jun 01 '23 at 16:02
  • Will you conduct analyses (e.g., `lm` or something) on the imputed datasets for comparison purposes? Or are you just looking at the differences in the raw imputed values between procedures? If you used the first option, then you can just conduct analyses and pool results, [like in this diagram](https://stackoverflow.com/questions/50351736/mice-number-of-multiply-imputed-data-sets). – jrcalabrese Jun 04 '23 at 13:51
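For the raw-value comparison described in the comments, one option that stays within the spirit of multiple imputation is to compute the error metric once per completed dataset and summarise the m metrics, rather than averaging the imputed values first. The sketch below is hypothetical throughout: it uses `iris[, 1:4]` as the complete dataset, removes 20% of each column completely at random, uses `m = 5`, and measures RMSE over all removed cells (reasonable here only because all four iris columns share the same unit).

```r
library(mice)

set.seed(42)
truth <- iris[, 1:4]  # hypothetical complete dataset (no missing values)

# Hypothetical amputation step: 20% of each column set to NA, MCAR
amputed <- truth
for (v in names(amputed)) {
  amputed[sample(nrow(amputed), size = round(0.2 * nrow(amputed))), v] <- NA
}
miss <- is.na(amputed)  # logical matrix marking the removed cells

imp <- mice(amputed, m = 5, seed = 123, print = FALSE)

# RMSE between true and imputed values, computed separately per completed dataset
rmse_per_imp <- sapply(seq_len(imp$m), function(i) {
  completed <- as.matrix(complete(imp, action = i))
  sqrt(mean((completed[miss] - as.matrix(truth)[miss])^2))
})

rmse_per_imp        # one error value per imputation
mean(rmse_per_imp)  # summarise the metric, not the imputed values
sd(rmse_per_imp)    # spread of the metric across imputations
```

A single-imputation method (mean, kNN) would yield one RMSE on the same `miss` cells. Keep in mind that RMSE on raw values rewards imputing the conditional mean, so multiple imputation, which adds noise on purpose, can score worse on it even when it supports better-calibrated analyses.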
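The analysis-based route suggested in the last comment can be sketched as follows: fit the same model on the data before amputation and on the multiply imputed data, pool the latter, and compare the coefficients. Everything here is an illustrative assumption: `iris[, 1:4]` as the complete data, 30 MCAR missing values in two predictors, `m = 5`, and the model `Sepal.Length ~ Sepal.Width + Petal.Width`.

```r
library(mice)

set.seed(42)
truth <- iris[, 1:4]  # hypothetical complete dataset

# Hypothetical MCAR amputation of 30 values in each of two predictors
amputed <- truth
for (v in c("Sepal.Width", "Petal.Width")) {
  amputed[sample(nrow(amputed), 30), v] <- NA
}

# Reference: the analysis model fitted on the data before amputation
ref <- coef(lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = truth))

# Multiple imputation: fit the same model in each completed dataset, then pool
imp <- mice(amputed, m = 5, seed = 123, print = FALSE)
pooled <- summary(pool(with(imp, lm(Sepal.Length ~ Sepal.Width + Petal.Width))))

# Bias of the pooled estimates relative to the full-data estimates
comparison <- data.frame(
  term = as.character(pooled$term),
  full_data = unname(ref[as.character(pooled$term)]),
  pooled = pooled$estimate
)
comparison$bias <- comparison$pooled - comparison$full_data
comparison
```

For the other methods, the same model would be fitted once on their single completed dataset. A single amputation gives only one replicate; a simulation would repeat the amputation many times and average bias (and, if desired, confidence-interval coverage) over replicates.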
