
I'm performing predictive analysis where I train a model on a portion of my data and test the model on the remaining portion. I'm familiar with the MICE package and the imputation procedure using predictive mean matching.

My understanding is that the proper way to utilize imputation is to create numerous imputed data sets, fit a model to each of those imputed data sets, then combine the coefficients across all of those fitted models into one single model. I know how to do this and view the summary of the coefficients with which I can perform inference on the variables. However, that is not my objective; I need to end up with a single model that I can use to predict new values.

Simply put, when I call the predict function on the pooled model I get from MICE, it fails with an error.

Any suggestions? I am coding this in R.

Edit: using the airquality data set as an example, my code looks like this:

imputed_data <- mice(airquality, method = c(rep("pmm", 6)), m = 5, maxit = 5)
model <- with(imputed_data, lm(Ozone ~ Solar.R + Wind + Temp + Month + Day))
pooled_model <- pool(model)

This gives me a pooled model across my 5 imputed data sets. However, I am unable to use the predict function with this model. When I then execute:

predict(pooled_model, newdata = airquality)

I get this error:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "c('mira', 'matrix')"
Dan W
  • Hard to say when you provide no example. In cases where you have multiple models for prediction, you usually average the results of the models. E.g., if your model is really 100 models, predict with each model and average the predictions. That average is your one prediction. – Oliver Feb 01 '23 at 19:16
  • If you want statistical modeling advice, you should ask for help at [stats.se] instead. If you want programming assistance, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 01 '23 at 19:18
  • @Oliver see code in update. – Dan W Feb 01 '23 at 20:18
  • @MrFlick see code in update. – Dan W Feb 01 '23 at 20:19
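
A minimal sketch of the averaging approach suggested in the comments above, using the airquality setup from the question. The `newdata` values, the `set.seed()` call, and `printFlag = FALSE` are assumptions added for illustration and are not part of the original code. It relies on the `mira` object returned by `with()` storing the individual fits in its `analyses` element.

```r
library(mice)

# Assumption: seed added only so the sketch is reproducible
set.seed(1)

# Impute and fit one lm per completed data set, as in the question
imputed_data <- mice(airquality, method = "pmm", m = 5, maxit = 5,
                     printFlag = FALSE)
fits <- with(imputed_data, lm(Ozone ~ Solar.R + Wind + Temp + Month + Day))

# Hypothetical new observations; any data frame with these columns works
newdata <- data.frame(Solar.R = c(190, 250), Wind = c(7.4, 10.3),
                      Temp = c(67, 80), Month = c(5, 7), Day = c(1, 15))

# fits$analyses is a list of the m fitted lm objects;
# predict with each one and bind the results column-wise
pred_matrix <- do.call(cbind, lapply(fits$analyses, predict, newdata = newdata))

# Average across imputations: one prediction per row of newdata
avg_pred <- rowMeans(pred_matrix)
avg_pred
```

For a linear model, this average should match the prediction computed from the pooled coefficients, since the pooled point estimate is the mean of the per-imputation estimates.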

1 Answer


Not sure exactly what you're looking for, but something like this might work:

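library(mice)
library(mitools)

# Introduce some missing values into qsec so there is something to impute
data(mtcars)
mtcars$qsec[c(4, 6, 8, 21)] <- NA

# Create 10 imputed data sets and extract each completed version
imps <- mice(mtcars, m = 10)
comps <- lapply(1:imps$m, function(i) complete(imps, i))

# Fit the same linear model to every completed data set, then pool
mods <- lapply(comps, function(x) lm(qsec ~ hp + drat + wt, data = x))
pmod <- MIcombine(mods)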

pmod$coefficients
#> (Intercept)          hp        drat          wt 
#> 18.15389098 -0.02570887  0.11434023  0.92348390

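# Build the design matrix for a new observation and
# multiply it by the pooled coefficients to get the prediction
newvals <- data.frame(hp = 300, drat = 4, wt = 2.58)
X <- model.matrix(~ hp + drat + wt, data = newvals)
preds <- X %*% pmod$coefficients
preds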
#>       [,1]
#> 1 13.28118

Created on 2023-02-01 by the reprex package (v2.0.1)

DaveArmstrong
  • A clever way to go about it, but I guess I should have specified that I need an actual model object that I can plug into the "roc" function rather than figuring out how to extract the coefficients and do math with those. I'm not particularly interested in the actual output value coming from the prediction model. Thank you for the help though! – Dan W Feb 02 '23 at 01:41
  • Do you need prediction variances, too, or just the predicted values? – DaveArmstrong Feb 02 '23 at 12:51
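
If an actual model object that works with `predict()` is needed, one possible workaround is to take one of the fitted `lm` objects as a template and overwrite its coefficients with the pooled estimates. This is only a sketch and not something mice provides out of the box. It assumes a recent mice version in which `summary(pool(...))` returns a data frame with `term` and `estimate` columns. The names `pooled_lm` and `newdata`, and the `set.seed()` call, are hypothetical additions for illustration.

```r
library(mice)

# Assumption: seed added only so the sketch is reproducible
set.seed(1)

imputed_data <- mice(airquality, method = "pmm", m = 5, maxit = 5,
                     printFlag = FALSE)
fits <- with(imputed_data, lm(Ozone ~ Solar.R + Wind + Temp + Month + Day))

# Pooled point estimates, named by model term
pooled <- summary(pool(fits))
pooled_coefs <- setNames(pooled$estimate, as.character(pooled$term))

# Use the first fitted lm as a template and swap in the pooled coefficients,
# matching by name so the order agrees with the template
pooled_lm <- fits$analyses[[1]]
pooled_lm$coefficients <- pooled_coefs[names(coef(pooled_lm))]

# Hypothetical new observations
newdata <- data.frame(Solar.R = c(190, 250), Wind = c(7.4, 10.3),
                      Temp = c(67, 80), Month = c(5, 7), Day = c(1, 15))

# predict() now uses the pooled coefficients for point predictions
pooled_pred <- predict(pooled_lm, newdata = newdata)
pooled_pred
```

Only the point predictions on `newdata` are meaningful here. The residuals, fitted values, and standard errors stored in `pooled_lm` still come from the first imputed data set, so calling `predict()` without `newdata`, or asking for prediction intervals, would not reflect the pooled model.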