
How can I perform an operation (like subsetting or adding a calculated column) on each imputed dataset in an object of class mids from R's package mice? I would like the result to still be a mids object.

Edit: Example

library(mice)
data(nhanes)

# create imputed datasets
imput = mice(nhanes)

The imputed values are stored as a list of data frames (one per variable)

imput$imp

where there are rows only for the observations with imputation for the given variable.

The original (incomplete) dataset is stored here:

imput$data

For example, how would I create a new variable calculated as chl/2 in each of the imputed datasets, yielding a new mids object?

J0e3gan
half-pass
  • It would be easier if you took the time to create a [minimal, reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so that we can offer specific code suggestions. This is a bit too broad and non-specific as is. – MrFlick Oct 31 '14 at 03:43
  • @user20650, it does store the original dataset in `imput$data`, but it's separate from the imputed datasets. I just added an example with this. – half-pass Oct 31 '14 at 04:01
  • If you want to generate `chl/2`, you can calculate it before the imputation. Then, when doing the imputation, you add the restriction that any imputed value for this column equals `chl/2`. – user20650 Oct 31 '14 at 04:02

4 Answers


This can be done as follows:

Use complete() to convert a mids object to a long-format data.frame:

 long1 <- complete(midsobj1, action='long', include=TRUE)

Perform whatever manipulations needed:

 long1$new.var <- long1$chl/2
 long2 <- subset(long1, age >= 2)  # nhanes codes age as groups 1 to 3

Use as.mids() to convert the manipulated data back to a mids object:

 midsobj2 <- as.mids(long2)

Now you can use midsobj2 as required. Note that include=TRUE (which keeps the original data with missing values) is needed for as.mids() to compress the long-format data properly. Also note that as.mids() had a bug prior to mice v2.25 (see this post: https://stats.stackexchange.com/a/158327/69413).
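For a concrete version of the steps above, here is a minimal sketch using the nhanes data from the question. The names imput, long, imput2 and half_chl are illustrative only, and the seed is arbitrary; it assumes a mice version at or after 2.25, where the as.mids() bug is reported as fixed.

```r
library(mice)

# Impute the nhanes data shipped with mice (m = 5 is the package default)
imput <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Stack the original data (.imp == 0) and the completed datasets (.imp >= 1)
long <- complete(imput, action = "long", include = TRUE)

# Add the derived column to every stacked dataset at once;
# it is NA in the original data wherever chl is missing
long$half_chl <- long$chl / 2

# Rebuild a mids object; as.mids() looks for the .imp and .id columns
imput2 <- as.mids(long)

class(imput2)
names(imput2$data)
```

The resulting object can then be passed to with() and pool() like any other mids object.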

EDIT: According to this answer https://stackoverflow.com/a/34859264/4269699 (from what is essentially a duplicate question) you can also edit the mids object directly by accessing $data and $imp. So for example

 midsobj2<-midsobj1
 midsobj2$data$new.var <- midsobj2$data$chl/2
 midsobj2$imp$new.var <- midsobj2$imp$chl/2

You will run into trouble though if you want to subset $imp or if you want to use $call, so I wouldn't recommend this solution in general.

wjchulme
  • It appears that the bug in `as.mids` may have been corrected in the most recent release of mice (2.25, 2015-11-09). – Paul de Barros Apr 04 '16 at 21:09
  • Did this work for you? Because I have been getting some strange results after using `as.mids`. More here: http://stackoverflow.com/questions/36511909/as-mids-replaces-added-values-with-na – Paul de Barros Apr 11 '16 at 13:10

Another option is to calculate the variables before the imputation and place restrictions on them.

library(mice)

# Create the additional variable - this will have missing
nhanes$extra <- nhanes$chl / 2

# Change the method of imputation for extra, so that it always equals chl/2
# Change the predictor matrix so only chl predicts extra
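# A dry run (maxit = 0) returns the default method vector and predictor matrix
ini <- mice(nhanes, maxit = 0, printFlag = FALSE)

meth <- ini$method
meth["extra"] <- "~I(chl / 2)"

pred <- ini$predictorMatrix
pred[, "extra"] <- 0  # extra isn't used to predict the other variables
pred["extra", "chl"] <- 1

# Imputations
imput <- mice(nhanes, seed = 1, predictorMatrix = pred, method = meth, printFlag = FALSE)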

There are examples in mice: Multivariate Imputation by Chained Equations in R.

slamballais
user20650

There is a with() method for mids objects that can help you here:

with(imput, chl/2)

The documentation is at ?with.mids.
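As a hedged illustration of how the transformation can live inside the expression passed to with(), the sketch below fits a linear model on each completed dataset and pools the results. The model bmi ~ I(chl / 2) is an arbitrary example chosen for this sketch, not something from the question, and the seed is arbitrary.

```r
library(mice)

imput <- mice(nhanes, seed = 1, printFlag = FALSE)

# chl / 2 is computed inside the formula, so it is re-evaluated
# on each completed dataset; no new column is stored in imput
fit <- with(imput, lm(bmi ~ I(chl / 2)))

# Combine the per-imputation estimates using Rubin's rules
pooled <- pool(fit)
summary(pooled)
```

This returns a mira object (one fitted model per imputation) rather than a modified mids object, so it suits cases where the derived variable is only needed for analysis.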

MrFlick
  • Thanks. But is there a way to use this to actually modify each of the imputed datasets, for example by adding the calculated column? `with(imput, function(x) x$imp$new.var=chl/2)` doesn't work, perhaps because the format is wrong. – half-pass Oct 31 '14 at 04:31
  • Why would you need to do that? `with()` will not work with assignment. But whatever you are running inside `with()` can do the transformation. – MrFlick Oct 31 '14 at 04:33

There's a function for this in the basecamb package:

library(basecamb)
apply_function_to_imputed_data(mids_object, function)
p-mq