I am using the `mice` package in R to do multiple imputation on a dataset with a large amount of missingness. Some variables in the raw dataset are important for the imputation process and for later analyses. However, I want to create a correlation matrix using `cor()` without including some of those variables.

Normally, for a simple data frame `x`, `cor(x[,3:7])` would yield the correlation matrix for columns 3 through 7. If `x` is a `mids` object created by the `mice` function, one would normally use `with` to perform a repeated analysis that creates a `mira` object, and then use `pool` to create a `mipo` (pooled outcomes) object. However, the second argument of `with` is usually a model-fitting expression such as `lm(y ~ x1 + x2)`, whose formula references the columns of the dataset by name, and that is not the kind of input that goes into `cor()`. If `x` is a `mids` object, `cor(x[,3:7])` does not work, and neither does `with(x, cor(x[,3:7]))`.
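A minimal sketch of what `with` does expect: `with` on a `mids` object evaluates its expression inside each completed dataset, so the columns have to be referred to by name rather than through `x[,3:7]`. With the default names that `data.frame(matrix(...))` produces (`X1` to `X10`), columns 3 to 7 are `X3` to `X7`. The result is a `mira` object whose `analyses` element holds one correlation matrix per imputation. `set.seed(1)` and `printFlag = FALSE` are additions for reproducibility and quieter output. Since `pool()` is built around fitted model objects, this sketch does not assume it can combine these matrices.

```r
library(mice)

set.seed(1)  # added for reproducibility (not in the original example)
x <- data.frame(matrix(rnorm(100), 10, 10))  # columns are named X1 ... X10
x[9:10, ] <- NA
x.mice <- mice(x, printFlag = FALSE)         # mids object, m = 5 imputations by default

# The expression is evaluated inside each completed dataset,
# so columns 3 to 7 are referenced by name instead of x.mice[, 3:7]
fit <- with(x.mice, cor(data.frame(X3, X4, X5, X6, X7)))

class(fit)             # a "mira" object
length(fit$analyses)   # one entry per imputation
fit$analyses[[1]]      # 5 x 5 correlation matrix from the first completed dataset
```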
How can I create a pooled correlation matrix for a subset of the variables from a multiply imputed dataset?
```r
# reproducible example
library(mice)
x <- data.frame(matrix(rnorm(100), 10, 10))  # create random data
x[9:10, ] <- NA                               # add missingness
x.mice <- mice(x)                             # make imputed data set (a mids object)
cor(x.mice[, 3:7])                            # doesn't work
with(x.mice, cor(x.mice[, 3:7]))              # doesn't work
with(x.mice[, 3:7], cor())                    # doesn't work
```
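One common way to pool correlations across imputations is sketched below. It is a sketch under assumptions, not a built-in `mice` method. It computes `cor()` on columns 3 to 7 of each completed dataset, Fisher z-transforms every coefficient with `atanh()`, averages across the m imputations (the Rubin's-rules point estimate on the z scale), and back-transforms with `tanh()`. It assumes a `mice` version (3.0 or later) in which `complete(x.mice, "all")` returns a list of all completed datasets. Only point estimates are pooled here; no pooled standard errors or p-values are produced.

```r
library(mice)

set.seed(1)  # added for reproducibility (not in the original example)
x <- data.frame(matrix(rnorm(100), 10, 10))
x[9:10, ] <- NA
x.mice <- mice(x, printFlag = FALSE)

# One correlation matrix per completed dataset, restricted to columns 3 to 7
cor_list <- lapply(complete(x.mice, "all"), function(d) cor(d[, 3:7]))

# Fisher z-transform, average over the m imputations, then back-transform
z_mean <- Reduce(`+`, lapply(cor_list, atanh)) / length(cor_list)
pooled_cor <- tanh(z_mean)

# atanh(1) is Inf and tanh(Inf) returns 1, but set the diagonal
# explicitly in case rounding left it slightly below 1
diag(pooled_cor) <- 1
pooled_cor
```

Averaging on the z scale rather than averaging the raw coefficients is a design choice: the sampling distribution of r is bounded and skewed, while z is closer to normal, which is the setting Rubin's rules assume.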