
I am doing a simulation study, and one of the datasets I am imputing is very small (n=10). When using MICE, my dataset and code are as follows:

> dat
            y         X1        X2
11 -155.04185         NA 10.464688
12   69.02116         NA  8.245312
13  -89.18124   21.69072  4.717425
14  115.52205         NA 15.666802
15   94.09654         NA  6.977855
16   65.44607         NA 16.608755
17 -246.09192         NA  3.208590
18  118.99815   25.68459  4.727989
19  214.84858         NA  6.065670
20  293.19425         NA  6.647658


> pred1 <-matrix(data= c(0,0,0,
                         1,0,1, 
                         0,0,0), nrow = 3, ncol = 3, byrow = TRUE)

> mice(dat, m=25, method= "norm", predictorMatrix = pred1, maxit=5)
    iter imp variable
  1   1  X1_missing
Error in cor(xobs[, keep, drop = FALSE], use = "all.obs") : 'x' is empty

For another dataset which has 3 observed values for X1, the mice command worked fine with no errors.

I have looked up the error and came across these two links, which have not helped: https://stat.ethz.ch/pipermail/r-help/2015-December/434914.html and the question "Unclear error with mice package".

I have looked at the following code in github https://github.com/stefvanbuuren/mice/blob/master/R/internal.R

I have determined that 'x' is the design matrix which is used to impute the variable with missing observations. (found the definitions in this link: https://stat.ethz.ch/pipermail/r-help/2015-December/434914.html)

In my case the design matrix should consist of 'y' and 'X2' which I have specified in pred1 to help impute 'X1'. Given that 'y' and 'X2' are fully observed in the data, I am not sure why it thinks the design matrix is empty.
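As a minimal sketch in base R (no mice call involved), the predictor matrix can be given row and column names so that it is easy to read off which variables it selects as predictors of X1. The name `vars` is introduced here for illustration only.

```r
# Variable names in the same order as the columns of dat
vars <- c("y", "X1", "X2")

# Same 0/1 pattern as pred1, with dimnames added:
# rows are target variables, columns are candidate predictors
pred1 <- matrix(c(0, 0, 0,
                  1, 0, 1,
                  0, 0, 0),
                nrow = 3, ncol = 3, byrow = TRUE,
                dimnames = list(vars, vars))

# Predictors selected for the target X1
x1_predictors <- names(which(pred1["X1", ] == 1))
print(x1_predictors)
```

Read this way, the row for X1 selects y and X2, which matches the intended design matrix.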

Would anyone have any ideas as to what is going wrong?

UPDATE: After updating the mice package to version 3.4.0, the imputations ran for the data fold, but it logged a number of events and output the following messages:

it im        dep meth                                                     out
1  1  1 X1_missing norm         df set to 1. # observed cases: 2  # predictors: 3
2  1  1 X1_missing norm All predictors are constant or have too high correlation.
3  1  2 X1_missing norm         df set to 1. # observed cases: 2  # predictors: 3
4  1  2 X1_missing norm All predictors are constant or have too high correlation.
5  1  3 X1_missing norm         df set to 1. # observed cases: 2  # predictors: 3
6  1  3 X1_missing norm All predictors are constant or have too high correlation.

So the issue is the small number of observed cases relative to the number of predictors I am using: with 2 observed values of X1 and 3 predictors, the degrees of freedom (2 - 3 = -1) are negative. The following link (https://stefvanbuuren.name/fimd/sec-toomany.html#finding-problems-loggedevents) states that the degrees of freedom are being set to 1, implying that predictors are being dropped.
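A rough illustration of the degrees-of-freedom problem in base R, using the data printed above. It assumes that the "# predictors: 3" in the log counts an intercept plus y and X2; that reading is an assumption, not something confirmed here. It uses lm() only to show that an ordinary least-squares fit on the observed cases is underdetermined; it does not reproduce what the "norm" method does internally.

```r
# Data as printed in the question
dat <- data.frame(
  y  = c(-155.04185, 69.02116, -89.18124, 115.52205, 94.09654,
         65.44607, -246.09192, 118.99815, 214.84858, 293.19425),
  X1 = c(NA, NA, 21.69072, NA, NA, NA, NA, 25.68459, NA, NA),
  X2 = c(10.464688, 8.245312, 4.717425, 15.666802, 6.977855,
         16.608755, 3.208590, 4.727989, 6.065670, 6.647658)
)

obs   <- !is.na(dat$X1)                   # rows where X1 is observed
n_obs <- sum(obs)                         # number of observed cases
X     <- cbind(1, dat$y[obs], dat$X2[obs])  # intercept + y + X2 (assumed design)
n_par <- ncol(X)                          # number of columns to estimate

df_naive <- n_obs - n_par                 # naive residual degrees of freedom
design_rank <- qr(X)$rank                 # rank cannot exceed the number of rows

# Ordinary least squares on the observed rows only
fit <- lm(X1 ~ y + X2, data = dat[obs, ])
print(c(n_obs = n_obs, n_par = n_par, df = df_naive, rank = design_rank))
print(coef(fit))
```

With only two observed rows, at most two coefficients can be estimated, so one coefficient comes back as NA and there are no residual degrees of freedom left to estimate the error variance.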

Therefore, I may need to tweak my simulated data to get around this.

OCarroll

1 Answer


The help file provided with mice states the following about the "predictorMatrix" parameter:

predictorMatrix: A numeric matrix of length(blocks) rows and ncol(data) columns, containing 0/1 data specifying the set of predictors to be used for each target column.

There are 2 issues with your predictor matrix. One issue is that the column names should correspond to the column names of your data in dat. This can be fixed using colnames:

# This may or may not be the correct order or naming of the columns
colnames(pred1) = c("y", "X1", "X2")

The other issue is that the number of rows of the predictor matrix should be the same as the number of rows of your data object (the number of columns should also be equal). In this case dat has 25 rows so your predictor matrix will also have 25 rows.

Here is an example of a predictor matrix that will work for your data. This example is for illustrative purposes only and it is likely not the predictor matrix that you need:

# predictor matrix should have same dimensions as object with data.
# create an example predictor matrix of all 1's that has the correct dimensions
pred.example <- matrix(1, ncol=3, nrow=nrow(dat))

# rename columns of example predictor matrix
colnames(pred.example) = c("y", "X1", "X2")

# Run mice
mice(dat, m=25, method= "norm", pred.example = pred1, maxit=5)

# Partial output
iter imp variable
  1   1
  1   2
  1   3
  1   4
  1   5
  ...
  5   23
  5   24
  5   25
Class: mids
Number of multiple imputations:  25 
Imputation methods:
V1  y X1 X2 
"" "" "" "" 
PredictorMatrix:
   V1 y X1 X2
V1  0 1  0  1
y   1 0  0  1
X1  0 0  0  0
X2  1 1  0  0
Number of logged events:  1 
  it im dep      meth out
1  0  0     collinear  X1

NM_
  • Hi there, thank you for your answer. I'm not sure I agree with your statement "the predictor matrix will also have 25 rows." The mice help file states that the predictorMatrix is a square matrix of size ncol(data). Each row is the "target variable" and the column variable is taken as the predictor based on the element value of 0 or 1. As a square matrix the number of rows and columns of the predictor matrix should be equal and not (25 x 3) as you have suggested. I do not think the predictor matrix is the issue as it has worked perfectly for other simulated datasets like `dat` above. – OCarroll Apr 04 '19 at 18:13
  • @user110577, I posted the first few sentences about the `predictorMatrix` parameter as it appears verbatim in the help file. This was the information provided by the creators of the package. If you type `?mice` into the R console, you can verify. Sorry, but I'm not sure how to explain the discrepancy. – NM_ Apr 04 '19 at 18:28