
I am working with a price file that has a number of missing weekend values, and I am using the mice() function to impute the weekend prices. mice() doesn't allow non-numeric values and errors out if the date is included, which is why I use Features[, 2:33]. However, I need the date so I can join the result back to another file. I have tried converting the date to a number, but reversing that conversion at the end of the process yields NAs. I am looking for suggestions on how to keep the dates in the data frame.
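For reference, a round trip from Date to number and back does not have to lose information. The following is a minimal base-R sketch that assumes the date column is already of class Date. The three dates are hypothetical, and the sketch does not diagnose where the NAs in the original attempt came from:

```r
# Hypothetical dates standing in for the first column of Features
d <- as.Date(c("2019-01-04", "2019-01-05", "2019-01-06"))

# Store as a plain number: days since 1970-01-01
n <- as.numeric(d)   # 17900 17901 17902

# Convert back; older R versions require the origin argument
back <- as.Date(n, origin = "1970-01-01")
```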

Snippet Example

The link above has a snippet of the data set.

Code for the mice call

Imputed <- mice(Features[,2:33], m=5, maxit = 5, method = 'pmm', seed = 500)

Unpacking the resulting mids object

df <- complete(Imputed, action = 1L, include = FALSE)

  • The close votes are because you haven't made your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). I know nothing of your data, but as an alternative, could you fill in Friday's price for Saturday and Sunday? The `zoo` package has the function `na.locf()` for this. – Chase Jan 29 '19 at 18:54
  • Chase, thanks. I included a snippet of the data. The data frame has 33 columns, with the first column being the date. As I mentioned before, mice doesn't seem to allow a date in the data frame. As a workaround, I created a copy of the date and converted it to numeric. The problem I'm running into is that I cannot convert the number back to a date without it becoming NA. – Michael Westerman Jan 29 '19 at 19:11
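Below is a minimal sketch of Chase's `na.locf()` suggestion. It assumes a data frame shaped like the one described, with the date in the first column and prices after it. The toy values are hypothetical. Because only the price columns are filled, the date column never has to leave the data frame:

```r
library(zoo)

# Hypothetical toy data: 2019-01-04 is a Friday, so rows 3 and 4
# (Saturday and Sunday) are the missing weekend prices
Features <- data.frame(
  date    = as.Date("2019-0103") + 0:5,
  price_a = c(100.5, 101.0, NA, NA, 101.8, 102.1),
  price_b = c(50.2, 50.4, NA, NA, 50.9, 51.0)
)

# Carry the last observed (Friday) price forward into the weekend.
# na.rm = FALSE keeps any leading NAs, so every column keeps its length.
Features[-1] <- lapply(Features[-1], na.locf, na.rm = FALSE)
```

This carries the last observation forward rather than imputing it, so it is a different technique from the pmm imputation used with mice.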

1 Answer


The easiest solution here would be to remove the date column before imputation and add the dates back to the data.frame afterwards.

Since mice does not change the order of the rows (or columns), this is easy to do.
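Here is a minimal sketch of this first approach. It uses hypothetical toy data in place of the real Features file: a Date column plus two price series, one of them missing on weekends. It reuses the mice() and complete() calls from the question, and printFlag = FALSE only silences the iteration log:

```r
library(mice)

# Hypothetical toy data: price_a is missing on weekends
set.seed(42)
dates <- seq(as.Date("2019-01-01"), by = "day", length.out = 28)
Features <- data.frame(
  date    = dates,
  price_a = round(100 + cumsum(rnorm(28)), 2),
  price_b = round(50 + cumsum(rnorm(28)), 2)
)
weekend <- as.POSIXlt(dates)$wday %in% c(0, 6)  # 0 = Sunday, 6 = Saturday
Features$price_a[weekend] <- NA

# 1. Drop the date column and impute only the numeric columns
Imputed <- mice(Features[, -1], m = 5, maxit = 5, method = "pmm",
                seed = 500, printFlag = FALSE)
df <- complete(Imputed, action = 1L)

# 2. complete() returns rows in the original order, so the dates
#    can be bound back on by position
df <- data.frame(date = Features$date, df)
```

Once the date is back, the result can be joined to the other file on that column, for example with `merge(df, other, by = "date")`, where `other` is a hypothetical name for the second data frame.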

Alternatively, mice can be set to impute only certain columns and/or to use only certain columns as predictors. I think if you exclude the date this way, it might no longer throw an error. The relevant parameter is:

predictorMatrix
A numeric matrix of length(blocks) rows and ncol(data) columns, containing 0/1 data specifying the set of predictors to be used for each target column. Each row corresponds to a variable block, i.e., a set of variables to be imputed. A value of 1 means that the column variable is used as a predictor for the target block (in the rows). By default, the predictorMatrix is a square matrix of ncol(data) rows and columns with all 1's, except for the diagonal. Note: For two-level imputation models (which have "2l" in their names) other codes (e.g., 2 or -2) are also allowed.
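Here is a sketch of the predictorMatrix route, again on hypothetical toy data. It stores the date as a number so that every column is numeric, because this sketch does not verify whether mice accepts a Date-class column here. It then removes the date as a predictor through predictorMatrix and sets its method to "" so it is never imputed. make.predictorMatrix() and make.method() are mice helpers that build the default settings for you to edit:

```r
library(mice)

# Hypothetical toy data: price_a is missing on weekends
set.seed(42)
dates <- seq(as.Date("2019-01-01"), by = "day", length.out = 28)
Features <- data.frame(
  date    = dates,
  price_a = round(100 + cumsum(rnorm(28)), 2),
  price_b = round(50 + cumsum(rnorm(28)), 2)
)
Features$price_a[as.POSIXlt(dates)$wday %in% c(0, 6)] <- NA

# Store the date as days since 1970-01-01 so every column is numeric
Features$date <- as.numeric(Features$date)

pred <- make.predictorMatrix(Features)
pred[, "date"] <- 0   # never use the date as a predictor
meth <- make.method(Features)
meth["date"] <- ""    # never impute the date itself

Imputed <- mice(Features, m = 5, maxit = 5, method = meth,
                predictorMatrix = pred, seed = 500, printFlag = FALSE)
df <- complete(Imputed, action = 1L)

# Convert the numeric date back; older R versions require origin
df$date <- as.Date(df$date, origin = "1970-01-01")
```

make.method() already assigns "" to fully observed columns, so the `meth["date"]` line only makes the intent explicit.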

But the first solution, simply removing the column and adding it back afterwards, is probably easier.

Steffen Moritz