
I am doing missing value imputation on a series of ordinal variables.

I first read in the data frame and do some cleaning:

# Read the data, treating blanks and "." as missing
dietgp1m <- read.csv(file = '1 Month data-diet.csv', header = TRUE, na.strings = c("", " ", "NA", "."))
# Convert every column to a factor
dietgp1m[] <- lapply(dietgp1m, as.factor)
# Drop rows without a patient ID
dietgp1m <- dietgp1m[!is.na(dietgp1m$Patient.Trial.ID), ]
# Count non-missing answers in columns 9 to 298 and drop rows with none
dietgp1m$count <- rowSums(!is.na(dietgp1m[, 9:298]))
dietgp1m <- dietgp1m[dietgp1m$count != 0, ]

Then I create a function for missing value imputation, subset the dataset and run the function:

# Imputation
library(mice)
imputation <- function(A) {
  B <- mice(data = A, m = 5, method = "polr", maxit = 50, seed = 500)
  C <- complete(B, 'long', include = TRUE) # include = TRUE keeps the original dataset with missing values
  print(colnames(C))
  ### pool imputed data
  for (i in 4:ncol(C)) {C[,i] <- as.numeric(as.character(C[,i]))}
  for (j in 4:ncol(C)) {
    for (i in 1:159) {
      if (is.na(C[i,j])) {
        C[i,j] <- round((C[i+159,j] + C[i+159*2,j] + C[i+159*3,j] + C[i+159*4,j] + C[i+159*5,j]) / 5)
      }
    }
  }
  print(nrow(C)); print(ncol(C))
}

# Quality of life
# Diet group 1 month
seb<-subset(df3, select=c(Patient.Trial.ID, Q32a:Q32j))
missinganalysis(seb)
imputation(seb)

Then I get an error message:

 iter imp variable
  1   1  Q32a
Error in apply(draws, 2, sum) : dim(X) must have a positive length
Called from: apply(draws, 2, sum)

Please help! Thank you!

1 Answer


I have also hit this error a few times. After some experimentation, I traced it to an unusual combination of two things: (a) a variable with very few missing cases (in my data, a single missing value in that variable) and (b) an imputation method that does not match the variable's type (e.g. using polr to impute a binary variable). Once I set the imputation method to 'logreg' for that binary variable, the error went away.

I'm not sure this is your situation, though. I would recommend screening the data to check the number of missing values in each variable and assigning the 'correct' imputation method to each variable (unless you are using pmm, which works well for many different types of variables; see Van Buuren's comments here: https://statisticalhorizons.com/predictive-mean-matching).
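
A minimal sketch of that kind of screening, in base R only. The data frame `toy` and its values are made up for illustration (only the column names echo the question); substitute your own subset such as `seb`.

```r
# Hypothetical toy data standing in for the questionnaire items
toy <- data.frame(
  Q32a = factor(c(1, 2, NA, 3, 2, 1), ordered = TRUE),  # ordinal item
  Q32b = factor(c(0, 1, 1, 0, NA, NA)),                 # binary item
  Q32c = c(1.5, 2.0, 3.2, NA, 4.1, 2.2)                 # continuous item
)

# One row per variable: missing count, number of factor levels, ordered or not
screen <- data.frame(
  variable  = names(toy),
  n_missing = colSums(is.na(toy)),
  n_levels  = sapply(toy, function(x) if (is.factor(x)) nlevels(x) else NA_integer_),
  ordered   = sapply(toy, is.ordered),
  row.names = NULL
)
print(screen)

# Variables with exactly one missing value are worth a closer look
one_missing <- as.character(screen$variable[screen$n_missing == 1])
print(one_missing)
```

A factor with two levels that is set to 'polr', or a variable that shows up in `one_missing`, would be the first place to look.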

For instance, if you have V1 (binary), V2 (ordered), V3 (continuous), V4 (unordered categorical), and V5 (ordered), you can set the method as:

method=c('logreg', 'polr', 'pmm', 'polyreg', 'polr')
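
If there are many columns, the vector can also be built from the column types instead of typed by hand. The sketch below is only one way to do it: `choose_method()` is a hypothetical helper (not part of mice), and `dat` is simulated data with the five variable types from the example.

```r
# Simulated data with the five variable types from the example (illustrative only)
set.seed(1)
n <- 40
dat <- data.frame(
  V1 = factor(sample(c("no", "yes"), n, replace = TRUE)),        # binary
  V2 = factor(sample(1:4, n, replace = TRUE), ordered = TRUE),   # ordered
  V3 = rnorm(n),                                                 # continuous
  V4 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),      # unordered
  V5 = factor(sample(1:5, n, replace = TRUE), ordered = TRUE)    # ordered
)
# Knock out five values in each column
for (v in names(dat)) dat[sample(n, 5), v] <- NA

# Hypothetical helper: pick a mice method name from the column type.
# Two-level factors are checked first so a binary variable never gets 'polr'.
choose_method <- function(x) {
  if (is.factor(x) && nlevels(x) == 2) return("logreg")
  if (is.ordered(x)) return("polr")
  if (is.factor(x)) return("polyreg")
  "pmm"
}

meth <- sapply(dat, choose_method)
print(meth)

# With mice installed, the vector can then be passed on, e.g.:
# imp <- mice::mice(dat, m = 5, method = meth, seed = 500)
```

Note that 'polr' only applies here to columns stored as ordered factors, so ordinal items read in with `as.factor()` would need to be converted with `factor(x, ordered = TRUE)` first for this helper to pick them up.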

Hope this helps.

FabF
  • I found the same thing when I had this error (it was for a column with only 1 missing value). I just had to delete that one case and the imputations all ran smoothly otherwise. – Brandon Feb 14 '20 at 17:24