
There is an ongoing discussion about reliable methods for rounding imputed binary variables. Still, the so-called Adaptive Rounding Procedure developed by Bernaards and colleagues (2007) is currently the most widely accepted solution.

The Adaptive Rounding Procedure involves a normal approximation to the binomial distribution. That is, each imputed value in a binary variable is assigned either 0 or 1 based on the threshold given by the formula below, where x is the imputed binary variable and mean(x) is its mean:

threshold <- mean(x) - qnorm(mean(x))*sqrt(mean(x)*(1-mean(x)))

To the best of my knowledge, major R packages on imputation (such as Amelia or mice) have yet to include functions that help with the rounding of binary variables. This shortcoming is a particular problem for researchers who intend to use the imputed values in logistic regression analysis, where the dependent variable is coded as binary.

Therefore, it makes sense to write an R function for the Bernaards formula above:

bernaards <- function(x)
{
  mean(x) - qnorm(mean(x))*sqrt(mean(x)*(1-mean(x)))
}

With this function, it is much easier to calculate the threshold for an imputed binary variable with a mean of, say, .623:

bernaards(.623)
[1] 0.4711302
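
Because the function calls mean(x) internally, it can be given the imputed variable itself rather than its mean. A small sketch with made-up values (the vector `x_imp` is purely illustrative, not real imputed data):

```r
# Same formula as above, with the mean computed once
bernaards <- function(x) {
  m <- mean(x)
  m - qnorm(m) * sqrt(m * (1 - m))
}

# Hypothetical imputed binary variable: observed 0/1 values plus
# a few fractional imputed values (invented for illustration)
x_imp <- c(0, 1, 1, 0.62, 1, 0, 0.35, 1, 1, 0.8)

bernaards(x_imp)        # threshold computed from the whole vector
bernaards(mean(x_imp))  # same value, since mean() of one number is that number
```

Note that qnorm() only returns a finite value when the mean lies strictly between 0 and 1.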

After calculating the threshold, the usual next step is to round the imputed values in variable x.

My question is: how can the above function be extended to include that task as well?

In other words, one can do all of the above in R with three lines of code:

threshold <- mean(df$x) - qnorm(mean(df$x))*sqrt(mean(df$x)*(1-mean(df$x)))
df$x[df$x > threshold] <- 1
df$x[df$x < threshold] <- 0

It would be best if the function included the above recoding/rounding, as repeating the same process for each binary variable would be time-consuming, especially when working with large data sets. With such a function, one could simply run an extra line of code (as below) after imputation, and continue with the analyses:

bernaards(dummy1, dummy2, dummy3)
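
To make the request concrete, here is a rough sketch of the kind of helper I have in mind. The names `bernaards_threshold` and `bernaards_round`, the `vars` argument, and the example data frame are all hypothetical, and sending ties to 1 is an assumption on my part. I am not sure this is the best way to do it.

```r
# Threshold from the adaptive rounding formula given above
bernaards_threshold <- function(x) {
  m <- mean(x, na.rm = TRUE)
  m - qnorm(m) * sqrt(m * (1 - m))
}

# Hypothetical helper: round the named columns of a data frame and
# return the modified copy. Values equal to the threshold are set to 1
# (an assumption; the three-line version leaves them unchanged).
bernaards_round <- function(data, vars) {
  for (v in vars) {
    threshold <- bernaards_threshold(data[[v]])
    data[[v]] <- as.numeric(data[[v]] >= threshold)
  }
  data
}

# Made-up data standing in for an imputed data set
df <- data.frame(
  dummy1 = c(0, 1, 0.62, 1, 0.35, 1),
  dummy2 = c(1, 0.2, 0, 0.9, 0, 0.4),
  other  = c(2.5, 3.1, 4.0, 1.2, 5.5, 3.3)
)

# Only the listed columns are rounded; `other` is left as it is
df <- bernaards_round(df, c("dummy1", "dummy2"))
```

Unlike the call above, this version takes the data frame plus a character vector of column names, so the result has to be assigned back to the data frame.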
neutral
  • Not sure I am understanding your problem. Would `bernaards <- function(x) { as.numeric(x > mean(x) - qnorm(mean(x))*sqrt(mean(x)*(1-mean(x))))}` not work? – Barker Dec 13 '16 at 00:04
  • You mean to round up all values above the threshold, as in using `as.integer`? What about those below the threshold? I am not very good with functions, but what I am basically looking for is something like: `x = 1, if x > mean(x) - qnorm(mean(x))*sqrt(mean(x)*(1-mean(x))); x = 0, if x < mean(x) - qnorm(mean(x))*sqrt(mean(x)*(1-mean(x)))` – neutral Dec 13 '16 at 00:31
  • Try running the function I pasted and see if it does what you want. Also you don't specify the behavior if `x == mean(x) - qnorm(mean(x))*sqrt(mean(x)*(1-mean(x)))`. – Barker Dec 13 '16 at 00:36
  • Yes, your code returns 0s and 1s only, but it does not recode the new values into a new variable. Can you please post an answer with that as well? – neutral Dec 13 '16 at 00:43
  • Just assign the output to a variable (i.e. `myVar <- bernaards(x)`). If this isn't what you want please go to [this link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and use the docs there to make a reproducible example of your desired input and output and add that to your question. – Barker Dec 13 '16 at 00:49
  • Why not impute the binary variable utilizing a binomial distribution instead of using the normal approximation to the binomial? – alexwhitworth Dec 16 '16 at 00:36

0 Answers