
What's a good way to cut() a quantitative variable into levels, including a final level dedicated to NAs?

I'd prefer something like the missing-value parameter that tidyverse functions commonly offer (e.g., .missing in dplyr::recode() and missing in dplyr::if_else()).

If the input is w and this hypothetical function is named cut_with_nas, then the following code

w <- c(0L, NA_integer_, 22:25, NA_integer_, 40)
cut_with_nas(w, breaks=2)

would produce this desired output:

[1] (-0.04,20] Unknown    (20,40]    (20,40]    (20,40]    (20,40]    Unknown    (20,40]   
Levels: (-0.04,20] (20,40] Unknown

I'm posting a function below that accomplishes this, but I was hoping there's a more concise solution, or at least a tested function already existing in a package.

wibeasley

1 Answer

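cut_with_nas <- function(x, breaks, labels = NULL, .missing = "Unknown") {
  # Bin the non-missing values; NAs in x stay NA in the resulting factor.
  y <- cut(x, breaks = breaks, labels = labels) # , include.lowest = TRUE, right = FALSE)
  # Append NA as an explicit final level, then rename that level.
  y <- addNA(y)
  levels(y)[is.na(levels(y))] <- .missing
  y
}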

Most of this function is borrowed from a response by @akrun three years ago.
(And a little from this unanswered question too.)
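As a quick check, here is a minimal sketch of how the function might be exercised on the `w` vector from the question. The function is redefined in compact form so the snippet runs on its own; the second call's breaks, labels, and "Not recorded" label are illustrative assumptions, not part of the original post.

```r
# Compact restatement of cut_with_nas() so this snippet is self-contained.
cut_with_nas <- function(x, breaks, labels = NULL, .missing = "Unknown") {
  y <- addNA(cut(x, breaks = breaks, labels = labels))
  levels(y)[is.na(levels(y))] <- .missing
  y
}

w <- c(0L, NA_integer_, 22:25, NA_integer_, 40)

# Default label for missing values, as in the question.
y1 <- cut_with_nas(w, breaks = 2)
levels(y1)  # the two bins, followed by "Unknown" as the final level
table(y1)   # missing values are counted in their own level

# Explicit breaks, custom labels, and a custom missing label (illustrative values).
y2 <- cut_with_nas(w, breaks = c(-Inf, 20, Inf), labels = c("low", "high"),
                   .missing = "Not recorded")
as.character(y2)
```

Because addNA() is called with its default ifany = FALSE, the missing level is appended even when the input has no NAs, so the set of levels stays the same across inputs.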

wibeasley
After four months, there hasn't been another proposal. I'll mark my own response as the answer, but would be happy to switch the checkmark for a better answer. – wibeasley Oct 16 '18 at 20:56