2

I can't figure out how to use the recipes package to replace missing numeric variables with a constant.

I considered step_lowerimpute, but I don't think it fits my case. step_lowerimpute replaces values that fall below a given threshold with random numbers between 0 and the threshold; it does not fill in missing values with a constant.

For example, I have some lab variable, like lactic acid, which is often missing. I want to replace missing values with an extreme value, such as -9999.

  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Show what you tried and describe how it didn't do what you want. – MrFlick Jun 20 '18 at 21:17

3 Answers

3

This is my first day looking at the recipes package (so perhaps not the most reliable answer ...). I had the same question and believe that the following works as required:

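library(recipes)

# Replace NA in Ozone with a constant inside the recipe
rec <-
    recipe( ~ ., data = airquality) %>%
    step_mutate(
        Ozone = tidyr::replace_na(Ozone, -9999)
    ) %>%
    prep(training = airquality, retain = TRUE)

# Extract the processed training data
juice(rec)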
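
To cover every numeric column at once rather than naming each one, the same idea can probably be expressed with step_mutate_at. This is only a minimal sketch: it assumes a recipes version that provides step_mutate_at() and bake(new_data = NULL), and the name rec_all and the sentinel -9999 are just illustrative choices.

```r
library(recipes)

# Sketch: apply the same NA replacement to all numeric columns.
# Assumption: -9999 is the sentinel value; rec_all is an illustrative name.
rec_all <-
    recipe( ~ ., data = airquality) %>%
    step_mutate_at(
        all_numeric(),
        fn = ~ replace(., is.na(.), -9999)
    ) %>%
    prep(training = airquality)

# Processed training data
imputed <- bake(rec_all, new_data = NULL)

# Count remaining missing values per column
colSums(is.na(imputed))
```

Base replace() is used here instead of tidyr::replace_na() only to avoid any type-casting differences between integer and double columns.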

Before coming across this method, I also tried creating my own step, which also seems to work, but the approach above is much simpler ...

step_nareplace <- 
    function(recipe, 
            ..., 
            role = NA, 
            trained = FALSE,  
            skip = FALSE,
            columns = NULL,
            replace = -9,
            id = rand_id("nareplace")) {
  add_step(
    recipe,
    step_nareplace_new(
      terms = ellipse_check(...),
      role = role,
      trained = trained,
      skip = skip,
      id = id,
      replace = replace,
      columns = columns
    )
  )
}

step_nareplace_new <- 
    function(terms, role, trained, skip, id, columns, replace) {
  step(
    subclass = "nareplace",
    terms = terms,
    role = role,
    trained = trained,
    skip = skip,
    id = id,
    columns = columns,
    replace = replace
  )
}

prep.step_nareplace <- function(x, training, info = NULL, ...) {

    col_names <- terms_select(x$terms, info = info)

    step_nareplace_new(
        terms = x$terms,
        role = x$role,
        trained = TRUE,
        skip = x$skip,
        id = x$id,
        columns = col_names,
        replace = x$replace
      )
}

bake.step_nareplace <- function(object, new_data, ...) {
  # Replace NA with the constant in each selected column
  for (i in object$columns) {
    new_data[[i]][is.na(new_data[[i]])] <- object$replace
  }
  tibble::as_tibble(new_data)
}

print.step_nareplace <-
  function(x, width = max(20, options()$width - 30), ...) {
    cat("Replacing NA values in ", sep = "")
    cat(format_selectors(x$terms, wdth = width))
    cat("\n")
    invisible(x)
  }

tidy.step_nareplace <- function(x, ...) {
  res <- simple_terms(x, ...)
  res$id <- x$id
  res
}


recipe(Ozone ~ ., data = airquality) %>%
   step_nareplace(Ozone, replace = -9999) %>%
   prep(airquality, verbose = FALSE, retain = TRUE) %>%
   juice()
user1420372
1

You can try the step_unknown() function, which replaces missing (NA) values with a new level that the user supplies through the new_level argument.

Matt Dancho
  • `step_unknown` only works for factors and chars. `step_mutate` and `step_mutate_at` are probably relevant: https://recipes.tidymodels.org/reference/step_mutate_at.html – madprogramer May 28 '22 at 17:32
0

Why do you specifically need the recipes package to do this? Replacing all NAs with a constant value can be done quite easily.

library(imputeTS)
# na.replace() was renamed na_replace() in imputeTS 3.0
na_replace(yourDataframe, fill = -9999)

Other solution (without additional package):

yourDataframe[is.na(yourDataframe)] <- -9999
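
If the data frame also holds factor or character columns, it may be safer to restrict the replacement to the numeric ones. A minimal base R sketch, using airquality as stand-in data; the names df and num_cols are illustrative, not from the question:

```r
# Sketch: replace NA with a constant in numeric columns only.
# Assumption: df stands in for yourDataframe; -9999 is the sentinel value.
df <- airquality

# Logical vector marking which columns are numeric
num_cols <- vapply(df, is.numeric, logical(1))

# Overwrite only those columns, leaving any others untouched
df[num_cols] <- lapply(df[num_cols], function(x) replace(x, is.na(x), -9999))

# Count remaining missing values per column
colSums(is.na(df))
```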
Steffen Moritz
  • Thanks @stats0007! The recipes package can automatically determine the types of all the variables in the input data set. I want to apply this to all the numeric variables at once, and I think the recipes package is probably faster than doing it "by hand." I may end up doing that if I can't find a way in recipes. – JoAnn Alvarez Jun 25 '18 at 21:32