I'm trying to add a "step_woe" step to a recipe after a "step_discretize_xgb" step, but I keep getting an error message about the types of the variables I need to transform with step_woe.

Here's a short example of my code, with only one variable.


library(embed)
library(tidymodels)
library(tidyverse)
library(xgboost)

TG <- sample(c(0,1), 1000, replace = TRUE)

V1 <- rnorm(1000)

train <- tibble(VARIABLE_1 = V1,
                TARGET = TG)

rec <- recipes::recipe(TARGET ~ ., 
                        data = train) %>% 
  step_discretize_xgb(all_numeric_predictors(), 
                      outcome = vars(TARGET)) %>% 
  step_woe(all_of("VARIABLE_1"),
           outcome = vars(TARGET)) %>% 
  prep(training = train)

PS: I've checked that this variable is a factor and that it is binned. I also tried without all_of() and the quotes, i.e., just VARIABLE_1.

The message is:

Error in check_type():
! All columns selected for the step should be factor or character
Backtrace:

  1. ... %>% prep(training = train)
  2. recipes:::prep.recipe(., training = train)
  3. embed:::prep.step_woe(x$steps[[i]], training = training, info = x$term_info)
  4. recipes::check_type(training[, outcome_name], quant = FALSE)

Error in check_type(training[, outcome_name], quant = FALSE) :

Filipa
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 21 '22 at 15:45

1 Answer

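This is an unfortunate error message from {embed}. You are getting this error because the outcome passed to step_woe() needs to be a categorical variable (factor or character), and TARGET is numeric here. The check that fails in your backtrace is on the outcome column, not on VARIABLE_1. Since TG appears to be a categorical variable, you can code it as such and it will work.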

I have opened an issue to make this error clearer: https://github.com/tidymodels/embed/issues/147

library(embed)
library(tidymodels)
library(tidyverse)
library(xgboost)


TG <- sample(c("0", "1"), 1000, replace = TRUE)

V1 <- rnorm(1000)

train <- tibble(VARIABLE_1 = V1,
                TARGET = TG)

rec <- recipes::recipe(TARGET ~ ., 
                       data = train) %>% 
  step_discretize_xgb(all_numeric_predictors(), 
                      outcome = vars(TARGET)) %>% 
  step_woe(all_of("VARIABLE_1"),
           outcome = vars(TARGET)) %>% 
  prep(training = train)

rec
#> Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          1
#> 
#> Training data contained 1000 data points and no missing data.
#> 
#> Operations:
#> 
#> Discretizing variables using xgboost VARIABLE_1 [trained]
#> WoE version against outcome TARGET for VARIABLE_1 [trained]

Created on 2022-11-21 with reprex v2.0.2
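
If the outcome already exists as a numeric 0/1 column, as it would in real data rather than a simulated example, the same fix can be sketched as a recode before the recipe is built. The data frame below is only a stand-in that mirrors the question's example, and the sketch assumes a two-level factor outcome is accepted the same way as a character one, which the "factor or character" wording of the error suggests.

```r
# Hypothetical stand-in for data whose outcome is stored as numeric 0/1.
set.seed(123)
train <- data.frame(
  VARIABLE_1 = rnorm(1000),
  TARGET = sample(c(0, 1), 1000, replace = TRUE)
)

# Inspect the column classes: TARGET is numeric at this point.
sapply(train, class)

# Recode the outcome as a two-level factor before building the recipe.
train$TARGET <- factor(train$TARGET, levels = c(0, 1))

# Confirm the outcome is now categorical with exactly two levels.
is.factor(train$TARGET)
nlevels(train$TARGET)
```

After this recode, the recipe from the example above should be usable unchanged on `train`.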

EmilHvitfeldt
    Thank you very much! It works for this example. For my actual project, not so much, but I'll have to dig a bit further. Your answer gave me a great starting point! – Filipa Nov 22 '22 at 17:37