I am trying to create pairways interactions between each field of a dataset for a glmnet
model, without having to name each field individually. However, when it tries to perform this automatically, it gets hung up on creating them for all the variants of one-hot encoded categorical variables against themselves (eg it creates an interaction column between Gender_Male
and Gender_Female
, and then can't find any values so the entire thing is filled with NaN
s) which then makes glmnet
throw an error.
Here's some sample code:
library(dplyr)
library(tidyr)
library(rsample)
library(recipes)
library(glmnet)
head(credit_data)
t <- credit_data %>%
mutate(Status = as.character(Status)) %>%
mutate(Status = if_else(Status == "good", 1, 0)) %>%
drop_na()
set.seed(1234)
partitions <- initial_split(t, prop = 9/10, strata = "Status")
parsed_recipe <- recipe(Status ~ ., data = t) %>%
step_dummy(one_hot = TRUE, all_predictors(), -all_numeric()) %>%
step_interact(~.:.) %>% #My attempt to apply the interaction
step_scale(all_predictors()) %>%
prep(training = training(partitions))
train_data <- bake(parsed_recipe, new_data = training(partitions))
test_data <- bake(parsed_recipe, new_data = testing(partitions))
fit <- train_data %>%
select(-Status) %>%
as.matrix() %>%
glmnet(x = ., y = train_data$Status, family = "binomial", alpha = 0)
When I run the glmnet
section at the end, it gives me this error:
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NA/NaN/Inf in foreign function call (arg 5)
Having looked at this question, I realised there must be NA
s / NaN
s in the data, so I ran summary(train_data)
, which came out looking like this:
So, it's unsurprising that glmnet
is upset, but I'm also not sure how to fix it. I really don't want to manually define every single pairing myself. Is there a recipes
command to remove potential predictor columns containing NaN
s, maybe?