I have seen several questions and answers in similar posts on SO (ex. 1, ex. 2, ex. 3), but none seem to really address the problem in the context of tidymodels.
I am trying to use a second-order step_poly function inside a preprocessing recipe to prepare for a KNN model. The sample data is pulled from a Kaggle Playground competition. The training data itself is ~360,000 x 17 with all numeric predictors.
A light preprocessing reprex is:
library(tidymodels)

rec <- recipe(cost ~ ., data = train) |>
  update_role(id, new_role = 'id') |>
  step_normalize(all_numeric_predictors()) |>
  step_poly(all_predictors()) |> # this line fails??
  step_interact(~ all_predictors():all_predictors())
When I prep the recipe with prep(rec), an error is thrown:
Error in poly(degree = 2L, x = c(0.871948016751444, 0.871948016751444, : 'degree' must be less than number of unique points
This also persists at tuning time. I understand the rationale behind why the polynomial degree must be less than the number of unique points, but I do not understand where the "unique points" are coming from. Why does my data only have a single unique point? And how can I fix this?
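For reference, here is a minimal base-R sketch of how the number of distinct values per column could be checked. It uses a small made-up data frame (`toy`, with hypothetical columns `sales` and `has_bar`) rather than the Kaggle data, and it assumes the failing column behaves like a binary flag, which I have not confirmed for my actual predictors. As far as I can tell, poly() refuses any input whose count of distinct values is not greater than the degree, so a two-valued column would trigger the same message with degree 2.

```r
# Toy stand-in for `train` (hypothetical columns, NOT the Kaggle data)
toy <- data.frame(
  sales   = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7),
  has_bar = c(0, 1, 1, 0, 1, 0)  # binary flag: only two distinct values
)

# Count the distinct values in each column
n_unique <- vapply(toy, function(col) length(unique(col)), integer(1))

# Columns with too few distinct values for a degree-2 polynomial
degree  <- 2L
too_few <- names(n_unique)[n_unique <= degree]
print(too_few)

# Centering and scaling does not change how many distinct values a column has,
# so the normalized flag still has only two; capture the error message from poly()
z   <- as.numeric(scale(toy$has_bar))
msg <- tryCatch(
  {
    poly(z, degree = degree)
    NA_character_  # reached only if poly() succeeds
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Running the same `vapply()` count on `train` should show which predictors, if any, have two or fewer distinct values.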
Any and all help is greatly appreciated!