I can't figure out why I get two different PDPs (partial dependence plots) depending on whether I use the pdp package or the DALEX package.
I initially wanted to make PDPs using tidymodels in combination with pdp and DALEX, but I couldn't get either combination to work easily. I then switched to caret, but I still get different plots from pdp and DALEX. For context, and to keep the example close to my actual problem, I'm tuning a logistic regression model (glmnet).
Below is a reprex using caret:
# Loading packages
suppressMessages({
  library(tidymodels)
  library(caret)
  library(DALEX)
  library(DALEXtra)
  library(pdp)
  library(patchwork)
  library(mlbench)
})
# Loading data and assigning to shorter variable name
data(PimaIndiansDiabetes)
PID <- PimaIndiansDiabetes
rm(PimaIndiansDiabetes)
# Splitting data
split <- initial_split(PID)
train <- training(split)
test <- testing(split)
# Create random recipe for data using interactions for the fun of it.
PID_rec <-
  recipe(diabetes ~ ., data = train) %>%
  step_interact(terms = ~ all_predictors():all_predictors(),
                sep = ":") %>%
  prep()
# Use recipe on training and testing data
train <- PID_rec %>% bake(new_data = NULL)
test <- PID_rec %>% bake(new_data = test)
#### Train Logistic Classification Model using CARET ####
# Creating random grid
grid <- expand.grid(alpha = seq(0, 1, length.out = 4),
                    lambda = seq(0, 30, length.out = 5))
# Fitting model
caret_fit_cv <- train(diabetes ~ .,
                      data = train,
                      tuneGrid = grid,
                      trControl = trainControl(method = "cv", number = 5),
                      family = "binomial",
                      method = "glmnet",
                      metric = "Accuracy")
# caret_fit_cv$bestTune: alpha = 0, lambda = 0
# Creating PDP via pdp-package
pdp_plot_caret <- pdp::partial(object = caret_fit_cv,
                               pred.var = "glucose",
                               plot = TRUE,
                               plot.engine = "ggplot",
                               type = "classification",
                               train = train)
# Creating PDP via DALEX-package
explainer <- explain(model = caret_fit_cv,
                     data = train[, -9],
                     y = train$diabetes,
                     label = "Logistic Model",
                     verbose = FALSE)
#> Warning in Ops.factor(y, predict_function(model, data)): '-' not meaningful for
#> factors
dalex_plot_caret <- model_profile(explainer = explainer,
                                  variables = "glucose",
                                  type = "partial") %>% plot()
# Creating collected plot using patchwork-package
(pdp_plot_caret) /
(dalex_plot_caret)
Created on 2023-02-21 with reprex v2.0.2
Upper plot = pdp-package. Lower plot = DALEX-package.
DALEX's explain() warns that it was handed a factor for the y argument, but the plot doesn't change even if I convert diabetes from a factor to numeric.
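A minimal sketch of what such a factor-to-numeric conversion could look like is given here. The toy vector and the name y_numeric are illustrative assumptions and not part of the reprex, and treating "pos" as the positive class is also an assumption. It uses base R only, so it runs without the fitted model.

```r
# Toy stand-in for train$diabetes (assumption: levels are "neg" and "pos",
# as in mlbench's PimaIndiansDiabetes).
diabetes <- factor(c("neg", "pos", "pos", "neg"), levels = c("neg", "pos"))

# Explicit 0/1 coding, with "pos" assumed to be the positive class.
y_numeric <- as.numeric(diabetes == "pos")
print(y_numeric)
# 0 1 1 0

# Note: as.numeric() applied directly to a factor returns the integer
# level codes (1 and 2 here), not 0/1.
print(as.numeric(diabetes))
# 1 2 2 1

# In the reprex, this vector would then be passed as
# y = as.numeric(train$diabetes == "pos") in the explain() call.
```

With a numeric y, the subtraction shown in the warning message (y minus the model's predictions) is defined, so the warning should go away; whether that has any effect on the profile plot is a separate matter.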
Any ideas?