I can't figure out why I get two different partial dependence plots (PDPs) depending on whether I use the pdp package or the DALEX package.

I initially wanted to make PDPs using tidymodels in combination with pdp and DALEX, but neither combination worked easily. I then switched to caret, but I still get different plots from pdp and DALEX. For context, and to keep the example close to my actual problem, I'm tuning a logistic regression model (fit with glmnet).

Below is a reprex using caret:

# Loading packages
suppressMessages({
  library(tidymodels)
  library(caret)
  library(DALEX)
  library(DALEXtra)
  library(pdp)
  library(patchwork)
  library(mlbench)
})

# Loading data and assigning to shorter variable name
data(PimaIndiansDiabetes)
PID <- PimaIndiansDiabetes
rm(PimaIndiansDiabetes)

# Splitting data
split <- initial_split(PID)
train <- training(split)
test <- testing(split)

# Create random recipe for data using interactions for the fun of it.
PID_rec <-
  recipe(diabetes ~ ., data = train) %>% 
  step_interact(terms = ~ all_predictors():all_predictors(),
                sep = ":") %>%
  prep()

# Use recipe on training and testing data
train <- PID_rec %>% bake(new_data = NULL)
test <- PID_rec %>% bake(new_data = test)

#### Train Logistic Classification Model using CARET ####
# Creating random grid
grid <- expand.grid(alpha = seq(0, 1, length.out = 4),
                    lambda = seq(0, 30, length.out = 5))

# Fitting model
caret_fit_cv <- train(diabetes ~ .,
                      data = train,
                      tuneGrid = grid,
                      trControl = trainControl(method = 'cv', number = 5),
                      family = 'binomial',
                      method = 'glmnet',
                      metric = "Accuracy")

# caret_fit_cv$bestTune: alpha = 0, lambda = 0

# Creating PDP via pdp-package
pdp_plot_caret <- pdp::partial(object = caret_fit_cv,
                               pred.var = "glucose",
                               plot = TRUE,
                               plot.engine = "ggplot",
                               type = "classification",
                               train = train)

# Creating PDP via DALEX-package
explainer <- explain(model = caret_fit_cv,
                     data = train[, -9],  # drop column 9, the outcome (diabetes)
                     y = train$diabetes,
                     label = "Logistic Model",
                     verbose = FALSE)
#> Warning in Ops.factor(y, predict_function(model, data)): '-' not meaningful for
#> factors

dalex_plot_caret <- model_profile(explainer = explainer,
                                  variables = "glucose",
                                  type = "partial") %>% plot()

# Creating collected plot using patchwork-package
(pdp_plot_caret) /
  (dalex_plot_caret)

Created on 2023-02-21 with reprex v2.0.2

The upper plot is from the pdp package; the lower plot is from the DALEX package.
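For reference, a partial dependence curve can also be computed by hand, which gives a baseline to compare both plots against. Below is a minimal base-R sketch. It assumes a plain `glm` on the built-in `mtcars` data rather than the glmnet model from the reprex; the model, the data, and the helper name `manual_pd` are illustrative and not part of my actual code.

```r
# Illustrative only: a plain logistic glm on built-in data, not the glmnet fit above
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Hypothetical helper: for each grid value, fix `var` at that value for every row,
# predict, and average the predictions over the data
manual_pd <- function(model, data, var, grid, type) {
  sapply(grid, function(value) {
    data[[var]] <- value
    mean(predict(model, newdata = data, type = type))
  })
}

grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

# Same model, same variable, two different y-axis scales
pd_prob  <- manual_pd(fit, mtcars, "wt", grid, type = "response")  # probability scale
pd_logit <- manual_pd(fit, mtcars, "wt", grid, type = "link")      # log-odds scale
```

The two curves come from the same model but differ in range and shape because one averages probabilities and the other averages log-odds. If I read the pdp documentation correctly, `partial()` has a `prob` argument that defaults to `FALSE` for classification, so the upper plot may be on a logit-type scale while DALEX plots predicted probabilities. Which class each package reports on (`which.class` in pdp) may differ as well.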

DALEX's explain() warns that it was handed a factor as the y argument, but the plot doesn't change even when I convert diabetes from a factor to numeric.
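For clarity, this is the kind of conversion I mean. It is a minimal sketch on a toy factor; the four-element vector is a hypothetical stand-in for `train$diabetes`, which has the levels "neg" and "pos".

```r
# Toy stand-in for train$diabetes: a factor with levels "neg" and "pos"
diabetes <- factor(c("neg", "pos", "pos", "neg"), levels = c("neg", "pos"))

# as.numeric() on a factor returns the internal level codes (1/2), not 0/1
codes <- as.numeric(diabetes)

# Comparing against the positive level gives a 0/1 target instead
y_num <- as.numeric(diabetes == "pos")
```

Passing a 0/1 vector like `y_num` as `y` removes the warning. As far as I can tell, the profile is computed from `data` and the model's predictions only, which would be consistent with the plot not changing.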

Any ideas?

PeRiKo