
I am having difficulty applying what_if analysis to an xgboost model. The what_if analysis runs for a randomForest model, but it breaks when I try to run it for the xgboost model.

My question is: given the titanic dataset, how can I make the what_if plot for the xgboost model? I have added comments to the code to show where it breaks for me.

I know I am doing something incorrect in the new_xgb_observation part, but what_if (as far as I know) expects a single observation, so I am trying to extract a single observation from the dtest matrix.

This is the part of the code which is breaking for me:

#### #### #### #### #### #### ####
# new observation -  which breaks
new_xgb_observation <- dtest[1, ]

# ceteris paribus - what_if analysis which breaks
what_if(xgb_explain, observation = new_xgb_observation,
        selected_variables = c("gender", "age", "fare", "sibsp"))
#### #### #### #### #### #### ####

I then show a working randomForest model below it.

Data:

library(DALEX)
library(ceterisParibus)
library(xgboost)
library(dplyr)  # provides %>%, select() and mutate() used below

data("titanic")
data <- titanic

# some quick data cleaning
data <- data %>% 
  select(-c(class, embarked, country)) %>% 
  mutate(
    gender = as.numeric(gender) - 1,
    survived = as.numeric(survived) -1
  )

# split into training and testing data
smp_size <- floor(0.75 * nrow(data))
train_ind <- sample(seq_len(nrow(data)), size = smp_size)

train <- data[train_ind, ]
test <- data[-train_ind, ]

X_train <- train %>% 
  select(-c(survived)) %>% 
  as.matrix()

Y_train <- train %>% 
  select(c(survived)) %>% 
  as.matrix()

X_test <- test %>% 
  select(-c(survived)) %>% 
  as.matrix()

Y_test <- test %>% 
  select(c(survived)) %>% 
  as.matrix()

# train and test as xgb.DMatrix for the XGBoost model
dtrain <- xgb.DMatrix(data = X_train, label = Y_train)
dtest <- xgb.DMatrix(data = X_test, label = Y_test)

# XGBoost parameters
# ("set.seed" is not an xgboost parameter; the R package uses set.seed() instead)
set.seed(176)
params <- list(
  "eta" = 0.2,
  "max_depth" = 6,
  "objective" = "binary:logistic",
  "eval_metric" = "auc"
)

# run the XGBoost model
watchlist <- list("train" = dtrain)
nround <- 40
xgb.model <- xgb.train(params, dtrain, nround, watchlist)

# DALEX model explanation
xgb_explain <- explain(xgb.model, data = X_train, label = Y_train)

#### #### #### #### #### #### ####
# new observation -  which breaks
new_xgb_observation <- dtest[1, ]

# ceteris paribus - what_if analysis which breaks
what_if(xgb_explain, observation = new_xgb_observation,
        selected_variables = c("gender", "age", "fare", "sibsp"))
#### #### #### #### #### #### ####

################## random Forest model #################

Random_Forest_Model <- randomForest::randomForest(factor(survived) ~., data = train, na.action = na.omit, ntree = 50, importance = TRUE)

# same as for the XGBoost model but this time remove na values
train_rf <- na.omit(train)
X_train_rf <- train_rf %>% 
  select(-c(survived))
Y_train_rf <- train_rf %>% 
  select(c(survived))

test_rf <- na.omit(test)
X_test_rf <- test_rf %>% 
  select(-c(survived))
Y_test_rf <- test_rf %>% 
  select(c(survived))

# DALEX model explanation
rf_explain <- explain(Random_Forest_Model, 
                      data = X_train_rf,
                      y = Y_train_rf)

# This time this works.
new_obs <- X_test_rf[1, ]

# So does this
wi_rf_model <- what_if(rf_explain, observation = new_obs,
                       selected_variables = c("gender", "age", "fare", "sibsp"))

# And this is what I ultimately want.
plot(wi_rf_model, split = "variables", color = "variables", quantiles = FALSE)
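
One direction that may be worth trying, offered as an untested sketch rather than a confirmed fix: `what_if()` seems to treat the observation as a data frame and to hand a data frame to the explainer's predict function, whereas an `xgb.Booster` predicts on a numeric matrix or `xgb.DMatrix`. The sketch therefore keeps the features as data frames, takes the observation from the test data frame instead of slicing `dtest`, and passes a custom `predict_function` to `explain()`. The names `xgb_fit`, `xgb_predict`, `xgb_explain_df`, `X_train_df` and `X_test_df` are hypothetical, and dropping incomplete rows with `na.omit()` is an assumption borrowed from the randomForest example.

```r
library(DALEX)
library(ceterisParibus)
library(xgboost)

set.seed(176)

# Same cleaning as in the question, written in base R. Incomplete rows are
# dropped (an assumption, mirroring the randomForest example).
data <- na.omit(titanic[, c("gender", "age", "fare", "sibsp", "parch", "survived")])
data$gender   <- as.numeric(data$gender) - 1
data$survived <- as.numeric(data$survived) - 1

# 75/25 split; features stay as data frames so single rows keep their names
train_ind  <- sample(seq_len(nrow(data)), size = floor(0.75 * nrow(data)))
X_train_df <- data[train_ind, names(data) != "survived"]
X_test_df  <- data[-train_ind, names(data) != "survived"]
y_train    <- data$survived[train_ind]

xgb_fit <- xgb.train(
  params  = list(eta = 0.2, max_depth = 6, objective = "binary:logistic"),
  data    = xgb.DMatrix(as.matrix(X_train_df), label = y_train),
  nrounds = 40
)

# Hypothetical helper: convert whatever data frame the explainer passes in
# to an xgb.DMatrix before predicting
xgb_predict <- function(model, newdata) {
  predict(model, xgb.DMatrix(as.matrix(newdata)))
}

xgb_explain_df <- explain(xgb_fit, data = X_train_df, y = y_train,
                          predict_function = xgb_predict, label = "xgboost")

# A one-row data frame, not a slice of the xgb.DMatrix
new_xgb_observation <- X_test_df[1, , drop = FALSE]

wi_xgb_model <- what_if(xgb_explain_df, observation = new_xgb_observation,
                        selected_variables = c("gender", "age", "fare", "sibsp"))
plot(wi_xgb_model, split = "variables", color = "variables", quantiles = FALSE)
```

Whether missing values have to be removed depends on how `what_if()` computes its grid of values, which is not verified here; if it tolerates `NA`, the `na.omit()` step could be skipped so that the model sees the same rows as in the original code.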
user113156
  • I suggest that you move to parsnip, exactly for problems like this. parsnip will return the same kind of object on predictions and expect the same kind of input – Bruno May 26 '20 at 17:42
  • Do we lose functionality in the `parsnip` package? `XGBoost` has a lot of parameters which are nice to have "control" over. If I recall correctly, when I looked at the `caret` package for `xgboost`, not all of the `xgboost` parameters were tunable in the `caret` version. – user113156 May 26 '20 at 17:53
  • You get all the parameters when specifying the engine; it is just an interface – Bruno May 27 '20 at 01:02

0 Answers