Several publications highlight that variable importance scores derived from machine learning models may be biased. A recent study by Loh and Zou (2021) shows that ranger permutation-based variable importance scores produce unbiased results.
I am using tidymodels with a ranger engine to estimate a random forest model. How can I get the ranger variable importance scores from the resulting fit? What is the difference between these and the variable importance scores from vip? From my understanding, the vip output in the example below is the random forest model-specific Gini importance.
library(tidymodels)
library(vip)
aq <- na.omit(airquality)
model_rf <-
  rand_forest(mode = "regression") %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(Ozone ~ ., data = aq)
# variable importance
vip::vi(model_rf)
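For comparison, here is a minimal sketch of reading the scores straight from the underlying ranger object rather than through vip. It assumes that `extract_fit_engine()` (from parsnip, attached by tidymodels) returns the fitted ranger object and that the `importance = "permutation"` setting is passed through to ranger unchanged. The names `rf_engine` and `imp` are only illustrative, and `set.seed(1)` is added so the scores are repeatable.

```r
library(tidymodels)

aq <- na.omit(airquality)

# Seed added for repeatability; permutation importance varies between runs
set.seed(1)
model_rf <-
  rand_forest(mode = "regression") %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(Ozone ~ ., data = aq)

# Assumption: this returns the underlying ranger object from the parsnip fit
rf_engine <- extract_fit_engine(model_rf)

# Which importance measure ranger actually computed
rf_engine$importance.mode

# Named numeric vector of importance scores, one per predictor
imp <- ranger::importance(rf_engine)
sort(imp, decreasing = TRUE)
```

Checking `rf_engine$importance.mode` is one way to see whether the stored scores are permutation-based or impurity-based, which would also show what a model-specific importance extractor has available to report.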