
Could somebody show me how to generate permutation-based variable importance plots within the tidymodels framework? Currently, I have this:

library(tidymodels)
library(vip)

# variable importance
final_fit_train %>%
  pull_workflow_fit() %>%
  vip(geom = "point",
      aesthetics = list(color = cbPalette[4],
                        fill = cbPalette[4])) +
  THEME +
  ggtitle("Elastic Net")

which generates this:

[image: plot not available]

However, I would like to have something like this

[image: plot not available]

It's not clear to me how the rather new tidymodels framework integrates with the current vip package. Could anybody help? Thanks!

https://koalaverse.github.io/vip/articles/vip.html (API of the VIP package).

Lstat

1 Answer


To compute variable importance using permutation, you need to put together just a few more pieces than you do for model-dependent variable importance.

Let's look at an example for an SVM model, which does not have a model-dependent variable importance score.

library(tidymodels)
#> ── Attaching packages ──────────────────────── tidymodels 0.1.1 ──
#> ✓ broom     0.7.0      ✓ recipes   0.1.13
#> ✓ dials     0.0.8      ✓ rsample   0.0.7 
#> ✓ dplyr     1.0.0      ✓ tibble    3.0.3 
#> ✓ ggplot2   3.3.2      ✓ tidyr     1.1.0 
#> ✓ infer     0.5.3      ✓ tune      0.1.1 
#> ✓ modeldata 0.0.2      ✓ workflows 0.1.2 
#> ✓ parsnip   0.1.2      ✓ yardstick 0.0.7 
#> ✓ purrr     0.3.4
#> ── Conflicts ─────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()

data("hpc_data")

svm_spec <- svm_poly(degree = 1, cost = 1/4) %>%
  set_engine("kernlab") %>%
  set_mode("regression")

svm_fit <- workflow() %>%
  add_model(svm_spec) %>%
  add_formula(compounds ~ .) %>%
  fit(hpc_data)

svm_fit
#> ══ Workflow [trained] ════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: svm_poly()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────
#> compounds ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────
#> Support Vector Machine object of class "ksvm" 
#> 
#> SV type: eps-svr  (regression) 
#>  parameter : epsilon = 0.1  cost C = 0.25 
#> 
#> Polynomial kernel function. 
#>  Hyperparameters : degree =  1  scale =  1  offset =  1 
#> 
#> Number of Support Vectors : 2827 
#> 
#> Objective Function Value : -284.7255 
#> Training error : 0.835421

Our model is now trained, so it's ready for computing variable importance. Notice a couple of steps:

  • You pull the fitted model object out of the workflow with pull_workflow_fit().
  • You have to specify the target/outcome variable, compounds.
  • In this case, we need to pass both the original training data (use training data here, not testing data) and the right underlying function for predicting (this might be tricky to figure out in some cases, but for most packages it will just be predict()).

library(vip)
#> 
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#> 
#>     vi
svm_fit %>%
  pull_workflow_fit() %>%
  vip(method = "permute", 
      target = "compounds", metric = "rsquared",
      pred_wrapper = kernlab::predict, train = hpc_data)

Created on 2020-07-17 by the reprex package (v0.3.0)

You can increase nsim here to repeat the permutation more than once; the importance scores are then averaged across the repetitions.
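
To make the idea behind method = "permute" and nsim more concrete, here is a minimal base-R sketch of repeated permutation importance. It is only an illustration of the general technique, not how vip implements it: permute_importance() is a hypothetical helper written for this example, and the lm model on mtcars is a stand-in chosen so the code runs without extra packages. The score is the squared correlation between truth and prediction, which is an assumption about what an R-squared style metric measures.

```r
# Hypothetical helper (not part of vip): permutation importance in base R.
# For each feature, shuffle its column nsim times and average the drop in
# R-squared relative to the unshuffled baseline.
permute_importance <- function(fit, data, target, nsim = 10) {
  rsq <- function(truth, estimate) cor(truth, estimate)^2
  features <- setdiff(names(data), target)

  # Baseline score on the original, unshuffled data
  baseline <- rsq(data[[target]], predict(fit, newdata = data))

  drops <- sapply(features, function(feature) {
    sim <- replicate(nsim, {
      shuffled <- data
      # Break the link between this feature and the outcome
      shuffled[[feature]] <- sample(shuffled[[feature]])
      baseline - rsq(data[[target]], predict(fit, newdata = shuffled))
    })
    mean(sim)
  })

  # Largest average drop first
  sort(drops, decreasing = TRUE)
}

set.seed(123)
cars <- mtcars[, c("mpg", "wt", "hp", "qsec")]
lm_fit <- lm(mpg ~ wt + hp + qsec, data = cars)

importance <- permute_importance(lm_fit, cars, target = "mpg", nsim = 20)
importance
```

A larger nsim costs more prediction calls but makes the scores less sensitive to any single random shuffle. In the vip() call above, the equivalent change would be passing nsim as an extra argument.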

Julia Silge
  • Hi, this might work for kernlab (svm), but does not seem to work for a glmnet object. I get the following error: _Error in predict.glmnet(object, newdata = train_x) : You need to supply a value for 'newx'_ – Pieter-Jan Inghelbrecht Jul 31 '20 at 15:06
  • @Pieter-JanInghelbrecht For glmnet I would recommend using model-specific variable importance [as demonstrated here](https://stackoverflow.com/questions/61606337/computing-importance-measure-using-vip-package-on-a-parsnip-model/61811440#61811440). – Julia Silge Aug 01 '20 at 16:13
  • @JuliaSilge can you refer me to a source, or please explain what the "importance" number actually means? Are these the coefficient estimate values per variable? – CanyonView Mar 09 '21 at 22:58
  • @Shai That depends on the specifics of the model. Check out [this answer](https://stats.stackexchange.com/questions/332960/what-is-variable-importance) for perspective on what variable importance is, and the [vignette for the **vip** package](https://koalaverse.github.io/vip/articles/vip.html) for some examples – Julia Silge Mar 11 '21 at 04:17