I trained a random forest using caret + ranger.

fit <- train(
    y ~ x1 + x2
    ,data = total_set
    ,method = "ranger"
    ,trControl = trainControl(method="cv", number = 5, allowParallel = TRUE, verbose = TRUE)
    ,tuneGrid = expand.grid(mtry = c(4,5,6))
    ,importance = 'impurity'
)

Now I'd like to see the variable importance. However, none of these work:

> importance(fit)
Error in UseMethod("importance") : no applicable method for 'importance' applied to an object of class "c('train', 'train.formula')"
> fit$variable.importance
NULL
> fit$importance
NULL

> fit
Random Forest 

217380 samples
    32 predictors

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 173904, 173904, 173904, 173904, 173904 
Resampling results across tuning parameters:

  mtry  RMSE        Rsquared 
  4     0.03640464  0.5378731
  5     0.03645528  0.5366478
  6     0.03651451  0.5352838

RMSE was used to select the optimal model using  the smallest value.
The final value used for the model was mtry = 4. 

Any idea whether and how I can get it?

Thanks.

François M.

3 Answers

varImp(fit) will get it for you.

To figure that out, I looked at names(fit), which led me to names(fit$modelInfo) - then you'll see varImp as one of the options.
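
As a minimal sketch of the approach above: the snippet trains a small ranger model through caret on the built-in iris data and then reads the importance with varImp(). The data set, the name fit_iris and the tuneLength = 2 setting are assumptions chosen for illustration, not the asker's setup, and both the caret and ranger packages need to be installed.

```r
library(caret)   # train(), trainControl(), varImp(); method = "ranger" also needs ranger installed

set.seed(1)
fit_iris <- train(
  Species ~ .,
  data = iris,                        # illustrative data, not the asker's total_set
  method = "ranger",
  importance = "impurity",            # passed through to ranger(); without it varImp() fails
  trControl = trainControl(method = "cv", number = 5),
  tuneLength = 2                      # small default grid, chosen here to keep the run short
)

# Model-specific helpers that caret stores for this method; "varImp" is one of them
names(fit_iris$modelInfo)

# Importance as a data frame with one row per predictor (scaled to 0-100 by default)
imp <- varImp(fit_iris)
imp$importance
```

Calling varImp(fit_iris, scale = FALSE) should return the unscaled values instead.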

Tchotchke
  • Yeah, I found it too in the meantime by diving into `caret`'s doc. Thank you for that useful method to find information, though! It turns out `varImp()` is the way to get variable importance for most models trained with caret's `train()`. Note to future users, though: I'm not 100% certain and don't have the time to check, but it seems you need to pass `importance = 'impurity'` (I guess `importance = 'permutation'` would work too) to `train()` to be able to use `varImp()`. – François M. May 17 '16 at 16:17
  • Another note: it seems that if you train your model with `ranger` but without `caret`, then `importance(fit)` is the right way to get variable importance. As above, I think the parameter `importance = 'impurity'` (or `'permutation'`) needs to be passed when training (to `ranger()` in that case). – François M. May 17 '16 at 16:22
  • Strange, it's not working for me. No importance values available... hmmm – Hack-R Jun 22 '17 at 23:15
  • This doesn't work for me. The function exists, but it returns "no importance values available"? – KillerSnail Nov 13 '17 at 07:26
  • Just to be clear, the default for `ranger` is to not compute `importance`. You must explicitly specify `importance = 'impurity'` or `importance = 'permutation'` for any of these methods to work, even if you are using `train`. – John M Apr 16 '18 at 15:51
  • Since it took a lot of searching for me to find this, note that `varImp` will also work for `Rborist` models trained with `caret`. – cuttlefish Jul 17 '19 at 20:26
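
The point made in the comments about ranger's default can be checked on a model fitted with ranger() directly, without caret. This is only a sketch on the built-in iris data; the object names rf_none and rf_imp and the num.trees and seed values are assumptions made for the example.

```r
library(ranger)

# Default call: the importance mode is "none", so no importance values are stored
rf_none <- ranger(Species ~ ., data = iris, num.trees = 100, seed = 1)
length(rf_none$variable.importance) == 0

# Same forest, but asking ranger to compute impurity-based importance
rf_imp <- ranger(Species ~ ., data = iris, num.trees = 100,
                 importance = "impurity", seed = 1)

# Named numeric vector with one entry per predictor
ranger::importance(rf_imp)
```

This is also the likely cause of the "no importance values available" reports: the importance argument was not set when the model was trained.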

Following @fmalaussena's comment, pass the importance argument to train():

set.seed(123)
ctrl <- trainControl(method = 'cv', 
                     number = 10,
                     classProbs = TRUE,
                     savePredictions = TRUE,
                     verboseIter = TRUE)

rfFit <- train(Species ~ ., 
               data = iris, 
               method = "ranger",
               importance = "permutation", #***
               trControl = ctrl,
               verbose = TRUE)

You can pass either "permutation" or "impurity" to the importance argument. A description of both values can be found here: https://alexisperrier.com/datascience/2015/08/27/feature-importance-random-forests-gini-accuracy.html
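
To compare the two settings side by side, here is a small sketch that fits the same forest twice with ranger() directly and collects both importance vectors in one matrix. The names modes and imp, and the num.trees and seed values, are assumptions made for the example; the two columns are on different scales, so only the ranking within each column is comparable.

```r
library(ranger)

modes <- c("impurity", "permutation")

# Fit one forest per importance mode and keep its named importance vector
imp <- sapply(modes, function(m) {
  rf <- ranger(Species ~ ., data = iris, num.trees = 200,
               importance = m, seed = 123)
  rf$variable.importance
})

# Rows are predictors, columns are the two importance modes
imp
```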

NaNxT

For a model fitted directly with the 'ranger' package (with the importance argument set), you can get the importance with

fit$variable.importance
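
When the model was trained through caret, as in the question, the train object itself has no variable.importance element, which is why the asker got NULL. The underlying ranger model is stored in the finalModel element, so the same vector can be reached from there. A minimal sketch on the built-in iris data, where fit_iris and the tuning settings are assumptions made for the example:

```r
library(caret)   # method = "ranger" also needs the ranger package installed

set.seed(1)
fit_iris <- train(
  Species ~ .,
  data = iris,
  method = "ranger",
  importance = "impurity",
  trControl = trainControl(method = "cv", number = 5),
  tuneLength = 2
)

# The train object has no such element, so this is NULL
fit_iris$variable.importance

# The fitted ranger model lives in finalModel and carries the raw importance values
class(fit_iris$finalModel)
fit_iris$finalModel$variable.importance
```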

As a side note, you can see all the available components of the model object using str()

str(fit)