
I would like to change the threshold of the model and have come across posts like the Cross Validated thread How to change threshold for classification in R randomForests?

If I change the threshold after creating the model, that means I will have to tweak things again for test data or new data.

Is there a way in R and caret to set the threshold within the model itself, so that I can run the same model with the same threshold value on test data or new data as well?

ViSa
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 02 '20 at 19:29
  • If you want to obtain performance on already-performed resampling you can use [thresholder](https://rdrr.io/cran/caret/man/thresholder.html). However, this will lead to positive bias; in order to evaluate the threshold found this way you will need an additional validation set. In my opinion the threshold should be treated as any other hyperparameter and tuned jointly with the other algorithm hyperparameters using an appropriate metric. Caret is not designed for this, although you can construct a custom model: https://topepo.github.io/caret/using-your-own-model-in-train.html#Illustration5 – missuse Nov 02 '20 at 22:24
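The `thresholder()` route mentioned in the comment above can be sketched roughly as follows. This is only an illustration, not code from the thread: the `Sonar` data, the 5-fold CV setup, and the threshold grid are arbitrary choices, and the column names returned by `thresholder()` should be checked against the caret documentation.

```r
library(caret)
library(mlbench)   # for the Sonar data, used here purely as an example
data(Sonar)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary,
                     # thresholder() needs the saved resampling predictions
                     savePredictions = "all")
fit <- train(Class ~ ., data = Sonar, method = "rf",
             metric = "ROC", trControl = ctrl)

# Evaluate a grid of candidate thresholds on the resampled predictions
ths <- thresholder(fit, threshold = seq(0.2, 0.8, by = 0.05),
                   statistics = c("Sensitivity", "Specificity", "J"))
ths[which.max(ths$J), ]   # threshold maximizing Youden's J on the resamples
```

As missuse notes, the statistics computed this way are optimistically biased, so a threshold chosen from this table should still be confirmed on a separate validation set.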

1 Answer


In probabilistic classifiers, such as Random Forests, there is no threshold involved during the fitting of a model, nor is there any threshold associated with a fitted model; hence, there is actually nothing to change. As correctly pointed out in the CV thread Reduce Classification Probability Threshold:

Choosing a threshold beyond which you classify a new observation as 1 vs. 0 is not part of the statistics any more. It is part of the decision component.

Quoting from my own answer in Change threshold value for Random Forest classifier:

There is simply no threshold during model training; Random Forest is a probabilistic classifier, and it only outputs class probabilities. "Hard" classes (i.e. 0/1), which indeed require a threshold, are neither produced nor used at any stage of the model training - only during prediction, and even then only in cases where we indeed require a hard classification (not always the case). Please see Predict classes or class probabilities? for more details.

So, if you produce predictions from a fitted model, say rf, with the argument type = "prob", as shown in the CV thread you have linked to:

pred <- predict(rf, mydata, type = "prob")

these predictions will be probability values in [0, 1], and not hard classes 0/1. From here, you are free to choose the threshold as shown in the answer there, i.e.:

thresh <- 0.6  # any desired value in [0, 1]
# pred has one probability column per class; threshold the positive-class
# column (assumed here to be column 2 - check colnames(pred))
class_pred <- c()
class_pred[pred[, 2] <= thresh] <- 0
class_pred[pred[, 2] >  thresh] <- 1

or of course experiment with different values of threshold without needing to change anything in the model itself.
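If the concern is applying the same threshold consistently to training, test, or any new data, one option is to wrap the fitted model and the chosen threshold in a small helper. This is just a sketch: `predict_with_threshold` is a made-up name, not a randomForest or caret function, and the positive-class column index is an assumption to adapt to your data.

```r
library(randomForest)

# Hypothetical wrapper: applies a fixed decision threshold to the
# positive-class probabilities of any probabilistic classifier
predict_with_threshold <- function(model, newdata, thresh = 0.5,
                                   positive = 2) {
  p <- predict(model, newdata, type = "prob")[, positive]  # class probabilities
  as.integer(p > thresh)   # hard 0/1 classes under the chosen threshold
}

# Toy illustration on a two-class subset of iris
df <- droplevels(iris[iris$Species != "setosa", ])
rf <- randomForest(Species ~ ., data = df)

# The same decision rule can now be applied identically to any data set
head(predict_with_threshold(rf, df, thresh = 0.6))
```

The model itself is unchanged; the threshold lives alongside it as a parameter of the decision step, which is exactly where it belongs.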

desertnaut
  • I think the question was about evaluating different decision thresholds during tuning, so one could choose the optimal threshold using resampling and some predefined metric. – missuse Nov 02 '20 at 22:21
  • @missuse could be (the question is not terribly clear). Let's wait for the OP's response and see – desertnaut Nov 02 '20 at 22:28
  • Yes, as @missuse has correctly pointed out, I was looking to set the `threshold` during tuning or training of the model only. As the `predict` function is already taking a decision based on a 0.5 threshold, if there were a tuning parameter to set the threshold value, then the future exercise of setting the threshold for training or any new data would not be needed. – ViSa Nov 03 '20 at 06:10
  • @desertnaut, you have correctly pointed out that selecting the `threshold` is not part of `statistics` but of decision making. But I was just concerned that, if the `predict` function is taking a decision based on 0.5 and giving me classes, there might be an option to set it as well. And thanks for sharing the links on hard & soft classification; they were very helpful in understanding. – ViSa Nov 03 '20 at 06:20