
In many ML applications a weighted loss may be desirable, since some types of incorrect predictions can be worse outcomes than others. For example, in medical binary classification (healthy/ill), a false negative, where the patient doesn't get further examinations, is a worse outcome than a false positive, where a follow-up examination will reveal the error.

So if I define a weighted loss function like this:

def weighted_loss(prediction, target):
    if prediction == target:
        return 0  # correct, no loss
    elif prediction == 0:  # class 0 is healthy
        return 100  # false negative, very bad
    else:
        return 1  # false positive, incorrect

How can I pass something equivalent to this to scikit-learn classifiers such as random forests or SVMs?

  • do you mean class_weight? – Thulfiqar Mar 25 '21 at 12:19
  • I am not sure. To me, class weight would mean that not only the loss but also the reward (getting that class right) would be boosted, right? Is there a more in-depth explanation of what class_weight does? I couldn't find one. – jaaq Mar 25 '21 at 12:20
  • 1
    class_weight is for unbalanced dataset where you have different number of samples in each class; in order not to train a model that biased toward class with larger number of samples the class_weight comes in handy. by assigning different weights for each class based on the number of classes you have, the models weights in the case of deep neural network didn't change that much if the current sample used in the training and vise-versa for the class with small number of samples. – Thulfiqar Mar 25 '21 at 12:26
  • Well, I don't have an unbalanced dataset; I want to artificially imbalance the loss, as a FP is more desirable than a FN. What I take from your comment is that class_weight isn't the answer to my problem, right? – jaaq Mar 25 '21 at 12:36
  • 1
    yes, class_weights isn't the answer to your problem. however, what you can do is developing a model and then use sklearn.metrics.classification_report to see the results. what you need is high precision score and relatively high recall score. – Thulfiqar Mar 25 '21 at 12:41
  • That's basically what I've done as a workaround: I've added another hyperparameter that is essentially the threshold of the binary classification, so I call `model.predict_proba(X) < threshold` and optimize that as well. This improves my results, as it sets a higher required certainty for predicting 'healthy'. But I am pretty sure that I'd get better results if the decision boundaries drawn by the RBFs took that into account when fitting to the data. I just wanted to ask before I forked the sklearn code. – jaaq Mar 25 '21 at 12:45
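A minimal sketch of that workaround, assuming an already-fitted binary classifier `model` that supports `predict_proba`; the helper names (`expected_cost`, `tune_threshold`) and the 100:1 costs (taken from the weighted_loss function above) are illustrative only, not part of any scikit-learn API:

import numpy as np

def expected_cost(y_true, y_pred, fn_cost=100, fp_cost=1):
    # Cost-weighted error count, mirroring weighted_loss above.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fn = np.sum((y_true == 1) & (y_pred == 0))  # truly ill, predicted healthy
    fp = np.sum((y_true == 0) & (y_pred == 1))  # truly healthy, predicted ill
    return fn_cost * fn + fp_cost * fp

def tune_threshold(model, X_val, y_val):
    # Pick the decision threshold that minimizes the expected cost
    # on held-out validation data.
    proba_ill = model.predict_proba(X_val)[:, 1]  # P(class 1 = ill)
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = [expected_cost(y_val, (proba_ill >= t).astype(int))
             for t in thresholds]
    return thresholds[int(np.argmin(costs))]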

1 Answer


I am afraid your question is ill-posed, stemming from a fundamental confusion between the different notions of loss and metric.

Loss functions do not work with prediction == target-type conditions; this is what metrics (like accuracy, precision, and recall) do, and metrics play no role during loss optimization (i.e. training), serving only for performance assessment. Loss does not work with hard class predictions; it works only with the probabilistic outputs of the classifier, where such equality conditions never apply.
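As a toy illustration (using the real scikit-learn helpers log_loss and accuracy_score; the numbers are made up), notice how the loss consumes probabilities while the metric consumes hard, thresholded predictions:

from sklearn.metrics import log_loss, accuracy_score

y_true = [0, 1, 1, 0]
proba = [0.1, 0.8, 0.4, 0.3]  # P(class 1), as returned by predict_proba
hard = [1 if p >= 0.5 else 0 for p in proba]  # hard labels via a 0.5 cutoff

print(log_loss(y_true, proba))       # loss: works on probabilities directly
print(accuracy_score(y_true, hard))  # metric: counts prediction == target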

An additional layer of "insulation" between loss and metrics is the choice of a threshold, which is necessary for converting the probabilistic outputs of a classifier (the only thing that matters during training) into "hard" class predictions (the only thing that matters for the business problem under consideration). Again, this threshold plays absolutely no role during model training, where the only relevant quantity is the loss, which knows nothing about thresholds and hard class predictions; as nicely put in the Cross Validated thread Reduce Classification Probability Threshold:

the statistical component of your exercise ends when you output a probability for each class of your new sample. Choosing a threshold beyond which you classify a new observation as 1 vs. 0 is not part of the statistics any more. It is part of the decision component.
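To make the separation concrete, here is a rough sketch (the synthetic dataset and the 0.1 threshold are purely illustrative): the classifier is fitted by loss minimization alone, and the threshold enters only afterwards, as part of the decision component:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)  # no threshold involved in fitting

proba_ill = clf.predict_proba(X)[:, 1]          # probabilistic outputs
default_pred = clf.predict(X)                   # the model's built-in decision rule
cautious_pred = (proba_ill >= 0.1).astype(int)  # flag "ill" more readily, trading FNs for FPs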

Although you can certainly try to optimize this (decision) threshold with extra procedures outside of the narrowly-defined model training (i.e. loss minimization), as you briefly describe in the comments, your expectation that

I am pretty sure that I'd get better results if the decision boundaries drawn by the RBFs took that into account, when fitting to the data

with something similar to your weighted_loss function is futile.

So, no function similar to your weighted_loss shown here (essentially a metric, not a loss function, despite its name) that employs equality conditions like prediction == target can be used for model training.

The discussion in the following SO threads might also be useful in clarifying the issue:

desertnaut