
I tried to implement a weighted binary crossentropy with Keras, but I am not sure if the code is correct. The training output seems a bit confusing: after a few epochs I only get an accuracy of ~0.15. I think that's much too low (even for a random guess).

In general, about 11% of the outputs are ones and 89% are zeros, so I chose the weights w_zero=0.89 and w_one=0.11.

My code:

from keras import backend as K

def create_weighted_binary_crossentropy(zero_weight, one_weight):

    def weighted_binary_crossentropy(y_true, y_pred):

        # Original binary crossentropy (see losses.py):
        # K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

        # Calculate the binary crossentropy
        b_ce = K.binary_crossentropy(y_true, y_pred)

        # Apply the weights
        weight_vector = y_true * one_weight + (1. - y_true) * zero_weight
        weighted_b_ce = weight_vector * b_ce

        # Return the mean error
        return K.mean(weighted_b_ce)

    return weighted_binary_crossentropy

Maybe someone can see what's wrong?

Thank you

Kevin Meier

7 Answers


Normally, the minority class will have a higher class weight. It'll be better to use one_weight=0.89, zero_weight=0.11 (btw, you can use class_weight={0: 0.11, 1: 0.89}, as suggested in the comment).

Under class imbalance, your model sees many more zeros than ones, and it will learn to predict more zeros than ones because the training loss can be minimized by doing so. That's also why you're seeing an accuracy close to the proportion 0.11. If you take an average over the model predictions, it should be very close to zero.

The purpose of using class weights is to change the loss function so that the training loss cannot be minimized by the "easy solution" (i.e., predicting zeros), and that's why it'll be better to use a higher weight for ones.

Note that the best weights are not necessarily 0.89 and 0.11. Sometimes you might have to try other values, for example the logarithm or square root of these proportions (or any weights satisfying one_weight > zero_weight), to make it work.
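A minimal sketch of both options (the toy data and model below are made up for illustration; create_weighted_binary_crossentropy is the factory from the question):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Made-up data with ~11% ones, as described in the question
x_train = np.random.rand(1000, 20).astype('float32')
y_train = (np.random.rand(1000) < 0.11).astype('float32')

model = Sequential([Dense(16, activation='relu', input_dim=20),
                    Dense(1, activation='sigmoid')])

# Option 1: the question's custom loss, with the weights swapped
model.compile(optimizer='adam',
              loss=create_weighted_binary_crossentropy(zero_weight=0.11,
                                                       one_weight=0.89),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)

# Option 2: plain binary crossentropy plus class_weight in fit()
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32,
          class_weight={0: 0.11, 1: 0.89})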

Yu-Yang

You can use the sklearn module to automatically calculate the weights for each class like this:

# Imports
import numpy as np
from sklearn.utils import class_weight
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Example model
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))

# Use binary crossentropy loss
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Calculate a weight for each class so that we can balance the data
weights = class_weight.compute_class_weight(class_weight='balanced',
                                            classes=np.unique(y_train),
                                            y=y_train)

# model.fit expects a dict mapping class indices to weights
class_weights = dict(enumerate(weights))

# Add the class weights to the training
model.fit(x_train, y_train, epochs=10, batch_size=32, class_weight=class_weights)

Note that the output of class_weight.compute_class_weight() is a NumPy array like [2.57569845 0.68250928], which is why it is converted to a dict before being passed to model.fit().

tsveti_iko
  • This is a good alternative. But how does this take care of the validation set? Sklearn is only doing it on the basis of training data. – Sarvagya Gupta Feb 17 '20 at 09:17

Using class_weight in model.fit is slightly different: instead of using a custom weighted loss function, it weights each training sample's contribution to the loss according to its class.

I also found that class_weight, as well as sample_weight, is ignored in TF 2.0.0 when x is passed into model.fit as a tf.data.Dataset or generator. I believe this is fixed in TF 2.1.0+.
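As a possible workaround in those versions (a sketch with made-up data and model; the key point is that model.fit honours datasets yielding (x, y, sample_weight) tuples), you can bake the class weight into the dataset as a per-sample weight:

import numpy as np
import tensorflow as tf

# Made-up data with ~11% ones, as in the question
x_train = np.random.rand(1000, 20).astype('float32')
y_train = (np.random.rand(1000) < 0.11).astype('float32')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_dim=20),
    tf.keras.layers.Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy')

class_weights = tf.constant([0.11, 0.89])  # weight for class 0, class 1

def add_sample_weight(x, y):
    # Look up each label's class weight and attach it as a per-sample weight
    return x, y, tf.gather(class_weights, tf.cast(y, tf.int32))

dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
           .map(add_sample_weight)
           .batch(32))

model.fit(dataset, epochs=5)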

Here is my weighted binary cross entropy function for multi-hot encoded labels.

import tensorflow as tf
import tensorflow.keras.backend as K
import numpy as np
# weighted loss functions


def weighted_binary_cross_entropy(weights: dict, from_logits: bool = False):
    '''
    Return a function for calculating weighted binary cross entropy
    It should be used for multi-hot encoded labels

    # Example
    y_true = tf.convert_to_tensor([1, 0, 0, 0, 0, 0], dtype=tf.int64)
    y_pred = tf.convert_to_tensor([0.6, 0.1, 0.1, 0.9, 0.1, 0.], dtype=tf.float32)
    weights = {
        0: 1.,
        1: 2.
    }
    # with weights
    loss_fn = weighted_binary_cross_entropy(weights=weights, from_logits=False)
    loss = loss_fn(y_true, y_pred)
    print(loss)
    # tf.Tensor(0.6067193, shape=(), dtype=float32)

    # without weights
    loss_fn = weighted_binary_cross_entropy()
    loss = loss_fn(y_true, y_pred)
    print(loss)
    # tf.Tensor(0.52158177, shape=(), dtype=float32)

    # Another example
    y_true = tf.convert_to_tensor([[0., 1.], [0., 0.]], dtype=tf.float32)
    y_pred = tf.convert_to_tensor([[0.6, 0.4], [0.4, 0.6]], dtype=tf.float32)
    weights = {
        0: 1.,
        1: 2.
    }
    # with weights
    loss_fn = weighted_binary_cross_entropy(weights=weights, from_logits=False)
    loss = loss_fn(y_true, y_pred)
    print(loss)
    # tf.Tensor(1.0439969, shape=(), dtype=float32)

    # without weights
    loss_fn = weighted_binary_cross_entropy()
    loss = loss_fn(y_true, y_pred)
    print(loss)
    # tf.Tensor(0.81492424, shape=(), dtype=float32)

    @param weights A dict setting the weights for the 0 and 1 labels, e.g.
        {
            0: 1.,
            1: 8.
        }
        In this case we want to emphasise the true (1) labels,
        because there are many more false (0) labels, e.g.
            [
                [0 1 0 0 0 0 0 0 0 1]
                [0 0 0 0 1 0 0 0 0 0]
                [0 0 0 0 1 0 0 0 0 0]
            ]

    @param from_logits If False, we apply a sigmoid to each logit
    @return A function to calculate the (weighted) binary cross entropy
    '''
    assert 0 in weights
    assert 1 in weights

    def weighted_cross_entropy_fn(y_true, y_pred):
        tf_y_true = tf.cast(y_true, dtype=y_pred.dtype)
        tf_y_pred = tf.cast(y_pred, dtype=y_pred.dtype)

        weights_v = tf.where(tf.equal(tf_y_true, 1), weights[1], weights[0])
        weights_v = tf.cast(weights_v, dtype=y_pred.dtype)
        ce = K.binary_crossentropy(tf_y_true, tf_y_pred, from_logits=from_logits)
        loss = K.mean(tf.multiply(ce, weights_v))
        return loss

    return weighted_cross_entropy_fn
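A hypothetical usage sketch (the architecture below is made up; only the loss wiring matters):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 10 independent sigmoid outputs for multi-hot encoded labels
model = Sequential([Dense(64, activation='relu', input_shape=(100,)),
                    Dense(10, activation='sigmoid')])
model.compile(optimizer='adam',
              loss=weighted_binary_cross_entropy({0: 1., 1: 8.}))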
menrfa
  • Among all the answers, this is most comprehensive one. I have one suggestion, for `weights_v` we should use following `weights_v = tf.cast(tf.where(tf.equal(tf_y_true, 1), weights[1], weights[0]), dtype=y_pred.dtype)` in order to avoid MatMul data type miss match issues: (Like -> `InvalidArgumentError: cannot compute Mul as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:Mul]` ) -> TF Ver 2.8.4 – Anugraha Sinha Mar 03 '23 at 02:05

I think using class weight in model.fit this way is not correct: in {0: 0.11, 1: 0.89}, the 0 is the class index, not the class 0 itself. From the Keras documentation (https://keras.io/models/sequential/):

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.

Cheng Yang
  • Hmm, this would suggest that the API doesn't provide a way to weight classes in a binary classification problem. Is this the case? – sid-kap Jan 15 '18 at 04:32
  • @sid-kap I have the exact same question: does the API have any way to provide class_weight for binary classification? Were you able to find the answer? – Naman Apr 15 '18 at 19:28
  • How to use class weights in the scenario of binary segmentation, where we use binary cross entropy and our label (mask) contains float values (1.0 and 0.0)? Can we give float values as labels, or does the label indicate the index only? – anilsathyan7 Aug 13 '19 at 18:10

You can compute the weights programmatically like this (for the question's data this sets one_weight to 0.89 and zero_weight to 0.11) and plug them into the weighted binary cross entropy:

one_weight = num_of_zeros / (num_of_ones + num_of_zeros)
zero_weight = num_of_ones / (num_of_ones + num_of_zeros)

from keras import backend as K

def weighted_binary_crossentropy(zero_weight, one_weight):

    def weighted_binary_crossentropy(y_true, y_pred):

        b_ce = K.binary_crossentropy(y_true, y_pred)

        # weighted calc
        weight_vector = y_true * one_weight + (1 - y_true) * zero_weight
        weighted_b_ce = weight_vector * b_ce

        return K.mean(weighted_b_ce)

    return weighted_binary_crossentropy
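A hedged usage sketch, assuming y_train is a 0/1 NumPy array from which the counts are taken and model is an already-built binary classifier:

import numpy as np

num_of_ones = int(np.sum(y_train == 1))
num_of_zeros = int(np.sum(y_train == 0))

one_weight = num_of_zeros / (num_of_ones + num_of_zeros)   # 0.89 for 11% ones
zero_weight = num_of_ones / (num_of_ones + num_of_zeros)   # 0.11

model.compile(optimizer='adam',
              loss=weighted_binary_crossentropy(zero_weight, one_weight),
              metrics=['accuracy'])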
Sayan Dey
  • This is useful because `class_weight` doesn't work in the case of image classification. Note that the weights must be defined exactly as above; the originally posted `(1-num_of_ones)/(num_of_ones + num_of_zeros)` comes out negative because of the brackets! – elbe Jun 07 '22 at 16:35
  • I doubt `(1-num_of_ones)/(num_of_ones + num_of_zeros)` is correct. It should instead be `1-(num_of_ones/(num_of_ones + num_of_zeros))`, i.e. `num_of_zeros/(num_of_ones + num_of_zeros)`, and similarly for `zero_weight`. – Anugraha Sinha Mar 02 '23 at 09:41

For me, the best way to do it is this:

import tensorflow as tf
from tensorflow.keras import backend as K

def custom_weighted_binary_crossentropy(zero_weight, one_weight):

    def weighted_binary_crossentropy(y_true, y_pred):
        y_true = K.cast(y_true, dtype=tf.float32)

        epsilon = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)

        # Compute cross entropy from probabilities.
        bce = y_true * tf.math.log(y_pred + epsilon)
        bce += (1 - y_true) * tf.math.log(1 - y_pred + epsilon)
        bce = -bce

        # Apply the weights to each class individually
        weight_vector = y_true * one_weight + (1. - y_true) * zero_weight
        weighted_bce = weight_vector * bce

        # Return the mean error
        return tf.reduce_mean(weighted_bce)

    return weighted_binary_crossentropy
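A quick sanity check with made-up values: with both weights set to 1.0, the result should match plain, unweighted binary crossentropy:

import tensorflow as tf

y_true = tf.constant([[1.], [0.]])
y_pred = tf.constant([[0.8], [0.3]])

loss_fn = custom_weighted_binary_crossentropy(zero_weight=1.0, one_weight=1.0)
print(float(loss_fn(y_true, y_pred)))  # ~0.29, same as unweighted BCE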

If you need a weighted validation loss with different weights than the training loss, you can use the validation_data parameter of tensorflow.keras.Model.fit() and pass your validation set as a tuple of NumPy arrays containing the validation data, the labels, and a weight for each sample.

Note that with this technique you have to map each sample to its weight yourself (here, by class).
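A minimal sketch, assuming x_val and y_val are NumPy arrays, model is already compiled, and reusing the question's class proportions as weights:

import numpy as np

# One weight per validation sample, assigned by class
val_sample_weights = np.where(y_val == 1, 0.89, 0.11)

model.fit(x_train, y_train,
          epochs=10,
          class_weight={0: 0.11, 1: 0.89},                 # training weights
          validation_data=(x_val, y_val, val_sample_weights))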

See the TensorFlow documentation: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

Tina