
I am trying to apply deep learning to a binary classification problem with a high class imbalance between the target classes (500k vs. 31k examples). I want to write a custom loss function along the lines of: minimize(100 - ((predicted_smallerclass) / (total_smallerclass)) * 100)

Appreciate any pointers on how I can build this logic.

7 Answers


You can add class weights to the loss function by multiplying the logits. Regular cross-entropy loss is this:

loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))
               = -x[class] + log(\sum_j exp(x[j]))

in weighted case:

loss(x, class) = weights[class] * -x[class] + log(\sum_j exp(weights[class] * x[j]))

So by multiplying logits, you are re-scaling predictions of each class by its class weight.

For example:

ratio = 31.0 / (500.0 + 31.0)
class_weight = tf.constant([ratio, 1.0 - ratio])
logits = ... # shape [batch_size, 2]
weighted_logits = tf.multiply(logits, class_weight) # shape [batch_size, 2]
xent = tf.nn.softmax_cross_entropy_with_logits(
    logits=weighted_logits, labels=labels, name="xent_raw")

There is now a standard loss function that supports a weight per example in the batch:

tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits, weights=weights)

Here weights should be transformed from class weights into a per-example weight (with shape [batch_size]); see the documentation for tf.losses.sparse_softmax_cross_entropy.
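
A minimal sketch of that per-example transformation, assuming integer labels of shape [batch_size] and the two-class weights from the snippet above (variable names are illustrative):

class_weight = tf.constant([ratio, 1.0 - ratio])   # per-class weights, as above
weights = tf.gather(class_weight, label)           # weight of each example's class, shape [batch_size]
loss = tf.losses.sparse_softmax_cross_entropy(
    labels=label, logits=logits, weights=weights)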

ilblackdragon

The code proposed in the answer above seems wrong to me. I agree that the loss should be multiplied by the weight.

But if you multiply the logits by the class weights, you end up with:

weights[class] * -x[class] + log( \sum_j exp(x[j] * weights[class]) )

The second term is not equal to:

weights[class] * log(\sum_j exp(x[j]))

To show this, we can rewrite the latter as:

log( (\sum_j exp(x[j])) ^ weights[class] )
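
A quick numeric check with made-up logits and a made-up weight (values purely illustrative) confirms the two terms differ:

import numpy as np

x = np.array([2.0, -1.0, 0.5])                      # hypothetical logits
w = 3.0                                             # hypothetical weights[class]

scaled_logits_term = np.log(np.sum(np.exp(w * x)))  # log(\sum_j exp(w * x[j]))  -> ~6.01
scaled_loss_term   = w * np.log(np.sum(np.exp(x)))  # w * log(\sum_j exp(x[j]))  -> ~6.72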

So here is the code I'm proposing:

ratio = 31.0 / (500.0 + 31.0)
class_weight = tf.constant([[ratio, 1.0 - ratio]])
logits = ... # shape [batch_size, 2]
labels = ... # one-hot, shape [batch_size, 2], float32

# this is the weight for each datapoint, depending on its label
weight_per_label = tf.transpose(tf.matmul(labels,
                                tf.transpose(class_weight))) # shape [1, batch_size]

# scale the per-example cross entropy (not the logits) by that weight
xent = tf.multiply(weight_per_label,
                   tf.nn.softmax_cross_entropy_with_logits(
                       logits=logits, labels=labels, name="xent_raw")) # shape [1, batch_size]
loss = tf.reduce_mean(xent) # scalar
JL Meunier
  • I am facing the same issue, but in trying to understand the code above I do not understand `\sum_` - can you please explain that? It seems to be LaTeX code; does that work in Python? – Ron Cohen Aug 15 '16 at 15:18
  • But in fact the best approach is to build balanced mini-batches!! – JL Meunier Aug 19 '16 at 08:30
  • @Ron: the equation just says that it is different to multiply the logits by the class weights vs. multiply the distance (cross entropy) by the weights. The code at the bottom does work in Python. But overall, just manage to balance each minibatch and you will get a better model! – JL Meunier Aug 19 '16 at 08:33
  • Thanks JL. In your code example, I assume that `labels` is a one-hot 1-D tensor, correct? BTW in my particular application balancing classes causes other problems, so I need to use the weighting approach. – Ron Cohen Aug 30 '16 at 14:59
  • I think this should be the accepted answer, since we want to multiply the distance and not the logits by the weights. – Roger Trullo Oct 23 '16 at 02:01
  • Is this `weight_per_label` approach differentiable? If not, then how would this work with backpropagation? – Ron Cohen Nov 25 '16 at 02:40
  • @RonCohen it is differentiable, it's just an element-wise multiplication. – Emma Strubell Dec 16 '16 at 17:18
  • @JLMeunier Can you explain / provide a citation justifying why balanced minibatches are better? They are certainly a much bigger pain to implement. – Emma Strubell Dec 16 '16 at 17:20
  • Actually, if you multiply the logits by the weights, you're changing each `x[i]` to `x[i] * weights[i]`, and therefore you should end up with `weights[class] * -x[class] + log( \sum_j exp(x[j] * weights[j]) )`. Of course that doesn't change what you correctly propose in the rest of the post. – Wiseful Jan 06 '17 at 07:22
  • Does using cross entropy rather than logistic loss work better for imbalanced data? That's what I seem to be finding, but I'm trying to find some research to back up my observation. – Reddspark Aug 25 '17 at 19:03
  • The answer is really incomplete, at least for newbies. And to partially answer the question about LaTeX in the 'code': no, the backslash is NOT for Python, it just indicates a mathematical sum over a series. – tobi delbruck Jun 07 '23 at 06:21
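
For the suggestion in the comments above to build balanced mini-batches rather than (or in addition to) weighting the loss, here is a minimal NumPy sampling sketch; the function and array names are illustrative, and it assumes binary labels and data that fits in memory:

import numpy as np

def balanced_batch(features, labels, batch_size):
    """Sample a mini-batch with (roughly) equal examples per class."""
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    half = batch_size // 2
    idx = np.concatenate([
        np.random.choice(pos_idx, half, replace=True),               # oversample the rare class
        np.random.choice(neg_idx, batch_size - half, replace=False),
    ])
    np.random.shuffle(idx)
    return features[idx], labels[idx]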

Use tf.nn.weighted_cross_entropy_with_logits() and set pos_weight to 1 / (expected ratio of positives).
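
With the counts from the question (31k positives out of about 531k examples, so pos_weight ≈ 17), a minimal sketch, assuming labels and logits are float tensors of the same shape and using the newer labels= keyword (older TF versions call it targets=):

pos_ratio = 31.0 / (500.0 + 31.0)        # expected ratio of positives
loss = tf.nn.weighted_cross_entropy_with_logits(
    labels=labels, logits=logits, pos_weight=1.0 / pos_ratio)
loss = tf.reduce_mean(loss)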

Malay Haldar
  • I'm still a newbie in deep learning, so excuse me if my question is naïve. What do you mean by the expected ratio of positives? And what is the difference between this function and 'sigmoid_cross_entropy'? – Maystro Dec 20 '17 at 23:22

You can check the losses guide in the TensorFlow docs: https://www.tensorflow.org/api_guides/python/contrib.losses

...

While specifying a scalar loss rescales the loss over the entire batch, we sometimes want to rescale the loss per batch sample. For example, if we have certain examples that matter more to us to get correct, we might want to have a higher loss than other samples whose mistakes matter less. In this case, we can provide a weight vector of length batch_size, which results in the loss for each sample in the batch being scaled by the corresponding weight element. For example, consider the case of a classification problem where we want to maximize our accuracy but are especially interested in obtaining high accuracy for a specific class:

inputs, labels = LoadData(batch_size=3)
logits = MyModelPredictions(inputs)

# Ensures that the loss for examples whose ground truth class is `3` is 5x
# higher than the loss for all other examples.
weight = tf.multiply(4.0, tf.cast(tf.equal(labels, 3), tf.float32)) + 1

onehot_labels = tf.one_hot(labels, num_classes=5)
tf.contrib.losses.softmax_cross_entropy(logits, onehot_labels, weight=weight)
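
tf.contrib was removed in TensorFlow 2; a rough equivalent of the snippet above via the tf.compat.v1.losses module (note the keyword arguments and the plural weights) would be something like:

weight = tf.multiply(4.0, tf.cast(tf.equal(labels, 3), tf.float32)) + 1
onehot_labels = tf.one_hot(labels, depth=5)
loss = tf.compat.v1.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels, logits=logits, weights=weight)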

I had to work with a similar unbalanced dataset with multiple classes, and this is how I worked through it; I hope it helps somebody looking for a similar solution:

This goes inside your training module:

import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# use class weights for handling the unbalanced dataset
if mode == 'INFER':  # test/dev mode, do not weigh the loss in test mode
    sample_weights = np.ones(labels.shape)
else:
    sample_weights = compute_sample_weight(class_weight='balanced', y=labels)

This goes inside your model class definition:

#an extra placeholder for sample weights
#assuming you already have batch_size tensor
self.sample_weight = tf.placeholder(dtype=tf.float32, shape=[None],
                       name='sample_weights')
cross_entropy_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
                       labels=self.label, logits=logits, 
                       name='cross_entropy_loss')
cross_entropy_loss = tf.reduce_sum(cross_entropy_loss*self.sample_weight) / batch_size
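
At session run time the computed sample_weights are then fed into that placeholder; a sketch of the training step, where model.inputs, model.loss and train_op stand in for whatever your training loop already defines:

feed_dict = {
    model.inputs: batch_inputs,            # hypothetical input placeholder
    model.label: batch_labels,
    model.sample_weight: sample_weights,   # the weights computed above
}
_, loss_value = sess.run([train_op, model.loss], feed_dict=feed_dict)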
bitspersecond

I used the op tf.nn.weighted_cross_entropy_with_logits() for two classes:

classes_weights = tf.constant([0.1, 1.0])
cross_entropy = tf.nn.weighted_cross_entropy_with_logits(
    targets=labels, logits=logits, pos_weight=classes_weights)
Denis Shcheglov
""" Weighted binary crossentropy between an output tensor and a target tensor.
# Arguments
    pos_weight: A coefficient to use on the positive examples.
# Returns
    A loss function supposed to be used in model.compile().
"""
def weighted_binary_crossentropy(pos_weight=1):
    def _to_tensor(x, dtype):
        """Convert the input `x` to a tensor of type `dtype`.
        # Arguments
            x: An object to be converted (numpy array, list, tensors).
            dtype: The destination type.
        # Returns
            A tensor.
        """
        return tf.convert_to_tensor(x, dtype=dtype)
  
  
    def _calculate_weighted_binary_crossentropy(target, output, from_logits=False):
        """Calculate weighted binary crossentropy between an output tensor and a target tensor.
        # Arguments
            target: A tensor with the same shape as `output`.
            output: A tensor.
            from_logits: Whether `output` is expected to be a logits tensor.
                By default, we consider that `output`
                encodes a probability distribution.
        # Returns
            A tensor.
        """
        # Note: tf.nn.sigmoid_cross_entropy_with_logits
        # expects logits, Keras expects probabilities.
        if not from_logits:
            # transform back to logits
            _epsilon = _to_tensor(K.epsilon(), output.dtype.base_dtype)
            output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
            output = tf.math.log(output / (1 - output))
        target = tf.dtypes.cast(target, tf.float32)
        return tf.nn.weighted_cross_entropy_with_logits(labels=target, logits=output, pos_weight=pos_weight)


    def _weighted_binary_crossentropy(y_true, y_pred):
        return K.mean(_calculate_weighted_binary_crossentropy(y_true, y_pred), axis=-1)
    
    return _weighted_binary_crossentropy

For usage:

pos = ...  # count of positive class examples
neg = ...  # count of negative class examples
total = pos + neg
weight_for_0 = (1 / neg)*(total)/2.0 
weight_for_1 = (1 / pos)*(total)/2.0

class_weight = {0: weight_for_0, 1: weight_for_1}

model = <your model>

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=weighted_binary_crossentropy(weight_for_1),
    metrics=tf.keras.metrics.Precision(name='precision')
)
tttzof351