
I want to implement a custom loss function in Keras based on binary_crossentropy. I have a question about the shape of the output tensor of keras.losses.binary_crossentropy. I expected it to be a 1-D tensor of length batch_size, but it returns a tensor of shape [batch_size, classes] with an identical loss value in each row for all classes. Should I manually take the max along the rows? Is there a better way? And why is the output of K.binary_crossentropy not a 1-D tensor? Is it related to the underlying math?

import tensorflow as tf
from tensorflow.keras import backend as K

def custom_loss(y_true, y_pred):
    # Element-wise binary cross-entropy; shape is [batch_size, classes]
    loss_tensor = K.binary_crossentropy(y_true, y_pred)
    # return K.max(loss_tensor, axis=1)
    return loss_tensor

# model.compile(loss={'classifier':'kullback_leibler_divergence'},optimizer='Nadam',metrics=['acc'])


tmp_y_true = tf.constant([[0.0, 1.0], [1.0, 0.0]])
tmp_y_pred = tf.constant([[0.8, 0.2], [0.75, 0.25]])
output = custom_loss(tmp_y_true, tmp_y_pred)
tmp_out = K.eval(output)  # shape (2, 2) rather than the expected (2,)
Jafar Gh

2 Answers


Binary cross-entropy is a confusing name. It does NOT mean binary in the sense of each datapoint getting a single 0-or-1 label; it is used for multi-label problems, e.g. predicting, for each image, whether it contains a dog and whether it contains a cat, in any combination. Each class gets its own separate prediction of whether or not it is present, and the loss is binary in the sense that each class is binary (present or not). So the expected output shape is [batch size, classes].
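
For illustration, a minimal sketch (the dog/cat labels here are hypothetical, not from the question) showing that each class gets its own independent binary loss:

import tensorflow as tf
from tensorflow.keras import backend as K

# Hypothetical multi-label targets, one column per class: [dog, cat].
# Image 1 contains a dog only; image 2 contains both a dog and a cat.
y_true = tf.constant([[1.0, 0.0], [1.0, 1.0]])
y_pred = tf.constant([[0.9, 0.1], [0.6, 0.7]])

# One independent binary loss per (image, class) -> shape [batch_size, classes]
print(K.eval(K.binary_crossentropy(y_true, y_pred)))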


ubershmekel

The formula for calculating binary_crossentropy is

loss = -(y * log(p) + (1 - y) * log(1 - p))

but it returns a tensor of shape [batch_size, classes] with an identical loss value in each row for all classes.

This is because binary_crossentropy is applied element-wise, at each (sample, class) position. Taking the first pair in the example provided, y_true = [0.0, 1.0] and y_pred = [0.8, 0.2]:

For y_true = 0, y_pred = 0.8, applying the formula: loss = -(0 * log(0.8) + 1 * log(1 - 0.8)) = 1.609

For y_true = 1, y_pred = 0.2, applying the formula: loss = -(1 * log(0.2) + 0 * log(1 - 0.2)) = 1.609

>>> y_true = tf.constant([0.0, 1.0])
>>> y_pred = tf.constant([0.8, 0.2])
>>> K.eval(K.binary_crossentropy(y_true, y_pred))
array([1.6094381, 1.609438 ], dtype=float32)
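
As a sanity check, the same numbers can be reproduced by hand (a sketch using plain NumPy, which isn't part of the original code; the clipping mirrors Keras's epsilon guard against log(0)):

>>> import numpy as np
>>> def bce_elementwise(y, p, eps=1e-7):
...     p = np.clip(p, eps, 1 - eps)  # avoid log(0), as Keras does internally
...     return -(y * np.log(p) + (1 - y) * np.log(1 - p))
...
>>> bce_elementwise(np.array([0.0, 1.0]), np.array([0.8, 0.2]))
array([1.60943791, 1.60943791])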

Should I manually take the max along the rows?

No. As the values are one-hot encoded, the mean has to be taken along the class axis; this is exactly the reduction Keras applies in its built-in binary_crossentropy loss (see the source link below).

>>> K.eval(K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1))
1.609438

https://github.com/keras-team/keras/blob/ed07472bc5fc985982db355135d37059a1f887a9/keras/losses.py#L76
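
Putting it together, a minimal sketch of the question's custom loss, fixed to return one scalar per sample (a 1-D tensor of length batch_size):

import tensorflow as tf
from tensorflow.keras import backend as K

def custom_loss(y_true, y_pred):
    # Mean over the class axis: [batch_size, classes] -> [batch_size]
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

print(K.eval(custom_loss(tf.constant([[0.0, 1.0], [1.0, 0.0]]),
                         tf.constant([[0.8, 0.2], [0.75, 0.25]]))))
# -> [1.609438  0.2876821]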

Alternatively, categorical_crossentropy could be used, as the values are one-hot encoded.

>>> K.eval(K.categorical_crossentropy(y_true, y_pred))
1.609438
Manoj Mohan