After using TensorFlow for quite a while, I have read some Keras tutorials and implemented some examples. I have found several tutorials for convolutional autoencoders that use keras.losses.binary_crossentropy as the loss function.
I thought binary_crossentropy should not be a multi-class loss function and would most likely use binary labels, but in fact Keras (TF Python backend) calls tf.nn.sigmoid_cross_entropy_with_logits, which is actually intended for classification tasks with multiple, independent classes that are not mutually exclusive.
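To make this concrete, here is a minimal pure-Python sketch of the element-wise formula given in the tf.nn.sigmoid_cross_entropy_with_logits documentation, max(x, 0) - x * z + log(1 + exp(-|x|)). The helper names and the sample values are my own and only for illustration; this is not the actual Keras or TensorFlow code. It suggests that each output is scored independently and that targets anywhere in [0, 1] (such as normalized pixel intensities in an autoencoder) are accepted, not just 0/1 labels.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_cross_entropy_with_logits(logit, target):
    """Per-element loss in the numerically stable form
    max(x, 0) - x * z + log(1 + exp(-|x|)).
    Hypothetical helper that mirrors the documented formula."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Three independent outputs. The targets are neither one-hot nor a
# probability distribution; the third is a "soft" value such as a pixel
# intensity scaled to [0, 1].
logits = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, 0.3]

per_output = [sigmoid_cross_entropy_with_logits(x, z)
              for x, z in zip(logits, targets)]
mean_loss = sum(per_output) / len(per_output)

print(per_output)
print(mean_loss)
```

For a soft target z, the per-element loss is smallest when sigmoid(x) equals z, which would explain why the loss is usable for reconstructing inputs scaled to [0, 1].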
On the other hand, I expected categorical_crossentropy to be intended for multi-class classification where the target classes depend on each other, but are not necessarily one-hot encoded.
However, the Keras documentation states:
(...) when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample).
If I am not mistaken, this is just the special case of one-hot encoded classification tasks, but the underlying cross-entropy loss also works with probability distributions ("multi-class", dependent labels)?
Additionally, Keras uses tf.nn.softmax_cross_entropy_with_logits (TF Python backend) for the implementation, whose documentation states:
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
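As a sanity check on that note, here is a small pure-Python sketch of softmax cross-entropy, -sum(labels * log_softmax(logits)). The function names and numbers are hypothetical and this is not the TensorFlow implementation. It compares a one-hot target with a soft target, and checks the well-known gradient shortcut softmax(logits) - labels against a finite-difference gradient. My assumption is that this shortcut is what the note is about, since it only holds when the labels sum to 1.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softmax_cross_entropy(logits, labels):
    """Cross-entropy between a label vector and softmax(logits)."""
    return -sum(y * lp for y, lp in zip(labels, log_softmax(logits)))


def numeric_grad(logits, labels, eps=1e-6):
    """Central finite-difference gradient of the loss w.r.t. the logits."""
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += eps
        down[i] -= eps
        grads.append((softmax_cross_entropy(up, labels)
                      - softmax_cross_entropy(down, labels)) / (2 * eps))
    return grads


logits = [2.0, 1.0, 0.1]
one_hot = [1.0, 0.0, 0.0]   # the "categorical format" from the Keras docs
soft = [0.7, 0.2, 0.1]      # a valid probability distribution, not one-hot
invalid = [1.0, 1.0, 0.0]   # does not sum to 1

probs = [math.exp(lp) for lp in log_softmax(logits)]

loss_one_hot = softmax_cross_entropy(logits, one_hot)  # reduces to -log(probs[0])
loss_soft = softmax_cross_entropy(logits, soft)

# Shortcut gradient softmax(logits) - labels versus the numeric gradient.
shortcut_soft = [p - y for p, y in zip(probs, soft)]
shortcut_invalid = [p - y for p, y in zip(probs, invalid)]

print(loss_one_hot, loss_soft)
print(numeric_grad(logits, soft), shortcut_soft)
print(numeric_grad(logits, invalid), shortcut_invalid)
```

In this sketch the shortcut matches the numeric gradient for both the one-hot and the soft target, but not for the label vector that does not sum to 1. That would be consistent with one-hot encoding being just a special case of what the loss accepts.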
Please correct me if I am wrong, but it looks to me that the Keras documentation is - at least - not very "detailed"?!
So, what is the idea behind Keras' naming of the loss functions? Is the documentation correct? If binary cross-entropy really relied on binary labels, it should not work for autoencoders, right?! Likewise for categorical cross-entropy: it should only work for one-hot encoded labels if the documentation is correct?!