
Should you always use categorical_crossentropy loss for multiclass classification problems? Or is binary_crossentropy ever appropriate as well?

To be clear, multiclass means an input takes on one of many classes, and the problem is single-label, meaning an input cannot belong to multiple classes at the same time.
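
For example, here is a minimal sketch (hypothetical labels, three classes) of the two label layouts this distinction implies:

```python
import numpy as np

# Single-label multiclass: each row is one-hot, exactly one 1 per sample.
y_single_label = np.array([
    [1, 0, 0],  # sample 0 belongs to class 0 only
    [0, 0, 1],  # sample 1 belongs to class 2 only
])

# Multi-label: each row may contain several 1s at once.
y_multi_label = np.array([
    [1, 0, 1],  # sample 0 belongs to classes 0 and 2
    [0, 1, 1],  # sample 1 belongs to classes 1 and 2
])
```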


1 Answer


K.binary_crossentropy doesn't make sense for single-label multiclass classification; it's meant for binary or multi-label classification. This post outlines the differences in detail. Keep in mind what happens under the hood*:
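
A simplified sketch of what the backend does (paraphrased, not the verbatim source; the `tf.*` calls assume the TensorFlow 1.x API of the time):

```python
import tensorflow as tf  # TensorFlow 1.x API assumed

def binary_crossentropy(target, output, from_logits=False):
    if not from_logits:
        # Convert probabilities back to logits before delegating to TF.
        output = tf.clip_by_value(output, 1e-7, 1 - 1e-7)
        output = tf.log(output / (1 - output))
    # sigmoid_cross_entropy_with_logits applies the sigmoid itself,
    # independently for every output unit (hence multi-label).
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

def categorical_crossentropy(target, output, from_logits=False):
    if from_logits:
        # softmax_cross_entropy_with_logits applies the softmax itself,
        # coupling the output units (hence single-label multiclass).
        return tf.nn.softmax_cross_entropy_with_logits(labels=target, logits=output)
    # Otherwise `output` is treated as a probability distribution over classes.
    output /= tf.reduce_sum(output, axis=-1, keep_dims=True)
    output = tf.clip_by_value(output, 1e-7, 1 - 1e-7)
    return -tf.reduce_sum(target * tf.log(output), axis=-1)
```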

* assuming you're using the TensorFlow backend

  • Thanks. Does that mean the last layer for binary_crossentropy should always use sigmoid activation, and for categorical_crossentropy always softmax? Also, since binary_crossentropy is multi-label, does that mean the probabilities output by the last layer don't have to sum to 1? – megashigger Dec 13 '17 at 18:57
  • `binary_crossentropy` itself performs the sigmoid activation; that's why the TensorFlow function is named *with logits*. An additional sigmoid layer will only do harm. The same is true for `categorical_crossentropy`. And yes, the probabilities are per-feature and don't have to sum to 1. – Maxim Dec 13 '17 at 19:05
  • Are you sure? Almost all examples I've seen with categorical_crossentropy loss have a softmax as the last layer as well. – megashigger Dec 13 '17 at 19:58
  • Oops, my bad. In Keras the default is `from_logits=False`, in which case the loss expects probabilities as input. But just to make it clear: this way you compute the sigmoid output, then the loss converts it back to logits and applies `sigmoid_cross_entropy_with_logits`, which computes the sigmoid one more time. Check out the source code. I mostly work directly in tensorflow, so I never use a sigmoid/softmax layer myself. – Maxim Dec 13 '17 at 20:08
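
To make the pairings from the comment thread concrete, here is a minimal sketch (hypothetical input and layer sizes) of the two setups in Keras:

```python
from keras.models import Sequential
from keras.layers import Dense

# Single-label multiclass: softmax head + categorical_crossentropy.
# The softmax couples the outputs, so the class probabilities sum to 1.
single_label = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='softmax'),
])
single_label.compile(optimizer='adam', loss='categorical_crossentropy')

# Multi-label: sigmoid head + binary_crossentropy.
# Each output is an independent probability; they need not sum to 1.
multi_label = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='sigmoid'),
])
multi_label.compile(optimizer='adam', loss='binary_crossentropy')
```

Since Keras losses default to `from_logits=False`, both models keep the activation in the last layer and pass probabilities to the loss, as discussed above.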