
Should you always use categorical_crossentropy loss for multiclass classification problems? Or is binary_crossentropy ever appropriate as well?

To be clear, multiclass means an input takes on one of many classes, and the problem is single-label, meaning an input cannot belong to multiple classes at the same time.
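
For example, here is a minimal sketch (hypothetical labels, three classes) of the two label layouts this distinction implies:

```python
import numpy as np

# Single-label multiclass: each row is one-hot, exactly one 1 per sample.
y_single_label = np.array([
    [1, 0, 0],  # sample 0 belongs to class 0 only
    [0, 0, 1],  # sample 1 belongs to class 2 only
])

# Multi-label: each row may contain several 1s at once.
y_multi_label = np.array([
    [1, 0, 1],  # sample 0 belongs to classes 0 and 2
    [0, 1, 1],  # sample 1 belongs to classes 1 and 2
])
```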


1 Answer


K.binary_crossentropy doesn't make sense for single-label multiclass classification; it's meant for binary or multi-label classification. This post outlines the differences in detail. Keep in mind what happens under the hood*:
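
A simplified sketch of what the backend does (paraphrased, not the verbatim source; the `tf.*` calls assume the TensorFlow 1.x API of the time):

```python
import tensorflow as tf  # TensorFlow 1.x API assumed

def binary_crossentropy(target, output, from_logits=False):
    if not from_logits:
        # Convert probabilities back to logits before delegating to TF.
        output = tf.clip_by_value(output, 1e-7, 1 - 1e-7)
        output = tf.log(output / (1 - output))
    # sigmoid_cross_entropy_with_logits applies the sigmoid itself,
    # independently for every output unit (hence multi-label).
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

def categorical_crossentropy(target, output, from_logits=False):
    if from_logits:
        # softmax_cross_entropy_with_logits applies the softmax itself,
        # coupling the output units (hence single-label multiclass).
        return tf.nn.softmax_cross_entropy_with_logits(labels=target, logits=output)
    # Otherwise `output` is treated as a probability distribution over classes.
    output /= tf.reduce_sum(output, axis=-1, keep_dims=True)
    output = tf.clip_by_value(output, 1e-7, 1 - 1e-7)
    return -tf.reduce_sum(target * tf.log(output), axis=-1)
```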

* assuming you're using the TensorFlow backend

  • Thanks. Does that mean the last layer for binary_crossentropy should always use sigmoid activation, and for categorical_crossentropy always softmax? Also, since binary_crossentropy is multi-label, does that mean the probabilities output by the last layer don't have to sum to 1? – megashigger Dec 13 '17 at 18:57
  • `binary_crossentropy` itself performs the sigmoid activation; that's why the TensorFlow function is named *with logits*. An additional sigmoid layer will only do harm. The same is true for `categorical_crossentropy`. And yes, the probabilities are per-feature and don't have to sum to 1. – Maxim Dec 13 '17 at 19:05
  • Are you sure? Almost all examples I've seen with categorical_crossentropy loss have a softmax as the last layer as well. – megashigger Dec 13 '17 at 19:58
  • Oops, my bad. In Keras the default is `from_logits=False`, in which case the loss expects probabilities as input. But just to make it clear: this way you compute the sigmoid output, then the loss converts it back to logits and applies `sigmoid_cross_entropy_with_logits`, which computes the sigmoid one more time. Check out the source code. I mostly work directly in tensorflow, so I never use a sigmoid/softmax layer myself. – Maxim Dec 13 '17 at 20:08
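
To make the pairings from the comment thread concrete, here is a minimal sketch (hypothetical input and layer sizes) of the two setups in Keras:

```python
from keras.models import Sequential
from keras.layers import Dense

# Single-label multiclass: softmax head + categorical_crossentropy.
# The softmax couples the outputs, so the class probabilities sum to 1.
single_label = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='softmax'),
])
single_label.compile(optimizer='adam', loss='categorical_crossentropy')

# Multi-label: sigmoid head + binary_crossentropy.
# Each output is an independent probability; they need not sum to 1.
multi_label = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='sigmoid'),
])
multi_label.compile(optimizer='adam', loss='binary_crossentropy')
```

Since Keras losses default to `from_logits=False`, both models keep the activation in the last layer and pass probabilities to the loss, as discussed above.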