After working through some Keras examples and tutorials, I am still confused about which cross-entropy loss I should use in my project. I want to predict one of multiple labels (positive, negative, or neutral) for online comments with an LSTM model. The labels have been converted to one-hot vectors with the to_categorical method, which the Keras documentation describes:
(...) when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample).
The one-hot array looks as follows:
array([[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
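For reference, the conversion that produces this array can be sketched in plain NumPy (this mirrors what keras.utils.to_categorical does for integer labels; the label-to-index mapping 0 = positive, 1 = negative, 2 = neutral is an assumption for illustration):

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-hot encode integer class labels, like keras.utils.to_categorical."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Two "positive" comments and one "neutral" one (mapping assumed above)
encoded = one_hot([0, 0, 2], 3)
```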
Because there are multiple classes, I would prefer to use categorical_crossentropy. I implemented a model with this loss, but its accuracy was only around 20%. Using binary_crossentropy with a sigmoid activation, my accuracy reached 80%. I am really confused, because some people argue that:
the accuracy computed with the Keras method "evaluate" is just plain wrong when using binary_crossentropy with more than 2 labels
whereas others have implemented high-performing models with binary cross-entropy and multiple labels, which is essentially the same workflow.
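As far as I understand the first claim, the issue is that metrics=['accuracy'] selects binary_accuracy when the loss is binary_crossentropy, which compares entries element-wise instead of per sample. A rough NumPy sketch of the difference (the prediction values are made up for illustration):

```python
import numpy as np

# One-hot targets for 3 classes, and predictions that are wrong on EVERY sample
y_true = np.array([[1., 0., 0.],
                   [0., 0., 1.],
                   [0., 1., 0.]])
y_pred = np.array([[0.1, 0.2, 0.3],
                   [0.3, 0.2, 0.1],
                   [0.2, 0.1, 0.3]])

# binary_accuracy: threshold each entry at 0.5, average over ALL entries.
# The many correct 0s inflate the score even though no class is predicted right.
binary_acc = np.mean((y_pred > 0.5) == (y_true > 0.5))

# categorical_accuracy: one argmax comparison per sample.
categorical_acc = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))
```

Here binary_acc comes out around 67% while categorical_acc is 0%, which is why the reported "accuracy" can look misleadingly high with binary_crossentropy on one-hot multi-class targets.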
We want the probability of each class. So we use sigmoid on the final layer, which gives an output in the range 0 to 1 per label. If our aim were to find the single class, we would have used softmax.
So I just want to know whether there are any problems if I choose binary_crossentropy, as in the following link, to predict the outcome class.