After working through some Keras examples and tutorials, I am still confused about which cross-entropy loss I should use in my project. I want to predict one of multiple labels (positive, negative, or neutral) for online comments with an LSTM model. The labels have been converted to one-hot vectors with the to_categorical method, which the Keras documentation describes:
(...) when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample).
The one-hot array looks as follows:
array([[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
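For reference, the conversion that produces this array can be sketched in plain NumPy (this mirrors what keras.utils.to_categorical does for integer labels; the label-to-index mapping 0 = positive, 1 = negative, 2 = neutral is an assumption for illustration):

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-hot encode integer class labels, like keras.utils.to_categorical."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Two "positive" comments and one "neutral" one (mapping assumed above)
encoded = one_hot([0, 0, 2], 3)
```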
Because there are multiple classes, I would prefer to use categorical_crossentropy. I implemented a model with this loss, but its accuracy was only around 20%. Using binary_crossentropy with a sigmoid activation, my accuracy reached 80%. I am really confused, because some people argue that:
the accuracy computed with the Keras method "evaluate" is just plain wrong when using binary_crossentropy with more than 2 labels
whereas others have implemented high-performing models with binary cross-entropy and multiple labels, which is essentially the same workflow.
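As far as I understand the first claim, the issue is that metrics=['accuracy'] selects binary_accuracy when the loss is binary_crossentropy, which compares entries element-wise instead of per sample. A rough NumPy sketch of the difference (the prediction values are made up for illustration):

```python
import numpy as np

# One-hot targets for 3 classes, and predictions that are wrong on EVERY sample
y_true = np.array([[1., 0., 0.],
                   [0., 0., 1.],
                   [0., 1., 0.]])
y_pred = np.array([[0.1, 0.2, 0.3],
                   [0.3, 0.2, 0.1],
                   [0.2, 0.1, 0.3]])

# binary_accuracy: threshold each entry at 0.5, average over ALL entries.
# The many correct 0s inflate the score even though no class is predicted right.
binary_acc = np.mean((y_pred > 0.5) == (y_true > 0.5))

# categorical_accuracy: one argmax comparison per sample.
categorical_acc = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))
```

Here binary_acc comes out around 67% while categorical_acc is 0%, which is why the reported "accuracy" can look misleadingly high with binary_crossentropy on one-hot multi-class targets.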
We want the probability of each class. So we use sigmoid on the final layer, which gives an output in the range 0 to 1 per label. If our aim were to find the single class, we would have used softmax.
So I just want to know whether there are any problems if I choose binary_crossentropy, as in the following link, to predict the outcome class.