I am training a MobileNet for semantic segmentation in TensorFlow. The targets have two classes, foreground (1) and background (0), so this is a two-class classification problem. I chose softmax cross entropy as the loss, with Python code like:
tf.losses.softmax_cross_entropy(self.targets, logits)
Both `targets` and `logits` have shape [batch_size, 224, 224, 2]. However, the loss becomes very large after about 100 batches; the loss curve looks like this:

[loss curve image]
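For context, here is a minimal, self-contained sketch of how the loss is wired up; the shapes are my real shapes, but the tensor names and placeholders are illustrative stand-ins for my model (TF 1.x API):

```python
import tensorflow as tf

# Illustrative stand-ins for my model's tensors (names are hypothetical):
# per-pixel one-hot labels and raw network outputs, both [batch, 224, 224, 2]
targets = tf.placeholder(tf.float32, [None, 224, 224, 2])
logits = tf.placeholder(tf.float32, [None, 224, 224, 2])

# The loss exactly as in my code: onehot_labels first, then logits
loss = tf.losses.softmax_cross_entropy(targets, logits)
```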
From TF's API docs, I know that `tf.losses.softmax_cross_entropy` expects one-hot-encoded labels of shape [batch_size, num_classes]. That seems to match my labels of shape [batch_size, 224, 224, 2]: along the last axis, each pixel's label is either 1 or 0, and the two classes are mutually exclusive.
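To make the shape argument concrete, here is an illustrative reshape (reusing the placeholder tensors above) showing that flattening the spatial dimensions turns my labels into exactly the documented [batch_size, num_classes] layout:

```python
# Collapse batch and spatial dims so each pixel becomes one "example"
flat_targets = tf.reshape(targets, [-1, 2])  # [batch * 224 * 224, 2]
flat_logits = tf.reshape(logits, [-1, 2])

# The same data in the documented 2-D one-hot form
flat_loss = tf.losses.softmax_cross_entropy(flat_targets, flat_logits)
```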
So, is `softmax_cross_entropy` unusable for a two-class one-hot label case like this, and if so, why? If it can be used, where is my mistake?
If I use `tf.losses.sigmoid_cross_entropy` or `tf.losses.sparse_softmax_cross_entropy` instead (the latter given labels of size [batch_size, 224, 224, 1]), the loss converges.
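For completeness, a sketch of the two variants that do converge for me (illustrative tensor names again; for the sparse loss I show the squeezed integer labels, since the API expects labels with one less dimension than the logits):

```python
# Variant 1: per-pixel sigmoid cross entropy on the same one-hot labels
loss_sigmoid = tf.losses.sigmoid_cross_entropy(targets, logits)

# Variant 2: sparse softmax cross entropy with integer class ids,
# shape [batch, 224, 224] (one less dimension than the logits)
sparse_targets = tf.argmax(targets, axis=-1)
loss_sparse = tf.losses.sparse_softmax_cross_entropy(sparse_targets, logits)
```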