I am training a MobileNet for semantic segmentation in TensorFlow. The targets have two classes, foreground (1) and background (0), so this is a two-class classification problem. I chose softmax cross entropy as the loss, with Python code like:
tf.losses.softmax_cross_entropy(self.targets, logits)
Both `targets` and `logits` have shape [batch_size, 224, 224, 2]. However, the loss becomes very large after about 100 batches; the loss curve looks like this:

[loss curve image]
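For context, here is a minimal, self-contained sketch of how the loss is wired up; the shapes are my real shapes, but the tensor names and placeholders are illustrative stand-ins for my model (TF 1.x API):

```python
import tensorflow as tf

# Illustrative stand-ins for my model's tensors (names are hypothetical):
# per-pixel one-hot labels and raw network outputs, both [batch, 224, 224, 2]
targets = tf.placeholder(tf.float32, [None, 224, 224, 2])
logits = tf.placeholder(tf.float32, [None, 224, 224, 2])

# The loss exactly as in my code: onehot_labels first, then logits
loss = tf.losses.softmax_cross_entropy(targets, logits)
```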
From TF's API docs, I know that `tf.losses.softmax_cross_entropy` expects one-hot-encoded labels of shape [batch_size, num_classes]. That seems to match my labels of shape [batch_size, 224, 224, 2]: along the last axis, each pixel's label is either 1 or 0, and the two classes are mutually exclusive.
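To make the shape argument concrete, here is an illustrative reshape (reusing the placeholder tensors above) showing that flattening the spatial dimensions turns my labels into exactly the documented [batch_size, num_classes] layout:

```python
# Collapse batch and spatial dims so each pixel becomes one "example"
flat_targets = tf.reshape(targets, [-1, 2])  # [batch * 224 * 224, 2]
flat_logits = tf.reshape(logits, [-1, 2])

# The same data in the documented 2-D one-hot form
flat_loss = tf.losses.softmax_cross_entropy(flat_targets, flat_logits)
```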
So, is `softmax_cross_entropy` unusable for a two-class one-hot label case like this, and if so, why? If it can be used, where is my mistake?
If I use `tf.losses.sigmoid_cross_entropy` or `tf.losses.sparse_softmax_cross_entropy` instead (the latter given labels of size [batch_size, 224, 224, 1]), the loss converges.
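For completeness, a sketch of the two variants that do converge for me (illustrative tensor names again; for the sparse loss I show the squeezed integer labels, since the API expects labels with one less dimension than the logits):

```python
# Variant 1: per-pixel sigmoid cross entropy on the same one-hot labels
loss_sigmoid = tf.losses.sigmoid_cross_entropy(targets, logits)

# Variant 2: sparse softmax cross entropy with integer class ids,
# shape [batch, 224, 224] (one less dimension than the logits)
sparse_targets = tf.argmax(targets, axis=-1)
loss_sparse = tf.losses.sparse_softmax_cross_entropy(sparse_targets, logits)
```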