
I have a question regarding the sparse_softmax_cross_entropy cost function in TensorFlow.

I want to use it in a semantic segmentation context with an autoencoder architecture that uses typical convolution operations to downsample images into a feature vector. This vector is then upsampled (using conv2d_transpose and one-by-one convolutions) to create an output image. Hence, my input consists of single-channel images with shape (1, 128, 128, 1), where the first index represents the batch size and the last one the number of channels. The pixels of the image are currently either 0 or 1, so each pixel is mapped to a class. The output image of the autoencoder follows the same rules. Hence, I can't use any predefined cost function other than MSE or the previously mentioned one.

The network works fine with MSE, but I can't get it working with sparse_softmax_cross_entropy. It seems like this is the correct cost function in this context, but I'm a bit confused about the representation of the logits. The official doc says that the logits should have the shape (d_0, ..., d_{r-1}, num_classes). I tried to ignore the num_classes part, but this causes an error which says that only the interval [0, 1) is allowed. Of course, I need to specify the number of classes, which would turn the allowed interval into [0, 2), because the exclusive upper bound is obviously num_classes.
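
To illustrate, here is a toy sketch (made-up shapes, not my real model) of that restriction: with a single output channel the last dimension of the logits is 1, so the only valid label value is 0 and any pixel labelled 1 reproduces the error:

import numpy as np
import tensorflow as tf

# Toy logits with a single channel: a last dimension of 1 means only label 0 is valid.
logits = tf.constant(np.random.randn(1, 4, 4, 1), dtype=tf.float32)
labels = tf.constant(np.ones((1, 4, 4)), dtype=tf.int32)  # every pixel labelled 1

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    try:
        sess.run(loss)
    except tf.errors.InvalidArgumentError as e:
        print(e.message)  # Received a label value of 1 which is outside the valid range of [0, 1).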

Could someone please explain how to turn my output image into the required logits?

The current code for the cost function is:

self._loss_op = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss")))

The squeeze removes the last dimension of the label input so that the labels have shape [1, 128, 128]. This causes the following exception:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1).

Edit:

As requested, here's a minimal example to verify the behavior of the cost function in the context of fully-convolutional nets:

constructor snippet:

def __init__(self, img_channels=1, img_width=128, img_height=128):
    ...
    self._loss_op = None
    self._learning_rate_placeholder = tf.placeholder(tf.float32, [], 'lr')
    self._input_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'x')
    self._target_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'y')
    self._model = self.build_model()
    self.init_optimizer()

build_model() snippet:

def build_model(self):
    with tf.variable_scope('conv1', reuse=tf.AUTO_REUSE):
        # not necessary
        x = tf.reshape(self._input_placeholder, [-1, self._img_width, self._img_height, self._img_channels])
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)

    with tf.variable_scope('conv2', reuse=tf.AUTO_REUSE):
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
    with tf.variable_scope('conv3_red', reuse=tf.AUTO_REUSE):
        conv3 = tf.layers.conv2d(conv2, 1024, 30, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv4_red', reuse=tf.AUTO_REUSE):
        conv4 = tf.layers.conv2d(conv3, 64, 1, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv5_up', reuse=tf.AUTO_REUSE):
        conv5 = tf.layers.conv2d_transpose(conv4, 32, (128, 128), strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv6_1x1', reuse=tf.AUTO_REUSE):
        conv6 = tf.layers.conv2d(conv5, 1, 1, strides=1, activation=tf.nn.relu)
    return conv6

init_optimizer() snippet:

def init_optimizer(self):
    self._loss_op = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss")))
    optimizer = tf.train.AdamOptimizer(learning_rate=self._learning_rate_placeholder)
    self._train_op = optimizer.minimize(self._loss_op)
Bastian

1 Answer


By definition, a logit is an unnormalized log-probability (strictly speaking, a log-odds), or simply put, any real number. A sequence of logits of length num_classes can be interpreted as an unscaled probability distribution. For example, in your case, num_classes=2, so logits=[125.0, -10.0] is an unscaled probability distribution for one pixel (which clearly favors 0 over 1). This array can be squashed into a valid distribution by a softmax, and this is what tf.nn.sparse_softmax_cross_entropy_with_logits does internally. For [125.0, -10.0] the squashed distribution will be very close to [1.0, 0.0].
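
For instance, here is a quick sketch (toy values only) of that squashing:

import tensorflow as tf

# Made-up logits for a single pixel with num_classes=2.
logits = tf.constant([125.0, -10.0])
probs = tf.nn.softmax(logits)  # squash into a valid probability distribution

with tf.Session() as sess:
    print(sess.run(probs))  # ~[1.0, 0.0], i.e. class 0 is strongly favored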

Once again, this array of two logits is for a single pixel. If you want to compute the cross-entropy over the entire image, the network has to output the binary distribution for all pixels and all images in a batch, i.e. output a [batch_size, 128, 128, 2] tensor. The term sparse in the name of the loss refers to the fact that the labels are not one-hot encoded (more details here). It's most useful when the number of classes is large, i.e. when one-hot encoding becomes too inefficient in terms of memory, but in your case it's insignificant. If you decide to use the tf.nn.sparse_softmax_cross_entropy_with_logits loss, the labels must have shape [batch_size, 128, 128], be tf.int32 or tf.int64, and contain the correct class indices, zero or one. That's it: TensorFlow can then compute the cross-entropy between these two arrays.
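
A minimal sketch of the corresponding shapes (the placeholder names, batch size, and channel count here are just for illustration, not your exact architecture):

import tensorflow as tf

batch_size, img_height, img_width = 1, 128, 128

# Labels: one class index (0 or 1) per pixel, int32, no channel dimension.
labels = tf.placeholder(tf.int32, [batch_size, img_height, img_width], 'y')

# Upsampled feature map coming out of the decoder (assumed to have 32 channels here).
features = tf.placeholder(tf.float32, [batch_size, img_height, img_width, 32], 'features')

# Final 1x1 convolution with 2 filters and no activation: one logit per class per pixel.
logits = tf.layers.conv2d(features, 2, 1, strides=1, activation=None)  # [batch_size, 128, 128, 2]

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))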

Maxim
  • Thank you for your detailed answer. When I use the suggested shape, I get the following error: Received a label value of 1 which is outside the valid range of [0, 1). Label values:... Do you have any ideas how to resolve this? – Bastian Apr 18 '18 at 13:45
  • By the way, you reached your 666 answer which matches your profile pic quite well. I'd leave it this way ;) – Bastian Apr 18 '18 at 13:47
  • @BastianSchoettle Did you check the labels? Note that they must be binary – Maxim Apr 18 '18 at 14:12
  • @BastianSchoettle Looks like I also need 666 questions to make it perfect. – Maxim Apr 18 '18 at 14:14
  • Yes, definitely, the labels are binary; that's why it complains about the 1s in [0, 1). It's an interval with an exclusive upper boundary. Here is a small subset of the pixels/labels: 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 – Bastian Apr 18 '18 at 14:16
  • Start asking fancy things then :) – Bastian Apr 18 '18 at 14:17
  • @BastianSchoettle Oops, one minor correction. The labels shape should be `[?, 128, 128]`, and it must be `tf.int32`. This way tf is not complaining for me. – Maxim Apr 18 '18 at 14:23
  • It still causes the same error. I've updated my question accordingly. – Bastian Apr 18 '18 at 14:45
  • Can you make a short example to reproduce it? The code I tried worked. – Maxim Apr 18 '18 at 14:48
  • Sure! I’ll post it tomorrow and thank you very much for your help – Bastian Apr 18 '18 at 20:35
  • I’ve added the example to my question – Bastian Apr 21 '18 at 06:51
  • @Bastian, I had the same problem. I think the problem is the input to sparse_softmax_cross_entropy_with_logits: if you have two classes, the input needs to have shape (batch_size, 2), i.e. one output per class. (You can probably concatenate your output with a vector of ones and feed that into the function.) – dasWesen May 31 '18 at 16:16
  • In your case, you have conv6 = tf.layers.conv2d(conv5, 1, 1, strides=1, activation=tf.nn.relu), so probably an output of size 1? – dasWesen May 31 '18 at 16:19
  • That was the solution! I misunderstood the concept in general. Thanks a lot! – Bastian Oct 10 '18 at 15:24