
I'm using a small custom function as the softmax_loss_function inside tf.contrib.seq2seq.sequence_loss(softmax_loss_function=[...]):

    def reduced_softmax_loss(self, labels, logits):
        top_logits, indices = tf.nn.top_k(logits, self.nb_top_classes, sorted=False)
        top_labels = tf.gather(labels, indices)

        return tf.nn.softmax_cross_entropy_with_logits_v2(labels=top_labels,
                                                          logits=top_logits)

But even though labels and logits should have the same dimensions, execution returns an InvalidArgumentError:

    indices[1500,1] = 2158 is not in [0, 1600)

with the exact numbers varying due to my random seed.

Is there another function like tf.gather that I could use instead? Or is the returned value the wrong shape?

Everything works fine if I pass one of the usual TensorFlow loss functions.

Thanks in advance!

JtheB
  • I think you need to pass `axis=-1` to `tf.gather`. – jdehesa Mar 13 '19 at 16:43
  • That doesn't work, even though it is a great idea! The loss function needs the whole probability distribution at the indices points to be plugged into the softmax_cross_entropy function. – JtheB Mar 13 '19 at 17:08
  • Right, no, that wasn't right. I think you need something like [what I posted here](https://stackoverflow.com/a/55067844). So maybe `top_labels = tf.gather_nd(labels, tf.stack([tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[0]), 1), (1, self.nb_top_classes)), indices], axis=-1))`. – jdehesa Mar 13 '19 at 17:34
  • Wait you also posted that question haha, so is this a different problem or is it that the answer there didn't work out for you? – jdehesa Mar 13 '19 at 17:35
  • @jdehesa the other answer worked totally fine for the intended purpose and later in the CNN, but now I want to make a seq2seq to generate something and there they recommended this loss function. Your `tf.gather_nd` seems to expand the tensor like in my other question, but here I intend to shorten the tensor of labels to fit to the top_k of logits. So I can speed up the calculation of the loss. That's why I thought just the `tf.gather` could work. – JtheB Mar 14 '19 at 09:10
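
For reference, a minimal self-contained sketch of the gather_nd construction jdehesa suggests above (toy shapes and variable names are illustrative, not taken from the question; it assumes TensorFlow 1.x, as implied by tf.contrib):

    import tensorflow as tf

    # Toy shapes for illustration: 4 examples, 10 classes, keep the top 3.
    batch, num_classes, nb_top = 4, 10, 3
    logits = tf.random.normal([batch, num_classes])
    labels = tf.one_hot(tf.random.uniform([batch], maxval=num_classes, dtype=tf.int32),
                        num_classes)

    top_logits, indices = tf.nn.top_k(logits, nb_top, sorted=False)

    # Pair every top_k column index with its row index, so gather_nd picks
    # element [row, col] instead of treating each index as a row number.
    rows = tf.tile(tf.expand_dims(tf.range(batch), 1), (1, nb_top))
    top_labels = tf.gather_nd(labels, tf.stack([rows, indices], axis=-1))

    loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=top_labels,
                                                      logits=top_logits)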

1 Answer


It's hard to tell what's going on just by looking at your code, but I don't think it does what you want. The tf.gather operation expects indices where each scalar value indexes into the outermost dimension of its first argument, whereas the output of top_k here holds per-row column indices; using them directly as row indices leads to out-of-bounds errors.
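
To make that distinction concrete, here is a hedged sketch with illustrative shapes (not taken from the question); batch_dims on tf.gather needs a reasonably recent TensorFlow release, otherwise the gather_nd construction from the comments above does the same job:

    import tensorflow as tf

    # Illustrative shapes only: 1600 (batch * time) rows, 2158 classes.
    labels = tf.random.normal([1600, 2158])
    logits = tf.random.normal([1600, 2158])

    top_logits, indices = tf.nn.top_k(logits, 5)   # column indices in [0, 2158), one row per example

    # tf.gather(labels, indices) would treat every index as a *row* number into
    # the first dimension (size 1600), hence "... is not in [0, 1600)".
    # A per-row lookup has to keep the row pairing, e.g. via batch_dims:
    top_labels = tf.gather(labels, indices, batch_dims=1)   # shape [1600, 5]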

Alexandre Passos
  • That's a fair point, but as far as I understood, top_k uses the last dimension of the tensor it was fed, and this dimension is identical for both the labels and the logits. This is why I'm a little astonished that it runs out of bounds. – JtheB Mar 15 '19 at 12:54
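
As a small illustration of the point raised in the comment above: top_k does index the last axis, but tf.gather then reuses those column indices as row indices. A tiny toy case (values invented for illustration):

    import tensorflow as tf

    # 2 rows, 5 classes.
    logits = tf.constant([[0.1, 0.9, 0.2, 0.8, 0.3],
                          [0.7, 0.1, 0.6, 0.2, 0.5]])
    labels = tf.constant([[0., 1., 0., 0., 0.],
                          [1., 0., 0., 0., 0.]])

    _, idx = tf.nn.top_k(logits, 2)   # [[1, 3], [0, 2]]: columns of the last axis

    # tf.gather(labels, idx) would look up rows 1, 3, 0 and 2 of labels, but
    # labels only has rows 0 and 1, so 2 and 3 are out of bounds -- the same
    # InvalidArgumentError as above, only with smaller numbers.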