This can't be done efficiently.
tf.contrib.seq2seq.sequence_loss is designed to work with very large vocabularies, so it expects a loss function from the sparse family (see this question for details). The main difference is that the labels are plain integer class ids (ordinal encoding) instead of one-hot vectors, because one-hot encoding of a large vocabulary takes too much memory. The actual one-hot encoding is never computed.
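To see the difference in label format, here is a minimal TF 1.x sketch (the shapes and values are made up purely for illustration):

import tensorflow as tf

# Toy shapes for illustration: a "batch" of 2 targets over a vocabulary of 5.
logits = tf.random_normal([2, 5])

# Sparse family: labels are plain integer class ids, shape [2].
sparse_labels = tf.constant([3, 0])
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)

# Dense family: labels are full one-hot rows, shape [2, 5]. With a real
# vocabulary of tens of thousands of words this tensor becomes the problem.
dense_labels = tf.one_hot(sparse_labels, depth=5)
dense_loss = tf.losses.softmax_cross_entropy(
    onehot_labels=dense_labels, logits=logits)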
The label_smoothing parameter of tf.losses.softmax_cross_entropy, on the other hand, is an option that manipulates the one-hot encoding. Here's what it does:
if label_smoothing > 0:
  num_classes = math_ops.cast(
      array_ops.shape(onehot_labels)[1], logits.dtype)
  smooth_positives = 1.0 - label_smoothing
  smooth_negatives = label_smoothing / num_classes
  onehot_labels = onehot_labels * smooth_positives + smooth_negatives
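For example (plain NumPy, numbers chosen just for illustration): with 4 classes and label_smoothing=0.1, the hot entry shrinks to 0.925 and every class gets a floor of 0.025:

import numpy as np

label_smoothing = 0.1
onehot_labels = np.array([0., 1., 0., 0.])   # num_classes = 4
num_classes = onehot_labels.shape[0]

smoothed = (onehot_labels * (1.0 - label_smoothing)
            + label_smoothing / num_classes)
print(smoothed)   # [0.025 0.925 0.025 0.025]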
As you can see, computing this tensor requires onehot_labels to be stored explicitly, which is exactly what the sparse functions try to avoid. That's why neither tf.nn.sparse_softmax_cross_entropy_with_logits nor tf.contrib.seq2seq.sequence_loss provides a similar parameter. Of course, you can do the conversion yourself, but this defeats the whole optimization.
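If you decide to do the conversion anyway, a sketch under TF 1.x could look like the following (the shapes are made up, and it ignores the per-position padding weights that sequence_loss would normally apply):

import tensorflow as tf

# Hypothetical seq2seq shapes: [batch, time, vocab] logits, [batch, time] ids.
vocab_size = 50000
logits = tf.random_normal([32, 20, vocab_size])
targets = tf.random_uniform([32, 20], maxval=vocab_size, dtype=tf.int32)

# Materialize the one-hot labels so that label_smoothing can be applied.
# This allocates a [32, 20, 50000] float tensor -- exactly the cost
# that the sparse sequence loss is built to avoid.
onehot_targets = tf.one_hot(targets, depth=vocab_size)
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.reshape(onehot_targets, [-1, vocab_size]),
    logits=tf.reshape(logits, [-1, vocab_size]),
    label_smoothing=0.1)

With a vocabulary this size, the one-hot tensor alone is roughly 32 * 20 * 50000 floats per batch, which is the memory cost you were trying to save in the first place.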