This can't be done efficiently.
tf.contrib.seq2seq.sequence_loss is designed to work with very large vocabularies, so it expects a loss function from the sparse family (see this question for details). The main difference is that the labels are plain integer class ids (ordinal encoding) instead of one-hot vectors, because one-hot encoding of a large vocabulary takes too much memory. The actual one-hot encoding is never computed.
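To see the difference in label format, here is a minimal TF 1.x sketch (the shapes and values are made up purely for illustration):

import tensorflow as tf

# Toy shapes for illustration: a "batch" of 2 targets over a vocabulary of 5.
logits = tf.random_normal([2, 5])

# Sparse family: labels are plain integer class ids, shape [2].
sparse_labels = tf.constant([3, 0])
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)

# Dense family: labels are full one-hot rows, shape [2, 5]. With a real
# vocabulary of tens of thousands of words this tensor becomes the problem.
dense_labels = tf.one_hot(sparse_labels, depth=5)
dense_loss = tf.losses.softmax_cross_entropy(
    onehot_labels=dense_labels, logits=logits)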
The label_smoothing parameter of tf.losses.softmax_cross_entropy, on the other hand, is an option that manipulates the one-hot encoding. Here's what it does:
if label_smoothing > 0:
  num_classes = math_ops.cast(
      array_ops.shape(onehot_labels)[1], logits.dtype)
  smooth_positives = 1.0 - label_smoothing
  smooth_negatives = label_smoothing / num_classes
  onehot_labels = onehot_labels * smooth_positives + smooth_negatives
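For example (plain NumPy, numbers chosen just for illustration): with 4 classes and label_smoothing=0.1, the hot entry shrinks to 0.925 and every class gets a floor of 0.025:

import numpy as np

label_smoothing = 0.1
onehot_labels = np.array([0., 1., 0., 0.])   # num_classes = 4
num_classes = onehot_labels.shape[0]

smoothed = (onehot_labels * (1.0 - label_smoothing)
            + label_smoothing / num_classes)
print(smoothed)   # [0.025 0.925 0.025 0.025]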
As you can see, computing this tensor requires onehot_labels to be stored explicitly, which is exactly what the sparse functions try to avoid. That's why neither tf.nn.sparse_softmax_cross_entropy_with_logits nor tf.contrib.seq2seq.sequence_loss provides a similar parameter. Of course, you can do the conversion yourself, but this defeats the whole optimization.
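If you decide to do the conversion anyway, a sketch under TF 1.x could look like the following (the shapes are made up, and it ignores the per-position padding weights that sequence_loss would normally apply):

import tensorflow as tf

# Hypothetical seq2seq shapes: [batch, time, vocab] logits, [batch, time] ids.
vocab_size = 50000
logits = tf.random_normal([32, 20, vocab_size])
targets = tf.random_uniform([32, 20], maxval=vocab_size, dtype=tf.int32)

# Materialize the one-hot labels so that label_smoothing can be applied.
# This allocates a [32, 20, 50000] float tensor -- exactly the cost
# that the sparse sequence loss is built to avoid.
onehot_targets = tf.one_hot(targets, depth=vocab_size)
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.reshape(onehot_targets, [-1, vocab_size]),
    logits=tf.reshape(logits, [-1, vocab_size]),
    label_smoothing=0.1)

With a vocabulary this size, the one-hot tensor alone is roughly 32 * 20 * 50000 floats per batch, which is the memory cost you were trying to save in the first place.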