
I want to compute the softmax_cross_entropy_with_logits of a batch tensor. I have a batch of logits as input, but I want to apply a boolean mask to this tensor first. The mask is also batched, and each row of the mask can contain a different number of True values, so the masked tensor is no longer rectangular.
Masking either flattens the tensor (tf.boolean_mask) or produces a ragged one (tf.ragged.boolean_mask), and both either give wrong results or don't work with the softmax function.
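To make the failure concrete, here is a tiny made-up example (the values are illustrative only):

import tensorflow as tf

logits = tf.constant([[1., 2., 3.],
                      [4., 5., 6.]])
mask = tf.constant([[True, True, False],
                    [True, False, False]])

tf.boolean_mask(logits, mask)         # shape (3,): [1., 2., 4.] -- batch structure is lost
tf.ragged.boolean_mask(logits, mask)  # ragged [[1., 2.], [4.]] -- rows of unequal length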

So basically I want to make the following code work:

# logits.shape = (batch, outputs), e.g. (512, 8)
# mask.shape = (batch, outputs), e.g. (512, 8); boolean, each row has a varying number of True
# expected result shape: (batch,), e.g. (512,)
one_hot_actions = tf.one_hot(x, logits.get_shape().as_list()[-1])  # x: chosen action indices, shape (batch,)
stopgradient = tf.stop_gradient(one_hot_actions)
return tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=tf.boolean_mask(logits, mask),
    labels=tf.boolean_mask(stopgradient, mask))

But with tf.boolean_mask this produces just a single scalar instead of one loss value per batch row, and with tf.ragged.boolean_mask the softmax function does not work at all.

I tried combining the two ragged tensors row-wise (the first masked logits row with the first masked labels row) and computing the softmax row by row. This did not work, since the tf.map_fn that I used does not accept ragged tensors as inputs. I tried the same idea with SparseTensors and with lists of tensors (tf.split), but never got any working code out of it.

The only idea I had to solve this is very ugly: use tf.where to replace all masked values with NaN, then use map_fn over the now dense tensors. Inside map_fn I can mask each row again to exclude the NaNs and call the softmax function row-wise.

EDIT This is what the code currently looks like:

stopgradient = tf.stop_gradient(one_hot_actions)
# Mark the masked-out entries with NaN so each row can be re-masked later.
nan_logits = tf.where(mask, logits, float('NaN') + tf.zeros_like(logits))
nan_labels = tf.where(mask, stopgradient, float('NaN') + tf.zeros_like(stopgradient))
# Shape (batch, 2, outputs): one (logits, labels) pair per batch row for map_fn.
nan_lola = tf.stack([nan_logits, nan_labels], axis=1)
def fn(x):
    nan_lo = x[0]
    nan_la = x[1]
    # Drop the NaN entries again, keeping only this row's valid positions.
    masked_lo = tf.boolean_mask(nan_lo, tf.logical_not(tf.math.is_nan(nan_lo)))
    masked_la = tf.boolean_mask(nan_la, tf.logical_not(tf.math.is_nan(nan_la)))
    return tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=masked_lo,
        labels=masked_la
    )
result = tf.map_fn(fn, nan_lola)
return result

This works but is very slow (my training time almost doubles).

To those interested: I stumbled upon this problem while trying to restrict the softmax over actions to the currently valid ones in reinforcement learning.

Do you know of any way to do this (faster)? Can you replace the masked values with a value that does not affect the softmax? Thank you!
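
EDIT 2 To clarify the last question, the kind of replacement I have in mind would look roughly like this. This is an untested sketch: -1e9 is my stand-in for -infinity, and it assumes the taken action is always an unmasked one, so the one-hot labels need no change.

# Untested sketch: push masked-out logits to a very negative value so that
# exp() underflows to ~0 and they get ~0 probability under the softmax;
# the valid entries are renormalized among themselves.
neg_inf = tf.fill(tf.shape(logits), -1e9)
masked_logits = tf.where(mask, logits, neg_inf)
result = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=masked_logits,
    labels=tf.stop_gradient(one_hot_actions))
# result.shape == (batch,), e.g. (512,)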

superfuzzy
  • Not yet an answer...: I have the same problem and have stumbled across https://www.tensorflow.org/agents/api_docs/python/tf_agents/distributions/masked/MaskedCategorical which looks like it might be designed for just this use case. Hopefully it works! – Oly Feb 18 '21 at 15:22

0 Answers