sampled_softmax_loss() computes and returns the sampled softmax training loss.
This is a faster way to train a softmax classifier over a huge number of classes.
This operation is for training only. It is generally an underestimate of the full softmax loss.
A common use case is to use this method for training, and calculate the full softmax loss for evaluation or inference. In this case, you must set partition_strategy="div"
for the two losses to be consistent, as in the following example:
if mode == "train":
  loss = tf.nn.sampled_softmax_loss(
      weights=weights,
      biases=biases,
      labels=labels,
      inputs=inputs,
      ...,
      partition_strategy="div")
elif mode == "eval":
  logits = tf.matmul(inputs, tf.transpose(weights))
  logits = tf.nn.bias_add(logits, biases)
  labels_one_hot = tf.one_hot(labels, n_classes)
  loss = tf.nn.softmax_cross_entropy_with_logits(
      labels=labels_one_hot,
      logits=logits)
Whereas a regular loss function like CategoricalCrossentropy() simply falls back on its default values if you don't pass any arguments, the key point for sampled_softmax_loss is to pass weights, biases, inputs and labels with the right shapes.
The weights passed to sampled_softmax_loss do not have the same shape as in the usual logits computation. For example, if logits = xw + b, call sampled_softmax_loss like this:

sampled_softmax_loss(weights=tf.transpose(w), biases=b, inputs=x, ...)

NOT like this:

sampled_softmax_loss(weights=w, biases=b, inputs=logits, ...)
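To make the shapes concrete, here is a minimal sketch (batch_size, dim and n_classes are made-up sizes for illustration, not anything prescribed by the API):

import tensorflow as tf

batch_size, dim, n_classes = 32, 128, 10000

x = tf.random.normal([batch_size, dim])              # layer input:  [batch_size, dim]
w = tf.Variable(tf.random.normal([dim, n_classes]))  # dense kernel: [dim, n_classes]
b = tf.Variable(tf.zeros([n_classes]))               # bias:         [n_classes]

# Full softmax path: logits = xw + b has shape [batch_size, n_classes]
logits = tf.matmul(x, w) + b

# sampled_softmax_loss instead expects the class weights as [n_classes, dim]
# (hence tf.transpose(w)) and the layer input x, not the logits.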
Besides, the labels are not a one-hot representation. If your labels are one-hot encoded, pass labels=tf.reshape(tf.argmax(labels_one_hot, 1), [-1, 1]).
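Continuing the sketch above, a possible end-to-end call could look like the following. The random label indices and num_sampled=64 are arbitrary stand-ins, and the call uses the TF 2.x signature of tf.nn.sampled_softmax_loss, which does not take partition_strategy (that argument belongs to the 1.x API shown earlier):

# Stand-in one-hot labels, converted to class indices of shape [batch_size, 1]
label_ids = tf.random.uniform([batch_size], maxval=n_classes, dtype=tf.int32)
labels_one_hot = tf.one_hot(label_ids, n_classes)
labels = tf.reshape(tf.argmax(labels_one_hot, 1), [-1, 1])   # int64, [batch_size, 1]

loss = tf.nn.sampled_softmax_loss(
    weights=tf.transpose(w),   # [n_classes, dim], NOT w itself
    biases=b,                  # [n_classes]
    labels=labels,             # [batch_size, 1] class ids, NOT one-hot
    inputs=x,                  # [batch_size, dim], NOT the logits
    num_sampled=64,            # how many negative classes to sample per batch
    num_classes=n_classes)     # result: per-example losses of shape [batch_size]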