
I've been looking for a way to use sampled softmax, tf.nn.sampled_softmax_loss(), in one of my models. I couldn't find any post that explains how to implement it.

If anyone has implemented it with a Keras architecture, could you please let me know how to use it with Keras?

Right now, for other losses, I can just use:

model.compile(loss=tf.keras.losses.CategoricalCrossentropy())

But I can't use tf.nn.sampled_softmax_loss in that manner: model.compile(loss=tf.nn.sampled_softmax_loss()) raises an error. I think that's expected, because the loss needs the weights and biases of the last layer to compute its result, and I'm not sure how to wire that up in Keras.

user_12
  • Have you tried using `layer.get_weights()` to get the weights of the last layer? – Shubham Shaswat Jan 25 '20 at 07:53
  • @ShubhamShaswat I don't know how to use it. I've checked all other such questions on SO and none of them worked for me. Also, tf.nn.sampled_softmax_loss should only be used for training; for evaluation we have to let the model use the normal cross-entropy loss, and I don't know how to do that either. I was hoping someone experienced might have implemented it. – user_12 Jan 25 '20 at 08:28
  • [check out this](https://stackoverflow.com/questions/42411891/how-to-extract-bias-weights-in-keras-sequential-model/42412124) – Shubham Shaswat Jan 25 '20 at 08:30

1 Answer


sampled_softmax_loss() computes and returns the sampled softmax training loss.

This is a faster way to train a softmax classifier over a huge number of classes.

This operation is for training only. It is generally an underestimate of the full softmax loss.

A common use case is to use this method for training, and calculate the full softmax loss for evaluation or inference. In this case, you must set partition_strategy="div" for the two losses to be consistent, as in the following example:

if mode == "train":
  # Sampled softmax: approximates the full softmax using a random subset
  # of the classes; weights and biases are the output layer's parameters.
  loss = tf.nn.sampled_softmax_loss(
      weights=weights,
      biases=biases,
      labels=labels,
      inputs=inputs,
      ...,
      partition_strategy="div")
elif mode == "eval":
  # Full softmax over all classes for evaluation / inference.
  logits = tf.matmul(inputs, tf.transpose(weights))
  logits = tf.nn.bias_add(logits, biases)
  labels_one_hot = tf.one_hot(labels, n_classes)
  loss = tf.nn.softmax_cross_entropy_with_logits(
      labels=labels_one_hot,
      logits=logits)

Regular loss functions like CategoricalCrossentropy() work from their default arguments: even if you don't pass anything, they compute the loss from y_true and y_pred alone. sampled_softmax_loss, by contrast, also needs the weights and biases of the output layer, so it can't simply be handed to model.compile().
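To see the mismatch concretely, here is a small hypothetical sketch (the function name is mine, not from the question): a loss passed to model.compile() only receives (y_true, y_pred), while sampled_softmax_loss also needs the output layer's weights, biases, and the layer inputs, none of which are available inside such a callable.

import tensorflow as tf

# A compile()-compatible loss only sees the targets and the predictions.
def keras_style_loss(y_true, y_pred):
    return tf.keras.losses.categorical_crossentropy(y_true, y_pred)

# tf.nn.sampled_softmax_loss(weights=..., biases=..., labels=..., inputs=...,
#                            num_sampled=..., num_classes=...)
# cannot be written as a function of (y_true, y_pred) alone, which is why
# passing it to model.compile(loss=...) fails.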

The key point for sampled_softmax_loss is to pass the right shapes for the weights, biases, inputs, and labels.
The weight shape it expects is not the same as in the usual setup. For example, if logits = xw + b, call it like this:

tf.nn.sampled_softmax_loss(weights=tf.transpose(w), biases=b, inputs=x, ...),
NOT tf.nn.sampled_softmax_loss(weights=w, biases=b, inputs=logits)!
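As a concrete shape check, here is a minimal sketch with made-up sizes (the Dense layer and dimensions are illustrative assumptions, and it uses the TF 2.x signature without partition_strategy):

import tensorflow as tf

batch, dim, num_classes = 32, 128, 10000
x = tf.random.normal([batch, dim])              # layer inputs, NOT logits
dense = tf.keras.layers.Dense(num_classes)
_ = dense(x)                                    # build the layer
w, b = dense.kernel, dense.bias                 # w: [dim, num_classes], b: [num_classes]

labels = tf.random.uniform([batch, 1], maxval=num_classes, dtype=tf.int64)
loss = tf.nn.sampled_softmax_loss(
    weights=tf.transpose(w),                    # expects [num_classes, dim]
    biases=b,
    labels=labels,                              # integer class ids, shape [batch, 1]
    inputs=x,                                   # the layer *inputs*, shape [batch, dim]
    num_sampled=64,
    num_classes=num_classes)
print(loss.shape)                               # (32,) -- one loss value per example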

Besides, the labels are not one-hot encoded. If your labels are one-hot encoded, convert them with labels=tf.reshape(tf.argmax(labels_one_hot, 1), [-1, 1]).
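Putting it together with Keras: below is one possible minimal sketch (TF 2.x; the class and the names vocab_size, embed_dim, and num_sampled are my own illustrative assumptions, not from the original answer) of a subclassed model that uses sampled_softmax_loss in train_step and the full softmax cross-entropy in test_step, following the train/eval split described above.

import tensorflow as tf

class SampledSoftmaxModel(tf.keras.Model):
    def __init__(self, vocab_size, embed_dim, num_sampled=64, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.num_sampled = num_sampled
        self.hidden = tf.keras.layers.Dense(embed_dim, activation="relu")
        # Keep the output weights as explicit variables so they can be fed
        # to tf.nn.sampled_softmax_loss, which wants shape [num_classes, dim].
        self.out_w = self.add_weight(
            name="out_w", shape=(vocab_size, embed_dim), initializer="glorot_uniform")
        self.out_b = self.add_weight(
            name="out_b", shape=(vocab_size,), initializer="zeros")

    def call(self, x, training=False):
        h = self.hidden(x)
        # Full logits over all classes, used for evaluation / inference.
        return tf.matmul(h, tf.transpose(self.out_w)) + self.out_b

    def train_step(self, data):
        x, y = data  # y: integer class ids, shape [batch]
        with tf.GradientTape() as tape:
            h = self.hidden(x)
            # Sampled softmax on the *inputs* to the output layer.
            loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
                weights=self.out_w,
                biases=self.out_b,
                labels=tf.reshape(tf.cast(y, tf.int64), [-1, 1]),
                inputs=h,
                num_sampled=self.num_sampled,
                num_classes=self.vocab_size))
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}

    def test_step(self, data):
        x, y = data
        logits = self(x, training=False)
        # Full softmax cross-entropy for evaluation, as recommended above.
        loss = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
            y, logits, from_logits=True))
        return {"loss": loss}

You would then compile with just an optimizer, e.g. model = SampledSoftmaxModel(vocab_size=50000, embed_dim=128), model.compile(optimizer="adam"), model.fit(features, int_labels, ...); training uses the sampled loss, while model.evaluate() falls back to the full cross-entropy.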
