
Some approaches I have considered:

Inheriting from the Model class: "Sampled softmax in tensorflow keras"

Inheriting from the Layer class: "How can I use TensorFlow's sampled softmax loss function in a Keras model?"

Of the two, the Model approach is cleaner; the Layer approach is a little hacky, since it pushes the target in as part of the input, which rules out multi-output models.

I'd like some help subclassing the Model class. Specifically: 1) Unlike the first approach, I would like to take in any number of layers, as we do when specifying a standard Keras model. For example,

class LanguageModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        ...

2) I am looking to incorporate the code below within the model class, but want to let the Model class recognize that

def call(self, y_true, input):
    """ Reshape y_true and input so they fit each other. """
    input = tf.reshape(input, (-1, self.hidden_size))
    y_true = tf.reshape(y_true, (-1, 1))
    # Initial values elided in the original; the dtype was intended to be tf.float64.
    weights = tf.Variable(..., dtype=tf.float64)
    biases = tf.Variable(..., dtype=tf.float64)
    # Training branch: sampled softmax over a subset of classes.
    loss = tf.nn.sampled_softmax_loss(
        weights=weights,
        biases=biases,
        labels=y_true,
        inputs=input,
        ...,
        partition_strategy="div")
    # Eval / inference branch: full softmax over all classes.
    logits = tf.matmul(input, tf.transpose(weights))
    logits = tf.nn.bias_add(logits, biases)
    # Note: softmax_cross_entropy_with_logits_v2 expects one-hot labels,
    # so y_true would need converting here.
    y_pred = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_true,
        logits=logits)




3) I guess I need some pointers to which sections of the Model class in the functional API I should mess with, knowing I have to write a custom loss function like the above. I guess the issue is accessing the weights inside the tf.nn.sampled_softmax_loss function.

pythOnometrist

1 Answer


The simplest approach I can come up with is to define a loss that ignores the result of the output layer.

Full Colab here: https://colab.research.google.com/drive/1Rp3EUWnBE1eCcaisUju9TwSTswQfZOkS

The loss function: note that it assumes the output layer is a Dense(activation='softmax') and that it ignores y_pred. Thus during training/eval, where the loss is used, the actual output of the Dense layer is effectively a no-op.

The output layer is used when doing predictions.

import tensorflow as tf

class SampledSoftmaxLoss(object):
  """ The loss function implements the Dense layer matmul and activation
  when in training mode.
  """
  def __init__(self, model):
    self.model = model
    output_layer = model.layers[-1]
    self.input = output_layer.input
    self.weights = output_layer.weights

  def loss(self, y_true, y_pred, **kwargs):
    labels = tf.argmax(y_true, axis=1)
    labels = tf.expand_dims(labels, -1)
    loss = tf.nn.sampled_softmax_loss(
        weights=self.weights[0],
        biases=self.weights[1],
        labels=labels,
        inputs=self.input,
        num_sampled = 3,
        num_classes = 4,
        partition_strategy = "div",
    )
    return loss

Model:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

def make_model():
  inp = Input(shape=(10,))
  h1 = Dense(16, activation='relu')(inp)
  h2 = Dense(4, activation='linear')(h1)
  # output layer and last hidden layer must have the same dims
  out = Dense(4, activation='softmax')(h2)
  model = Model(inp, out)
  loss_calculator = SampledSoftmaxLoss(model)
  model.compile('adam', loss_calculator.loss)
  return model

tf.set_random_seed(42)
model = make_model()
model.summary()

Note that SampledSoftmaxLoss requires the input of the last model layer to have the same dimension as the number of classes.
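
As a quick end-to-end check, here is a minimal training sketch using the shapes from make_model above; the dummy arrays, epoch count and batch size are illustrative assumptions, not part of the original answer:

import numpy as np

# Dummy data purely for illustration: 32 samples, 10 features, 4 classes.
x = np.random.rand(32, 10).astype('float32')
y = np.eye(4, dtype='float32')[np.random.randint(0, 4, size=32)]  # one-hot labels; loss() takes their argmax

model.fit(x, y, epochs=2, batch_size=8)

# Predictions come straight from the softmax output layer; the sampled loss only matters during training.
probs = model.predict(x[:5])
print(probs.shape)  # (5, 4)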

Pedro Marques
  • Hi Pedro - thanks for putting together this example. Two clarifications: 1) the weights and biases in nn.sampled_softmax_loss are automatically being updated and assigned to the last layer's weights - is that correct? 2) By creating a loss object / loss layer, I will not have access to val_acc measures during fit, will I? Thanks for the colab code - will test it out tonight. – pythOnometrist Jul 08 '19 at 22:31
  • 1) correct: the loss function is assuming that the last layer is the output layer and using its weights / biases; and ignoring its output since y_pred is not used for loss calculation. 2) You can use any metrics that operate on the output layer such as accuracy; the output layer is still there and will still generate an output if you connect it to a graph node such as an accuracy metric. – Pedro Marques Jul 09 '19 at 07:29
  • Thanks, that is handy. So I simply create a layer that takes the weights from the last layer and computes predictions, e.g. softmax etc.? But that won't make it to model.compile, right? – pythOnometrist Jul 09 '19 at 16:07
  • The loss function is not a layer; it controls the part of the graph that computes the loss and starts the backprop process. You want the model to be well defined so you can use it for inference. The way to understand this loss function is that it ignores the output of the output layer (```y_pred```) and recomputes it using the output layer's weights and biases via ```sampled_softmax_loss```; this ends up resulting in gradient updates to the output layer anyway, but without using the output layer's results directly. – Pedro Marques Jul 09 '19 at 16:12
  • Thanks - that clarifies it. It is exactly what I was looking for. I did not want a layer as my loss function. Your loss object receives the weights from the last layer and uses them to update weights through the sampled softmax. But because the weights are still linked to the model's layer, the weights in the model are being updated, and because your output layer is already a softmax, I can add any metrics applicable to categorical responses (since that is your last layer). And it does the same for the validation set. Neat! – pythOnometrist Jul 09 '19 at 16:18
  • One more question - why not simply use self.input = output_layer.output - thus avoiding an additional layer? – pythOnometrist Jul 09 '19 at 16:45
  • My understanding from https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss is that ```sampled_softmax_loss``` expects the input before the matmul with the weights. What you suggest would yield a different result, which does not match my reading of the man page. In this proposal, the output layer implements the "eval" branch in the man page, while the loss implements the other branch (a sketch of both branches follows these comments). – Pedro Marques Jul 09 '19 at 16:53
  • Got it! Thanks. – pythOnometrist Jul 09 '19 at 18:29
  • The shared colab throws errors. Can we have the updated code, if possible? – n0obcoder Oct 20 '20 at 10:16
  • You mentioned that the input of the last layer must have a dimension equal to the number of classes. Why is that? My architecture is 180 (inputs) - 512 - 256 - 200000 (no. of classes) - softmax. Do you suggest that I insert another layer of size equal to the number of classes after the 256-dimensional layer? I have been stuck on this problem since last week. I am desperate for help. – lego king Jun 12 '21 at 15:10
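
For readers puzzling over the output_layer.input vs output_layer.output discussion above, here is a rough, self-contained sketch of the two branches described in the tf.nn.sampled_softmax_loss documentation; every tensor name and shape below is an illustrative assumption, not code from the original answer or the colab:

import tensorflow as tf

# TF 1.x style, matching the answer (partition_strategy was removed in TF 2.x).
# Illustrative shapes matching the answer's toy example: dim == num_classes == 4.
num_classes, dim, batch = 4, 4, 8

hidden = tf.random.normal((batch, dim))                       # activations feeding the output layer (pre-matmul)
weights = tf.Variable(tf.random.normal((num_classes, dim)))   # classifier weights in the layout the tf docs expect
biases = tf.Variable(tf.zeros((num_classes,)))
labels = tf.random.uniform((batch, 1), maxval=num_classes, dtype=tf.int64)

# Training branch: sampled softmax consumes the activations *before* the matmul with the weights.
train_loss = tf.nn.sampled_softmax_loss(
    weights=weights,
    biases=biases,
    labels=labels,
    inputs=hidden,
    num_sampled=3,
    num_classes=num_classes,
    partition_strategy="div")

# Eval / inference branch: full matmul + bias + softmax, i.e. what the Dense output layer computes.
logits = tf.nn.bias_add(tf.matmul(hidden, tf.transpose(weights)), biases)
probs = tf.nn.softmax(logits)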