Sampled softmax in tensorflow keras

Question

I want to use sampled softmax loss in tf.keras. I defined my own model by subclassing keras Model. In __init__, I specify the layers I need, including the last Dense projection layer. But this Dense layer shouldn't be called during training, since I want to do sampled softmax and only use its weights and biases. So I define the loss function like this:

class SampledSoftmax:
    def __init__(self,
                 num_sampled,
                 num_classes,
                 projection,
                 bias,
                 hidden_size):
        # The Dense kernel is (hidden_size, num_classes), while
        # tf.nn.sampled_softmax_loss expects (num_classes, hidden_size).
        self.weights = tf.transpose(projection)
        self.bias = bias
        self.num_classes = num_classes
        self.num_sampled = num_sampled
        self.hidden_size = hidden_size

    def __call__(self, y_true, inputs):
        # Reshape y_true and inputs so that they fit each other.
        inputs = tf.reshape(inputs, (-1, self.hidden_size))
        y_true = tf.reshape(y_true, (-1, 1))

        return tf.nn.sampled_softmax_loss(
                   weights=self.weights,
                   biases=self.bias,
                   labels=y_true,
                   inputs=inputs,
                   num_sampled=self.num_sampled,
                   num_classes=self.num_classes,
                   partition_strategy='div')

It takes the necessary parameters at initialization, and calling the instance gives the sampled softmax loss. The catch is that to pass this loss to model.compile I need the weights and biases of the last Dense layer. But 1) during training the Dense layer is not part of the model, and 2) even if it were, the Dense layer is only connected to its input, and so only learns its input dimensions, in the call method of my custom model. In short, the weights won't be available before the model is compiled. Can anyone point me in the right direction?
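To make concrete why the loss only needs the projection weights and biases rather than the full Dense output, here is a minimal pure-Python sketch of the idea behind sampled softmax for a single example. The function name toy_sampled_softmax_loss and all sizes are hypothetical. It also assumes uniform negative sampling, which is a simplification: tf.nn.sampled_softmax_loss uses a different candidate sampler by default and corrects the logits for the sampling probabilities.

```python
import math
import random


def toy_sampled_softmax_loss(hidden, label, weights, biases, num_sampled, rng):
    """Toy sampled softmax for one example (illustration only).

    hidden:  list of floats of length dim (e.g. one LSTM output step)
    label:   int index of the true class
    weights: num_classes rows, each a list of length dim
    biases:  list of num_classes floats
    """
    num_classes = len(weights)
    # Assumption: negatives are drawn uniformly from the non-target classes.
    others = [c for c in range(num_classes) if c != label]
    candidates = [label] + rng.sample(others, num_sampled)
    # Only the rows of `weights` for the candidate classes are touched.
    logits = [
        sum(w * h for w, h in zip(weights[c], hidden)) + biases[c]
        for c in candidates
    ]
    # Cross-entropy with the true class at position 0 (stable log-sum-exp).
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_z - logits[0]


rng = random.Random(0)
dim, num_classes = 4, 20  # hypothetical sizes
weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_classes)]
biases = [0.0] * num_classes
hidden = [rng.uniform(-1, 1) for _ in range(dim)]

loss = toy_sampled_softmax_loss(hidden, 3, weights, biases, num_sampled=5, rng=rng)
print(loss)
```

With num_sampled equal to num_classes - 1 the candidate set covers every class, so this sketch reduces to ordinary softmax cross-entropy.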

Here is the code that fails. I first subclassed Model as follows:

class LanguageModel(tf.keras.Model):
    def __init__(self,
                 vocal_size=15003,
                 embedding_size=512,
                 input_len=64):
        super().__init__()
        # hidden_size is not a parameter here; it comes from the enclosing scope.
        self.embedding = Embedding(vocal_size, embedding_size,
                                   input_length=input_len)
        self.lstm = LSTM(hidden_size, return_sequences=True)
        self.dense = Dense(vocal_size, activation='softmax')

    def call(self, inputs, training=False):
        emb_out = self.embedding(inputs)
        lstm_out = self.lstm(emb_out)
        res = self.dense(lstm_out)
        if training:
            # Shouldn't use the last dense output, as we want to do sampling.
            return lstm_out
        return res

Then the part that trains the model:

sampled_loss = SampledSoftmax(num_sampled, vocal_size, 
                   model.dense.kernel, model.dense.bias,
                   hidden_size)

model.compile(optimizer=tf.train.RMSPropOptimizer(lr),
              loss=sampled_loss)

It fails no matter how I adjust it, because model.dense.kernel is not accessible: at the time the model is compiled, the Dense layer has not yet been built, which only happens when the call method first runs. The error message is below:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/wuxinyu/workspace/nlu/lm/main.py", line 72, in <module>
    train_main()
  File "/home/wuxinyu/workspace/nlu/lm/main.py", line 64, in train_main
    train_model.build_lm_model()
  File "/home/wuxinyu/workspace/nlu/lm/main.py", line 26, in build_lm_model
    self.model.dense.kernel,
AttributeError: 'Dense' object has no attribute 'kernel'
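The AttributeError above is consistent with Keras creating a layer's variables lazily, when the layer is first built. The following is a minimal sketch, assuming TensorFlow 2.x and hypothetical sizes, showing that a Dense layer exposes kernel and bias once build is called with the expected input shape. It is not a confirmed fix for the model in this question.

```python
import tensorflow as tf

hidden_size, vocab_size = 16, 50  # hypothetical sizes, for illustration only

dense = tf.keras.layers.Dense(vocab_size)

# Before the layer is built it owns no variables, so `kernel` is unavailable.
has_kernel_before = hasattr(dense, "kernel")

# Building with the expected input shape (batch, time, hidden_size)
# creates the variables; only the last dimension has to be known.
dense.build((None, None, hidden_size))
kernel_shape = tuple(dense.kernel.shape)
bias_shape = tuple(dense.bias.shape)

print(has_kernel_before, kernel_shape, bias_shape)
```

The kernel comes out as (hidden_size, vocab_size), which is why the loss class in the question transposes it: tf.nn.sampled_softmax_loss documents its weights argument as having shape [num_classes, dim].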

BTW, the loss defined above does work in small test cases like the following:

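from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

x = Input(shape=(10,), name='input_x')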
emb_out = Embedding(10000,200,input_length=10)(x)
lstm_out = LSTM(200, return_sequences=True)(emb_out)

dense = Dense(10000, activation='sigmoid')
output = dense(lstm_out)

sl = SampledSoftmax(10, 10000, dense.kernel, dense.bias, 200)  # hidden_size=200 matches the LSTM width

model = Model(inputs=x, outputs=lstm_out)
model.compile(optimizer='adam', loss=sl)
model.summary()
model.fit(dataset, epochs=20, steps_per_epoch=5)

What are the cases for which your loss function does not work? — rvinas, Oct 10 '18 at 07:38
I want to create a custom model by subclassing Model. According to the tf.keras guide, I specify the layers in the __init__ method and hook them up with the input into a whole network in the call method. The loss function is required when compiling the model. The catch is that compiling happens after __init__ but before call, which means that by then the last Dense layer hasn't got its dimensions and therefore its weight matrix is not available. Thus I can't pass SampledSoftmax the right parameters to obtain the loss function before compiling the model. It forms a cycle I can't break. — wxy, Oct 10 '18 at 08:50
The dimension of the Dense layer must be known at compilation time (which makes the weights' tensor available). Do you have an example in which this loss produces an error? — rvinas, Oct 10 '18 at 09:05
this may be helpful https://stackoverflow.com/questions/54756625/integrating-sampled-softmax-in-keras-failed — SantoshGupta7, Mar 10 '19 at 02:13
