I want to use sampled softmax loss in tf.keras. I defined my own model by subclassing keras Model. In __init__ I create the layers I need, including the last Dense projection layer. But this Dense layer shouldn't be called during training, since I want to do sampled softmax and use only its weights and biases. So I define the loss function like this:
    import tensorflow as tf

    class SampledSoftmax:
        def __init__(self,
                     num_sampled,
                     num_classes,
                     projection,
                     bias,
                     hidden_size):
            # Dense kernel is (hidden_size, num_classes); the loss expects
            # weights of shape (num_classes, hidden_size), hence the transpose.
            self.weights = tf.transpose(projection)
            self.bias = bias
            self.num_classes = num_classes
            self.num_sampled = num_sampled
            self.hidden_size = hidden_size

        def __call__(self, y_true, inputs):
            """ reshaping of y_true and inputs to make them fit each other """
            inputs = tf.reshape(inputs, (-1, self.hidden_size))
            y_true = tf.reshape(y_true, (-1, 1))
            return tf.nn.sampled_softmax_loss(
                weights=self.weights,
                biases=self.bias,
                labels=y_true,
                inputs=inputs,
                num_sampled=self.num_sampled,
                num_classes=self.num_classes,
                partition_strategy='div')
It takes the necessary parameters at initialization, and its __call__ is the sampled softmax loss function. The catch is that to pass this loss to model.compile I need the weights and bias of the last Dense layer. But 1) during training the Dense layer is not part of the model's output, and 2) even if it were, the Dense layer is only hooked up to its input, and so only learns its input dimensions, inside call of my custom model. In short, the weights won't be available before compiling the model. Can anyone point me in the right direction?
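To make clear what the loss needs from the Dense layer, here is a rough, framework-free sketch of the idea behind sampled softmax. It is a simplification and not what tf.nn.sampled_softmax_loss does internally: negatives are drawn uniformly and no sampling-probability correction is applied. The function name and the toy sizes are made up for illustration.

```python
import math
import random


def sampled_softmax_loss(hidden, true_class, weights, biases, num_sampled, rng):
    """Cross-entropy over the true class plus `num_sampled` random negatives.

    Simplified: negatives are drawn uniformly without replacement and never
    include the true class; no sampling-probability correction is applied.
    """
    num_classes = len(weights)
    negatives = rng.sample(
        [c for c in range(num_classes) if c != true_class], num_sampled)
    candidates = [true_class] + negatives  # true class sits at index 0

    # Only the rows of `weights` for the candidate classes are touched.
    logits = [
        sum(w * h for w, h in zip(weights[c], hidden)) + biases[c]
        for c in candidates
    ]

    # Numerically stable negative log-softmax of the true class (index 0).
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[0]


rng = random.Random(0)
num_classes, hidden_size = 50, 4  # toy sizes, chosen arbitrarily
# Same layout the TF loss expects: (num_classes, hidden_size).
weights = [[rng.gauss(0.0, 1.0) for _ in range(hidden_size)]
           for _ in range(num_classes)]
biases = [0.0] * num_classes
hidden = [rng.gauss(0.0, 1.0) for _ in range(hidden_size)]

loss = sampled_softmax_loss(hidden, 7, weights, biases, num_sampled=5, rng=rng)
print(loss)
```

The point of the sketch is that the loss reads the projection weights and bias directly and never calls the projection layer, which is why the weights have to exist before the loss object can be constructed.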
Now the code that fails. I first subclassed Model as follows:
    import tensorflow as tf
    from tensorflow.keras.layers import Dense, Embedding, LSTM

    class LanguageModel(tf.keras.Model):
        def __init__(self,
                     vocal_size=15003,
                     embedding_size=512,
                     input_len=64):
            super().__init__()
            self.embedding = Embedding(vocal_size, embedding_size,
                                       input_length=input_len)
            # hidden_size is not shown in this snippet; it is assumed to be
            # defined at module level.
            self.lstm = LSTM(hidden_size, return_sequences=True)
            self.dense = Dense(vocal_size, activation='softmax')

        def call(self, inputs, training=False):
            emb_out = self.embedding(inputs)
            lstm_out = self.lstm(emb_out)
            res = self.dense(lstm_out)
            if training:
                # shouldn't use the last dense as we want to do sampling
                return lstm_out
            return res
Then the part that compiles the model for training:
    sampled_loss = SampledSoftmax(num_sampled, vocal_size,
                                  model.dense.kernel, model.dense.bias,
                                  hidden_size)
    model.compile(optimizer=tf.train.RMSPropOptimizer(lr),
                  loss=sampled_loss)
It fails however I play around with it, because model.dense.kernel is not accessible: at the time the model is compiled, the dense layer has not been built yet, since that only happens in the call method. The error message is below:
    Traceback (most recent call last):
      File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/wuxinyu/workspace/nlu/lm/main.py", line 72, in <module>
        train_main()
      File "/home/wuxinyu/workspace/nlu/lm/main.py", line 64, in train_main
        train_model.build_lm_model()
      File "/home/wuxinyu/workspace/nlu/lm/main.py", line 26, in build_lm_model
        self.model.dense.kernel,
    AttributeError: 'Dense' object has no attribute 'kernel'
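The AttributeError is consistent with deferred weight creation: a Dense layer does not know its input dimension when it is constructed, so its kernel and bias only come into existence once the layer is built, normally on its first call. Below is a framework-free sketch of that pattern. LazyDense is a hypothetical stand-in written for illustration, not Keras code.

```python
class LazyDense:
    """Toy stand-in for a layer that defers weight creation (not Keras code)."""

    def __init__(self, units):
        self.units = units
        self.built = False

    def build(self, input_dim):
        # Weights can only be shaped once the input dimension is known.
        self.kernel = [[0.0] * self.units for _ in range(input_dim)]
        self.bias = [0.0] * self.units
        self.built = True

    def __call__(self, inputs):
        if not self.built:
            self.build(len(inputs))  # first call triggers the build
        return [
            sum(x * row[j] for x, row in zip(inputs, self.kernel)) + self.bias[j]
            for j in range(self.units)
        ]


layer = LazyDense(units=3)
has_kernel_before = hasattr(layer, "kernel")  # nothing has been built yet
layer([1.0, 2.0])                             # first call builds the weights
has_kernel_after = hasattr(layer, "kernel")
kernel_shape = (len(layer.kernel), len(layer.kernel[0]))
print(has_kernel_before, has_kernel_after, kernel_shape)  # False True (2, 3)
```

In Keras the analogous explicit step would be calling the layer's build method with an input shape before reading kernel. Whether that is enough for the subclassed model above is an assumption I have not verified.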
BTW, the loss defined above would work in small test cases like the following.
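    from tensorflow.keras import Model
    from tensorflow.keras.layers import Dense, Embedding, Input, LSTM

    x = Input(shape=(10,), name='input_x')
    emb_out = Embedding(10000, 200, input_length=10)(x)
    lstm_out = LSTM(200, return_sequences=True)(emb_out)
    dense = Dense(10000, activation='sigmoid')
    # Calling dense on a tensor builds it, so kernel and bias exist below.
    output = dense(lstm_out)
    # 200 is hidden_size (the LSTM units), as required by SampledSoftmax.__init__.
    sl = SampledSoftmax(10, 10000, dense.kernel, dense.bias, 200)
    model = Model(inputs=x, outputs=lstm_out)
    model.compile(optimizer='adam', loss=sl)
    model.summary()
    # dataset is not shown here
    model.fit(dataset, epochs=20, steps_per_epoch=5)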