
I am currently implementing the attr2seq model described in this paper by Dong et al. (2018) in Keras, and I am completely stuck at initializing the hidden vectors of the LSTM decoder at the first time step using the encoded attribute vector $a$ (last paragraph of section "3.2 Sequence Decoder"). In other words, I am looking for a way to use the encoded attribute vector to properly initialize the hidden vectors of the sequence LSTM decoder at the very first time step.

I have been testing a first approach, suggested here, where one makes the LSTM layers stateful and then assigns the hidden state after the model has been compiled (while also making sure that shuffle = False when fitting the model). One problem with this approach is that you have to fix a specific batch size, and if the number of data samples is not divisible by that batch size, the program exits with an error (Exception: In a stateful network, you should only pass inputs with a number of samples that can be divided by the batch size.). Also, when fitting the model this way, only the LSTM layers are trained; the attribute encoders are not part of the computation graph, so the attribute vectors remain untrained. Maybe I have completely misunderstood how stateful layers are meant to be used, so please correct me if I am saying anything wrong.
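For reference, here is a minimal, simplified sketch of that stateful approach as I understand it. The attribute vectors are faked with NumPy arrays here (in my real code they would come from the attribute encoder), and all the sizes and names are my own placeholders:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, hidden_size, num_tokens = 32, 20, 64, 1000

# Stateful decoder: the batch size must be fixed in advance.
model = Sequential([
    LSTM(hidden_size, stateful=True,
         batch_input_shape=(batch_size, timesteps, hidden_size),
         return_sequences=True),
    Dense(num_tokens, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

# Pretend these are the encoded attribute vectors for one batch
# (they would normally come from the attribute encoder, which is
# not part of this model and therefore not trained with it).
h0 = np.random.randn(batch_size, hidden_size).astype("float32")
c0 = np.zeros((batch_size, hidden_size), dtype="float32")

# Overwrite the LSTM's internal states before fitting on this batch.
model.layers[0].reset_states(states=[h0, c0])

# Dummy data with exactly batch_size samples (otherwise stateful fitting fails).
x = np.random.randn(batch_size, timesteps, hidden_size).astype("float32")
y = np.eye(num_tokens)[np.random.randint(0, num_tokens, size=(batch_size, timesteps))]

model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False)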

Another approach I tested was to set the initial hidden state symbolically by passing values to the initial_state argument accepted by the call(…) method of the recurrent layers in Keras. However, from the source code here I got the impression that one has to provide weight matrices to supply a custom initial state of an LSTM layer, and I am unfortunately clueless about how to do this, even though I think this kind of initialization is what I want, only for the hidden vectors at the first time step (i.e. t=0) rather than for weights at the first time step. Maybe one could do this, but I don't know how, since I am not that used to Keras (or deep learning, for that matter).
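To illustrate what I mean, here is a minimal sketch of how initial_state seems to be used in the functional API, assuming an LSTM expects a list of two state tensors (hidden state h and cell state c); the Dense layers and shapes are just placeholders of mine:

from keras.layers import Input, Dense, LSTM
from keras.models import Model

hidden_size, timesteps, feature_size = 64, 20, 32

attrs = Input(shape=(10,))                             # placeholder attribute input
h0 = Dense(hidden_size, activation="tanh")(attrs)      # candidate initial hidden state
c0 = Dense(hidden_size, activation="tanh")(attrs)      # candidate initial cell state

seq = Input(shape=(timesteps, feature_size))
# initial_state must apparently match cell.state_size, i.e. a list [h0, c0]
out = LSTM(hidden_size, return_sequences=True)(seq, initial_state=[h0, c0])

model = Model(inputs=[attrs, seq], outputs=out)
model.summary()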

The code below illustrates roughly how the model should look. When I run it in my program, I get the error message ValueError: An initial_state was passed that is not compatible with cell.state_size. Received state_spec=[InputSpec(shape=(None, 514), ndim=2)]; however cell.state_size is (514, 514), which made me think that initial_state requires weight matrices rather than an arbitrary state vector for each data point.

# (imports assumed: from keras.layers import Input, Dense, Concatenate, Lambda, LSTM, RepeatVector;
#  from keras.models import Model; from keras import backend as K)

# Attribute inputs (one-hot encoded drug, condition and rating)
x_drug = Input(shape=(onehotted_drugs.shape[1],))
x_cond = Input(shape=(onehotted_conds.shape[1],))
x_rating = Input(shape=(onehotted_ratings.shape[1],))

# Attribute encoders: one dense embedding per attribute, then concatenation
g_drug = Dense(self.attr_size)(x_drug)
g_cond = Dense(self.attr_size)(x_cond)
g_rating = Dense(self.attr_size)(x_rating)
g_concatenated = Concatenate()([g_drug, g_cond, g_rating])

# Encoded attribute vector a, sliced into one hidden vector per decoder layer
a = Dense(self.num_layers * self.hidden_size, activation="tanh")(g_concatenated)
hidden_vectors = [Lambda(lambda l, i=i: K.slice(l, (0, self.hidden_size * i),
                                                (-1, self.hidden_size)))(a)
                  for i in range(self.num_layers)]

# Decoder: previous words are embedded and fed to the stacked LSTMs
x_prev_words = Input(shape=(self.num_tokens,))
ground_zero = Dense(self.hidden_size, use_bias=False)(x_prev_words)
lstm_layers = [LSTM(self.hidden_size, return_sequences=True)(
                   RepeatVector(self.max_sequence_length - 1)(ground_zero),
                   initial_state=hidden_vectors[0])]   # <-- this is where the ValueError is raised
for i in range(1, self.num_layers):
    lstm_layers.append(LSTM(self.hidden_size,
                            return_sequences=False if i == self.num_layers - 1 else True)(
                       lstm_layers[-1], initial_state=hidden_vectors[i]))

# Output distribution over the next word
next_word_dist = Dense(self.num_tokens, activation="softmax")(lstm_layers[-1])
self.dong = Model(inputs=[x_drug, x_cond, x_rating, x_prev_words], outputs=[next_word_dist])
self.dong.compile(loss="categorical_crossentropy", optimizer="rmsprop")
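Based on the error message, I suspect each LSTM needs a pair of initial states rather than a single tensor. The following is a sketch of the change I have been considering, continuing from the code above: it pairs each sliced hidden vector with a zero cell state of the same shape (my own guess, not something the paper specifies):

from keras import backend as K
from keras.layers import Lambda

# Each LSTM cell carries two states (h, c), so initial_state presumably
# needs a list of two tensors per layer. Here I pair each sliced hidden
# vector with a zero-valued cell state of the same shape.
zero_states = [Lambda(lambda h: K.zeros_like(h))(h) for h in hidden_vectors]
initial_states = [[h, c] for h, c in zip(hidden_vectors, zero_states)]

# ... and then pass e.g. initial_state=initial_states[0] to the first LSTM call.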

If you need any further information or more code for inspection, feel free to tell me. Thank you for any help you can give me with this matter!
