
I created a Seq2Seq model for text summarization. I have two models, one with attention and one without. The model without attention generates predictions fine, but I cannot generate predictions with the attention model, even though it fits successfully.

This is my model:

from tensorflow.keras.backend import clear_session
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     Concatenate, Activation, Dense,
                                     TimeDistributed, dot)
from tensorflow.keras.models import Model, load_model

latent_dim = 300
embedding_dim = 200

clear_session()

# Encoder
encoder_inputs = Input(shape=(max_text_len, ))

# Embedding layer
enc_emb = Embedding(x_voc, embedding_dim,
                    trainable=True)(encoder_inputs)

# Encoder LSTM 1
encoder_lstm1 = Bidirectional(LSTM(latent_dim, return_sequences=True,
                     return_state=True, dropout=0.4,
                     recurrent_dropout=0.4))
(encoder_output1, forward_h1, forward_c1, backward_h1, backward_c1) = encoder_lstm1(enc_emb)

# Encoder LSTM 2
encoder_lstm2 = Bidirectional(LSTM(latent_dim, return_sequences=True,
                     return_state=True, dropout=0.4,
                     recurrent_dropout=0.4))
(encoder_output2, forward_h2, forward_c2, backward_h2, backward_c2) = encoder_lstm2(encoder_output1)

# Encoder LSTM 3
encoder_lstm3 = Bidirectional(LSTM(latent_dim, return_state=True,
                     return_sequences=True, dropout=0.4,
                     recurrent_dropout=0.4))
(encoder_outputs, forward_h, forward_c, backward_h, backward_c) = encoder_lstm3(encoder_output2)

state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])

# Set up the decoder, using encoder_states as the initial state
decoder_inputs = Input(shape=(None, ))

# Embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)


# Decoder LSTM
decoder_lstm = LSTM(latent_dim*2, return_sequences=True,
                    return_state=True, dropout=0.4,
                    recurrent_dropout=0.2)
# the decoder is unidirectional, so the two returned states are h and c
(decoder_outputs, decoder_state_h, decoder_state_c) = \
    decoder_lstm(dec_emb, initial_state=[state_h, state_c])

#start Attention part
attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_outputs], axes=[2,1])
decoder_outputs = Concatenate()([context, decoder_outputs])
#end Attention

# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
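
For context, a minimal sketch of a typical compile/fit call for this architecture (the optimizer, the loss, and the padded arrays `x_tr`/`y_tr` are placeholder assumptions, not values taken from the question):

# Assumed shapes: x_tr is (num_samples, max_text_len) and y_tr is
# (num_samples, max_summary_len), both padded integer sequences
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
model.fit(
    [x_tr, y_tr[:, :-1]],   # teacher forcing: decoder input is the summary shifted right
    y_tr[:, 1:, None],      # target is the summary shifted left, with a trailing axis
    epochs=10, batch_size=128,
)
model.save('model_intro.h5')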

This is how I construct the encoder and decoder for generating predictions:

model = load_model("model_intro.h5")

encoder_inputs = model.input[0]  # input_1

encoder_outputs, forward_h, forward_c, backward_h, backward_c = model.layers[5].output  # final encoder Bi-LSTM

state_h_enc = Concatenate()([forward_h, backward_h])
state_c_enc = Concatenate()([forward_c, backward_c])

encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)

decoder_inputs = model.input[1]  # input_2
decoder_state_input_h = Input(shape=(latent_dim*2,), name="input_3")
decoder_state_input_c = Input(shape=(latent_dim*2,), name="input_4")
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_embedding = model.layers[6](decoder_inputs)
decoder_lstm = model.layers[9]
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(decoder_embedding, initial_state=decoder_states_inputs)
decoder_states = [state_h_dec, state_c_dec]

#start Attention
attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
attention2 = Activation('softmax')(attention)
context = dot([attention2, encoder_outputs], axes=[2,1])
decoder_outputs = Concatenate(axis=-1)([context, decoder_outputs])
#end Attention

decoder_dense = model.layers[-1]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states
)

If I remove the attention part (I have marked its start and end with comments in the code), everything works fine. The model with attention also fits successfully; however, when constructing the encoder and decoder for generating predictions, I get:

ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 300), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "embedding". The following previous layers were accessed without issue: []

This is what my model looks like: [model architecture image]

  • How does the behavior of the model change at test time? The issue is that, at test time, the tensors `encoder_inputs` (and likewise `decoder_inputs`) do not belong to the computational graph of `decoder_model`. To overcome this, you should define an input placeholder for these tensors (e.g. `encoder_inputs = Input(...)`) so that the model is standalone. – rvinas Jul 22 '21 at 17:37
  • Thank you for the reply. I changed `encoder_inputs` from `model.input[0]` to `Input(shape=(max_text_len, ))`, and did the same for `decoder_inputs`. I still get the same error. Plus, it works as it is for the model without attention. Do you have an idea why? – BlueMango Jul 23 '21 at 11:04
  • It would be great to have a minimal reproducible example to study the problem in more detail. The other issue that I see is that `model.layers[5].output` is defined as a function of tensors from another computational graph, so `decoder_model` is still not standalone. The tensors `encoder_outputs, ..., backward_c` should also be fed to `decoder_model` via an input placeholder (see the sketch below). – rvinas Jul 23 '21 at 13:40
  • I am replacing every `model.layers[]` part of the inference model with its respective value from the original training model. For example, `encoder_outputs, ..., backward_c` take their values from the original model. I STILL get the same error and am really confused. I am not sure how to provide a minimal reproducible example, but my code for the inference model is from here: https://keras.io/examples/nlp/lstm_seq2seq/. I have used a multi-layer Bi-LSTM instead of a single LSTM and added attention to it. – BlueMango Jul 24 '21 at 08:30
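
Following rvinas's suggestion, here is a minimal sketch of a standalone decoder model (the placeholder `decoder_hidden_state_input` is a name introduced only for this sketch; `max_text_len`, `latent_dim`, and the layer references come from the snippets above). The key change is that the attention block consumes a fresh `Input` carrying the encoder outputs instead of the training-graph tensor `encoder_outputs`:

# Encoder inference model: also expose encoder_outputs, because the
# decoder's attention needs them at every decoding step
encoder_model = Model(encoder_inputs,
                      [encoder_outputs, state_h_enc, state_c_enc])

# Fresh placeholders: everything the decoder consumes must enter
# through an Input of decoder_model itself
decoder_state_input_h = Input(shape=(latent_dim * 2,))
decoder_state_input_c = Input(shape=(latent_dim * 2,))
decoder_hidden_state_input = Input(shape=(max_text_len, latent_dim * 2))

dec_emb2 = model.layers[6](decoder_inputs)  # reuse the trained decoder embedding
decoder_outputs2, state_h2, state_c2 = decoder_lstm(
    dec_emb2, initial_state=[decoder_state_input_h, decoder_state_input_c])

# Attention against the placeholder, not against encoder_outputs
attn2 = dot([decoder_outputs2, decoder_hidden_state_input], axes=[2, 2])
attn2 = Activation('softmax')(attn2)
context2 = dot([attn2, decoder_hidden_state_input], axes=[2, 1])
decoder_outputs2 = Concatenate(axis=-1)([context2, decoder_outputs2])

decoder_outputs2 = model.layers[-1](decoder_outputs2)  # reuse the trained Dense
decoder_model = Model(
    [decoder_inputs, decoder_hidden_state_input,
     decoder_state_input_h, decoder_state_input_c],
    [decoder_outputs2, state_h2, state_c2])

During decoding, `encoder_model.predict` then runs once per source text to produce `(e_out, e_h, e_c)`, and `e_out` is passed to `decoder_model.predict` together with the current token and the running states at every step.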

1 Answer


I have a similar issue. My model trains just fine, but while setting up inference for the seq2seq architecture, I cannot understand why the Embedding layer is causing issues. Here is the code:

from keras.models import load_model, Model
from keras.layers import Input, Embedding

LATENT_DIM = 128
EMBEDDING_DIM = 200
num_words_output = 2000

model = load_model('s2s')
model.summary()

encoder_inputs = model.input[0]  # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[4].output  # lstm_1
encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)


decoder_inputs = model.input[1]  # input_2
decoder_state_input_h = Input(shape=(LATENT_DIM,))
decoder_state_input_c = Input(shape=(LATENT_DIM,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_lstm = model.layers[5]
decoder_outputs, state_h_dec, state_c_dec = model.layers[5].output # lstm_2
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[6]
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

Error:

line 29: decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 16), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "embedding". The following previous layers were accessed without issue: []

[Model architecture image]
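
The disconnect here most likely has the same cause as in the question above: `model.layers[5].output` is a tensor from the training graph (rooted at `input_1`), so it cannot be an output of a model whose only inputs are the fresh state placeholders. A sketch of the likely fix, which calls the trained decoder LSTM on the new placeholders instead of reusing its training output (the index `model.layers[3]` for the decoder `Embedding` is an assumption and may differ in your model):

decoder_embedding = model.layers[3](decoder_inputs)  # assumed: decoder Embedding layer
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(
    decoder_embedding, initial_state=decoder_states_inputs)
decoder_states = [state_h_dec, state_c_dec]
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                      [decoder_outputs] + decoder_states)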

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 01 '23 at 18:22