
I want to generate poems based on Robert Frost's poems, and I have preprocessed my dataset.
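Roughly, corpus_int and total_words come from the standard Keras Tokenizer recipe. I'm not showing my exact tokenization code, so take this as an approximation (corpus stands for the list of poem lines):

import numpy as np
import tensorflow as tf
from tensorflow.keras import utils as ku

# `corpus` is a placeholder for the list of poem lines.
tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1   # +1 for the padding index 0

# Each line becomes a sequence of word indices.
corpus_int = tokenizer.texts_to_sequences(corpus)

After tokenizing, I pad all sequences to the same length and split them into predictors (everything except the last token) and label (the last token):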

max_sentence_len = max(len(l) for l in corpus_int)

# Pad every sequence from the front so they all have length max_sentence_len.
input_seq = np.array(tf.keras.preprocessing.sequence.pad_sequences(corpus_int, padding='pre', truncating='pre', maxlen=max_sentence_len))
predictors, label = input_seq[:, :-1], input_seq[:, -1]  # predictors: everything except the last token; label: only the last token
label = ku.to_categorical(label, num_classes=total_words, dtype='int32')

predictors

array([[   0,    0,    0, ...,   10,    5,  544],
       [   0,    0,    0, ...,   64,    8,  854],
       [   0,    0,    0, ...,  855,  174,    2],
       ...,
       [   0,    0,    0, ...,  129,   49,   94],
       [   0,    0,    0, ...,  183,  159,   60],
       [   0,    0,    3, ...,    3, 2157,    4]], dtype=int32)

label

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 1]], dtype=int32)

After that, I built my model using an encoder-decoder architecture:

class seq2seq(tf.keras.Model):
  def __init__(self,max_sequence_len,total_words):
    super(seq2seq,self).__init__()
    self.max_sequence_len = max_sequence_len
    self.total_words = total_words

    self.input_len = self.max_sequence_len - 1
    self.total_words = self.total_words

    #Encoder
    self.enc_embedding = tf.keras.layers.Embedding(input_dim = total_words,output_dim = 300,input_length = max_sentence_len - 1)
    self.enc_lstm_1 = tf.keras.layers.LSTM(units = 300, activation = 'tanh')
    self.enc_lstm_2 = tf.keras.layers.LSTM(units = 300, activation = 'tanh', return_state = True)

    #decoder
    self.dec_embedding = tf.keras.layers.Embedding(input_dim = total_words,output_dim = 300,input_length = max_sentence_len - 1)
    self.dec_lstm_1 = tf.keras.layers.LSTM(units = 300, activation = 'tanh')
    self.dec_lstm_2 = tf.keras.layers.LSTM(units = 300, activation = 'tanh', return_state = True,return_sequences = True)

    #Dense layer and output:
    self.dense = tf.keras.layers.Dense(total_words, activation='softmax')

  def call(self,inputs):
    #Encoding
    enc_x = self.enc_embedding(inputs)
    enc_x = self.enc_lstm_1(enc_x)
    enc_outputs, state_h, state_c = self.enc_lstm_2(enc_x)

    #Decoding:
    dec_x = self.dec_embedding(enc_outputs)
    dec_x = self.dec_lstm_1(dec_x,initial_state = [state_h, state_c])
    dec_outputs, _, _ = self.enc_lstm_2(dec_x)
    output_dense = self.dense(dec_outputs)

    return output_dense

model = seq2seq(max_sequence_len = max_sentence_len,total_words = total_words)   
model.compile(optimizer = tf.keras.optimizers.RMSprop(lr=0.0001),loss='categorical_crossentropy', metrics=['accuracy']) 
model.fit(predictors,label,epochs=5, batch_size=128)

But at the end I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-4-1c349573302d> in <module>()
     37 model = seq2seq(max_sequence_len = max_sentence_len,total_words = total_words)
     38 model.compile(optimizer = tf.keras.optimizers.RMSprop(lr=0.0001),loss='categorical_crossentropy', metrics=['accuracy'])
---> 39 model.fit(predictors,label,epochs=5, batch_size=128)

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py in wrapper(*args, **kwargs)
    235       except Exception as e:  # pylint:disable=broad-except
    236         if hasattr(e, 'ag_error_metadata'):
--> 237           raise e.ag_error_metadata.to_exception(e)
    238         else:
    239           raise

ValueError: in converted code:

    <ipython-input-4-1c349573302d>:27 call  *
        enc_outputs, state_h, state_c = self.enc_lstm_2(enc_x)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/layers/recurrent.py:623 __call__
        return super(RNN, self).__call__(inputs, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:812 __call__
        self.name)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/input_spec.py:177 assert_input_compatibility
        str(x.shape.as_list()))

    ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 300]

I understand that the problem is in the input shape (as the error says: expected ndim=3, found ndim=2).

But I don't know how I should reshape my data for TensorFlow 2.0. Can you help me with this problem?

DY92
    It'd help if you shared your full error trace – OverLordGoldDragon Oct 15 '19 at 00:20
  • @OverLordGoldDragon edited. Please take a look. – DY92 Oct 15 '19 at 05:36
  • Error's fine, but are you sure the code you provided is the one you use? I'm surprised it compiles at all - the layers aren't even connected. Using a class-based model definition seems to be an overkill for your problem - regardless, as-is, the issue isn't reproducible. See reference implementation [here](https://stackoverflow.com/questions/58367519/why-do-predictions-differ-for-autoencoder-vs-encoder-decoder/58368618#58368618) – OverLordGoldDragon Oct 15 '19 at 13:59
  • @OverLordGoldDragon yes, i am, sure – DY92 Oct 15 '19 at 16:19
  • I highly doubt it - but to give it a shot, share your _full_ code, maybe it'll change something. – OverLordGoldDragon Oct 15 '19 at 16:38
  • @OverLordGoldDragon corrected, right now layers are connected. – DY92 Oct 15 '19 at 22:30

1 Answer


The problem's rooted in the usage of return_sequences:

  • True --> return the output for every timestep fed in. E.g. if the LSTM has 20 units and the input shape is (32, 100, 40), the output shape will be (32, 100, 20) == (batch_size, timesteps, lstm_units)
  • False --> return the output only for the last timestep, computed using all timesteps - conceptually (32, 1, 20)

By default, Keras squeezes dimensions of size 1 - so return_sequences=False actually yields a 2D output, (32, 20). Likewise, the Dense output has to match your 2D label array - a 3D input to Dense would give a per-timestep, 3D output - so the LSTM feeding Dense should have return_sequences=False. All of these changes are implemented below, and the model is able to fit.
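To illustrate both cases, here's a standalone shape check with dummy data (independent of your model; the numbers just mirror the example above):

import numpy as np
import tensorflow as tf

x = np.random.randn(32, 100, 40).astype('float32')   # (batch_size, timesteps, features)

seq_out  = tf.keras.layers.LSTM(20, return_sequences=True)(x)
last_out = tf.keras.layers.LSTM(20, return_sequences=False)(x)

print(seq_out.shape)    # (32, 100, 20) -> one output per timestep (3D)
print(last_out.shape)   # (32, 20)      -> only the last timestep (2D)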

class seq2seq(tf.keras.Model):
  def __init__(self,max_sequence_len,total_words):
    super(seq2seq,self).__init__()
    self.max_sequence_len = max_sequence_len
    self.total_words = total_words

    self.input_len = self.max_sequence_len - 1
    self.total_words = self.total_words

    #Encoder
    self.enc_embedding = tf.keras.layers.Embedding(input_dim = total_words,
                         output_dim = 300,input_length = max_sentence_len - 1)
    self.enc_lstm_1 = tf.keras.layers.LSTM(units = 300, activation = 'tanh',
                                           return_sequences=True)
    self.enc_lstm_2 = tf.keras.layers.LSTM(units = 300, activation = 'tanh', 
                                           return_state = True)

    #decoder
    self.dec_embedding = tf.keras.layers.Embedding(input_dim = total_words,
                         output_dim = 300,input_length = max_sentence_len - 1)
    self.dec_lstm_1 = tf.keras.layers.LSTM(units = 300, activation = 'tanh',
                                           return_sequences=True)
    self.dec_lstm_2 = tf.keras.layers.LSTM(units = 300, activation = 'tanh', 
                         return_state = True,return_sequences = False)

    #Dense layer and output:
    self.dense = tf.keras.layers.Dense(total_words, activation='softmax')

  def call(self,inputs):
    #Encoding
    enc_x = self.enc_embedding(inputs)
    enc_x = self.enc_lstm_1(enc_x)
    enc_outputs, state_h, state_c = self.enc_lstm_2(enc_x)

    #Decoding:
    dec_x = self.dec_embedding(enc_outputs)
    dec_x = self.dec_lstm_1(dec_x,initial_state = [state_h, state_c])
    dec_outputs, _, _ = self.dec_lstm_2(dec_x)  # use the decoder's own LSTM here, not enc_lstm_2
    output_dense = self.dense(dec_outputs)

    return output_dense
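For completeness, a minimal compile/fit sketch with these settings, reusing your predictors, label, total_words and max_sentence_len exactly as defined in the question (learning_rate is the canonical argument name in TF 2.x; lr is a deprecated alias):

model = seq2seq(max_sequence_len = max_sentence_len, total_words = total_words)
model.compile(optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.0001),
              loss = 'categorical_crossentropy', metrics = ['accuracy'])
model.fit(predictors, label, epochs=5, batch_size=128)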
OverLordGoldDragon