
Hey guys, I have built an LSTM model that works, and now I am trying (unsuccessfully) to add an Embedding layer as a first layer.

This solution didn't work for me. I also read these questions before asking: Keras input explanation: input_shape, units, batch_size, dim, etc., Understanding Keras LSTMs, and the Keras examples.

My input is a one-hot encoding (of ones and zeros) of characters of a language that consists of 27 letters. I chose to represent each word as a sequence of 10 characters. The input size for each word is (10, 27), and I have 465 of them, so X_train.shape is (465, 10, 27); the labels have y_train.shape (465, 1). My goal is to train the model and, while doing that, to build character embeddings.
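For reference, the one-hot input can be built along these lines (a simplified sketch; the alphabet, the char_to_index mapping, and the space-padding are placeholders, not my exact preprocessing):

import numpy as np
from keras.utils import to_categorical

alphabet = "abcdefghijklmnopqrstuvwxyz "   # 27 symbols: 26 letters + padding (placeholder)
char_to_index = {c: i for i, c in enumerate(alphabet)}

def encode_word(word, length=10):
    # pad/truncate to 10 characters, map to indices, then one-hot -> (10, 27)
    word = word.ljust(length)[:length]
    return to_categorical([char_to_index[c] for c in word], num_classes=27)

X_train = np.stack([encode_word(w) for w in ["cat", "dog"]])   # -> (n_words, 10, 27)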

Now this is the model that compiles and fits.

from keras.layers import Input, Dense, LSTM, Bidirectional
from keras.models import Model

main_input = Input(shape=(10, 27))
rnn = Bidirectional(LSTM(5))
x = rnn(main_input)
de = Dense(1, activation='sigmoid')(x)
model = Model(inputs=main_input, outputs=de)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X_train, y_train, epochs=10, batch_size=1, verbose=1)

After adding the Embedding layer:

main_input = Input(shape=(10, 27))
emb = Embedding(input_dim=2, output_dim=10)(main_input)
rnn = Bidirectional(LSTM(5))
x = rnn(emb)
de = Dense(1, activation='sigmoid')(x)
model = Model(inputs=main_input, outputs=de)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X_train, y_train, epochs=10, batch_size=1, verbose=1)

output: ValueError: Input 0 is incompatible with layer bidirectional_31: expected ndim=3, found ndim=4

How do I fix the output shape? Your ideas would be much appreciated.

Art

1 Answer


My input is a one-hot encoding (of ones and zeros) of characters of a language that consists of 27 letters.

You shouldn't pass a one-hot encoding into an Embedding layer. Embedding layers map an integer index to an n-dimensional vector, so you should pass in the integer indices directly (the values from before one-hot encoding).

I.e., before, you had a one-hot input like [[0, 1, 0], [1, 0, 0], [0, 0, 1]], which was created from a sequence of integers like [1, 0, 2]. Instead of passing in the (10, 27) one-hot matrix, pass in the original vector of shape (10,).
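To see the relationship concretely, np.argmax along the last axis recovers the original integers from a one-hot array:

import numpy as np

one_hot = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
indices = np.argmax(one_hot, axis=-1)   # array([1, 0, 2]) -- the original integers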

from keras.layers import Input, Dense, LSTM, Bidirectional, Embedding
from keras.models import Model

main_input = Input(shape=(10,))                            # only pass in the indexes
emb = Embedding(input_dim=27, output_dim=10)(main_input)   # vocab size is 27
rnn = Bidirectional(LSTM(5))
x = rnn(emb)
de = Dense(1, activation='sigmoid')(x)
model = Model(inputs=main_input, outputs=de)
model.compile(loss='binary_crossentropy', optimizer='adam')
# X_train must now hold integer indices with shape (465, 10), not one-hot vectors
model.fit(X_train, y_train, epochs=10, batch_size=1, verbose=1)
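Since the X_train described in the question is already one-hot with shape (465, 10, 27), one way to feed it to this model is to convert it back to indices first (a sketch, assuming that array layout):

import numpy as np

X_train_indices = np.argmax(X_train, axis=-1)   # shape (465, 10), integer codes 0..26
model.fit(X_train_indices, y_train, epochs=10, batch_size=1, verbose=1)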
Primusa
  • The problem is that there is no ‘original vector’, because every character gets a one-hot encoding. For example, the character ‘a’ would have a ‘1’ as the first entry of its vector and zeros everywhere else. And this is the way it is done with characters in NLP, as far as I know. – Art Mar 04 '19 at 16:55
  • Yes, so you directly do a -> 1, b -> 2, c -> 3 instead of the one-hot encoding; your array will look like [1, 2, 3] for "abc" instead of the one-hot version. This is repeated and elaborated on further in my answer. – Primusa Mar 04 '19 at 17:52