
I am currently working on a system that classifies whether two sentences share the same content. For this purpose I use pretrained word vectors, so there is an array with the word vectors of sentence one (s1) and an array with the word vectors of sentence two (s2). In order to classify whether they are similar, I create a matrix by comparing all vectors in s1 pairwise with the vectors in s2. This matrix is then fed into a CNN classifier and trained on the data. This is all pretty straightforward.
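For illustration, here is a minimal numpy sketch of how I build this similarity matrix (the sentence lengths and embedding size are hypothetical, and I use random vectors instead of real pretrained embeddings):

```python
import numpy as np

# Hypothetical shapes: s1 has 5 words, s2 has 7 words, 200-d embeddings.
rng = np.random.default_rng(0)
s1 = rng.standard_normal((5, 200))
s2 = rng.standard_normal((7, 200))

# Normalize each word vector to unit length; the pairwise
# cosine-similarity matrix is then just a matrix product.
s1_norm = s1 / np.linalg.norm(s1, axis=1, keepdims=True)
s2_norm = s2 / np.linalg.norm(s2, axis=1, keepdims=True)
sim = s1_norm @ s2_norm.T  # shape (5, 7), entries in [-1, 1]
```

This `sim` matrix is what currently goes into the CNN as training data.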

Now I would like to enhance this system by running bidirectional LSTMs on s1 and s2. The bidirectional LSTM should produce the hidden state for each vector in s1 and s2, and these hidden states should then be compared by pairwise cosine similarity in the same way as the vectors of s1 and s2 were compared before. The goal is to capture the sentence context of each word in s1 and s2.

Now the question is how to do this in Keras. Currently I am using numpy/sklearn to create the matrices, which are then fed as training data into Keras. I found one implementation of what I want to do in plain TensorFlow (https://github.com/LiuHuiwen/Pairwise-Word-Interaction-Modeling-by-Tensorflow-1.0/blob/master/model.py).

I assume that I will have to change the input data to consist of just the two arrays of vectors of s1 and s2. Then I have to run the biLSTM first, get the hidden states, convert everything into matrices, and feed this into the CNN. The example in plain TensorFlow seems quite clear to me, but I cannot come up with a way to do this in Keras. Is it possible in Keras at all, or does one have to resort to TensorFlow directly in order to do the necessary calculations on the output of the biLSTM?

Sebastian_學生

1 Answer


Keras RNN layers, including LSTM, can return not only the last output of the output sequence but the full sequence of hidden states for every timestep, using the return_sequences=True option.

https://keras.io/layers/recurrent/

If you want to connect a Bidirectional LSTM layer before a CNN layer, the following code is an example:

from keras.layers import Input, LSTM, Bidirectional, Conv1D

input = Input(shape=(50, 200))  # 50 timesteps, 200-dim word vectors
# return_sequences=True gives one hidden state per timestep: (50, 2*16)
seq = Bidirectional(LSTM(16, return_sequences=True))(input)
cnn = Conv1D(32, 3, padding="same", activation="relu")(seq)

Please note: if you want to use a Conv2D layer after the Bidirectional LSTM layer, the output must be reshaped to ndim=4 for Conv2D, as in the following code:

from keras.layers import Input, LSTM, Bidirectional, Conv2D, Reshape

input = Input(shape=(50, 200))
seq = Bidirectional(LSTM(16, return_sequences=True))(input)
# biLSTM output is (50, 32); add a channels axis for Conv2D
seq = Reshape((50, 32, 1))(seq)
cnn = Conv2D(32, (3, 3), padding="same", activation="relu")(seq)
t26wtnb
  • I see this point, and thank you very much for the answer and the clarification. The CNN I am using at the moment takes as input the pairwise cosine similarities of the input vectors of s1 and s2; these are generated in advance. What I want to do is also generate the pairwise cosine similarity for the hidden states of the vectors in s1 and s2. I think this is a slightly different case than what you described (which seems to be just a CNN on top of a biLSTM, which is also perfectly fine but serves a slightly different purpose, I guess). Or am I getting something wrong? – Sebastian_學生 Oct 11 '18 at 05:55
  • I'm sorry for my lack of understanding. I guess that the Keras `Lambda` layer may help you: https://keras.io/layers/core/#lambda You can perform arbitrary calculations using this `Lambda` layer. I have found a similar topic on Stack Overflow: https://stackoverflow.com/questions/51003027/computing-cosine-similarity-between-two-tensors-in-keras – t26wtnb Oct 11 '18 at 06:45
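Putting the pieces of this thread together, here is a sketch (untested, with hypothetical sentence length and layer sizes) of how a shared biLSTM plus a `Lambda` layer computing the pairwise cosine-similarity matrix might look, feeding into a Conv2D classifier head:

```python
from keras.layers import Input, LSTM, Bidirectional, Lambda, Reshape, Conv2D
from keras.models import Model
from keras import backend as K

MAXLEN, DIM, UNITS = 50, 200, 16  # hypothetical sentence length / sizes

def pairwise_cosine(tensors):
    # a, b: (batch, MAXLEN, 2*UNITS) biLSTM hidden-state sequences
    a, b = tensors
    a = K.l2_normalize(a, axis=-1)
    b = K.l2_normalize(b, axis=-1)
    # batch_dot over the feature axis gives (batch, MAXLEN, MAXLEN)
    return K.batch_dot(a, b, axes=[2, 2])

in1 = Input(shape=(MAXLEN, DIM))
in2 = Input(shape=(MAXLEN, DIM))

# One shared biLSTM encodes both sentences.
bilstm = Bidirectional(LSTM(UNITS, return_sequences=True))
h1 = bilstm(in1)
h2 = bilstm(in2)

sim = Lambda(pairwise_cosine)([h1, h2])   # similarity matrix per pair
sim = Reshape((MAXLEN, MAXLEN, 1))(sim)   # add channel axis for Conv2D
cnn = Conv2D(32, (3, 3), padding="same", activation="relu")(sim)

model = Model(inputs=[in1, in2], outputs=cnn)
```

With this, the similarity matrix is computed inside the model from the hidden states rather than precomputed with numpy/sklearn, so the biLSTM and CNN can be trained end to end.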