
Please see the Python code below; I've added comments where I felt emphasis was needed.

```python
import keras
import numpy

def build_model():
    model = keras.models.Sequential()
    # Number of LSTM units in this layer = 3.
    model.add(keras.layers.LSTM(3, input_shape=(3, 1), activation='elu'))
    return model

def build_data():
    inputs = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    outputs = numpy.array([10, 11, 12, 13, 14, 15, 16, 17, 18])
    # Number of samples = 3, input vectors (timesteps) per sample = 3,
    # size of each input vector = 1.
    inputs = inputs.reshape(3, 3, 1)
    # Number of target samples = 3, number of outputs per target sample = 3.
    outputs = outputs.reshape(3, 3)
    return inputs, outputs

def train():
    model = build_model()
    model.summary()
    model.compile(optimizer='adam', loss='mean_absolute_error', metrics=['accuracy'])
    x, y = build_data()
    model.fit(x, y, batch_size=1, epochs=4000)
    model.save("LSTM_testModel")

def apply():
    model = keras.models.load_model("LSTM_testModel")
    x = numpy.array([[[7], [8], [9]]])  # one sample: 3 input vectors of size 1
    print(model.predict(x))

def main():
    train()

main()
```

My understanding is that each input sample contains 3 input vectors, and that each input vector goes to its own LSTM cell, i.e. for sample 1, input vector 1 goes to LSTM cell 1, input vector 2 goes to LSTM cell 2, and so on.

Looking at tutorials on the internet, I've seen that the number of LSTM cells is often much greater than the number of input vectors, e.g. 300 LSTM cells.

So if, for example, I have 3 input vectors per sample, what input goes to the remaining 297 LSTM cells?

I tried compiling the model with 2 LSTM cells and it still accepted the 3 input vectors per sample, although I had to change the dimensions of the target outputs in the training data to accommodate this. So what happened to the third input vector of each sample? Is it ignored?
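To see why no timestep is ignored, here is a minimal NumPy sketch of an unrolled LSTM (randomly initialized, purely illustrative; weight names and values are made up for the demonstration). With 2 units and 3 timesteps, the loop still consumes all 3 input vectors; the unit count only sets the size of the hidden state, not the number of steps:

```python
import numpy as np

rng = np.random.default_rng(0)

F, U = 1, 2                                  # features per timestep, LSTM units
W = rng.standard_normal((F, 4 * U)) * 0.1    # input kernel (all 4 gates stacked)
R = rng.standard_normal((U, 4 * U)) * 0.1    # recurrent kernel
b = np.zeros(4 * U)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq):
    h = np.zeros(U)
    c = np.zeros(U)
    for x_t in x_seq:                # every timestep is consumed, regardless of U
        z = x_t @ W + h @ R + b
        i, f, g, o = np.split(z, 4)  # input, forget, candidate, output gates
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

x = np.array([[7.0], [8.0], [9.0]])  # 3 timesteps, 1 feature each
h = lstm_forward(x)
print(h.shape)                       # (2,): output size = units, not timesteps
```

Note that feeding a sequence with 4 or 10 timesteps through the same weights works just as well, since the same cell is applied recursively at every step.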

Image taken from: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

I believe the above image shows that each input vector (of an arbitrary scenario) is mapped to a specific RNN cell, though I may be misinterpreting it.

rert588
  • I think your understanding of an LSTM is not correct, what do you call an input vector and why are there three of them? – Dr. Snoopy Jan 09 '20 at 22:57
  • So I have three input vectors per sample of dimension 1. I have three of them for testing. The above code was just to test LSTMs. – rert588 Jan 09 '20 at 23:00
  • I've been trying to understand LSTMs since around the beginning of 2019. I've read a lot of machine learning and data science articles on using LSTMs, but I haven't come close to understanding them. – rert588 Jan 09 '20 at 23:03
  • I have run the code and I achieved convergence. I just need to get the understanding of the LSTM layer correct. – rert588 Jan 09 '20 at 23:04
  • Ok, the number of cells is unrelated to the shape of the input, it works like a fully connected layer, there is a matrix of size 3x3, the input shape and the output (cells) shape, and they do not have to be equal. There are more matrices inside the LSTM for the gates, which also have dimension equal to the number of cells. – Dr. Snoopy Jan 09 '20 at 23:07
  • Could you post an answer with diagrams describing this please? – rert588 Jan 09 '20 at 23:10
  • I have seen diagrams, like for natural language processing, where they assign each word (one-hot-encoded) to a specific cell of an LSTM layer. Why is this so? – rert588 Jan 09 '20 at 23:11
  • [This](https://stackoverflow.com/questions/58276337/proper-way-to-feed-time-series-data-to-stateful-lstm/58277760#58277760) may be of help. Also, your 'actual' question (how does an LSTM work) is way beyond the scope of a StackOverflow question - but you can take it one part at a time. – OverLordGoldDragon Jan 09 '20 at 23:55
  • Your understanding of the LSTM layer is not correct. Please see this answer for understanding it better: https://stackoverflow.com/questions/38714959/understanding-keras-lstms/50235563#50235563 – Daniel Möller Jan 10 '20 at 01:49
  • @DanielMöller I read the post in the link. I want to get clarification: Is each of the green boxes in the diagram (even the one in my question) an LSTM cell. If not, what do they represent? – rert588 Jan 10 '20 at 09:21
  • They are time steps. Recursive iterations. – Daniel Möller Jan 10 '20 at 12:14

1 Answer


I will try to answer some of your questions and then consolidate the information provided in the comments, for your benefit as well as the community's.

As Dr. Snoopy mentioned in the comments, irrespective of whether the number of inputs is greater or smaller than the number of units/neurons, the layer is connected like a fully connected network, as shown below.

[Image: LSTM connected as a fully connected network]
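As a sketch of why the two sizes are independent: in Keras's weight layout, an LSTM layer stores a kernel of shape (features, 4 * units), a recurrent kernel of shape (units, 4 * units), and a bias of length 4 * units. The per-timestep input size and the unit count live in different dimensions, so they never need to match:

```python
import numpy as np

features = 3                         # per-timestep input size, fixed by the data
for units in (2, 3, 300):            # number of LSTM units is a free design choice
    kernel = np.zeros((features, 4 * units))   # input -> 4 gates (i, f, c, o)
    recurrent = np.zeros((units, 4 * units))   # previous hidden state -> 4 gates
    bias = np.zeros(4 * units)
    print(units, kernel.shape, recurrent.shape, bias.shape)
```

Whatever the unit count, the kernel's first dimension stays at 3 because it is the input features, not the units, that determine it.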

To understand how RNN/LSTM work internally, let's assume we have

Number of Input Features => 3 => F1, F2 and F3

Number of Timesteps => 2 => 0 and 1

Number of Hidden Layers => 1

Number of Neurons in each Hidden Layer => 5

Then what actually happens inside can be represented in the screenshots shown below:

[Image: understanding RNN/LSTM internal computations, step by step]
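With the numbers above (3 input features, 5 units), the size of the layer can be checked directly: each of the four gates has an input kernel, a recurrent kernel, and a bias, giving 4 * (F*U + U*U + U) trainable weights. This matches what Keras's `model.summary()` reports for such a layer:

```python
def lstm_param_count(features, units):
    # 4 gates, each with: input kernel (features x units),
    # recurrent kernel (units x units), and a bias of length units
    return 4 * (features * units + units * units + units)

print(lstm_param_count(3, 5))   # 180 -- the example above
print(lstm_param_count(1, 3))   # 60  -- the LSTM(3) layer from the question
```

Note that the timestep count appears nowhere in this formula: the same weights are reused at every step, which is why you can change the sequence length without changing the layer.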

You also asked about words being assigned to LSTM cells. I'm not sure which link you are referring to, or whether it is correct, but in simple terms (the words in this screenshot would actually be replaced by embedding vectors), the screenshot below shows how an LSTM handles text:

[Image: an LSTM processing a sentence, one word per timestep]
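The "one word per box" picture is easy to misread: each word is looked up in an embedding matrix, and the resulting vector is fed at one *timestep* to the same LSTM cell(s), not to a separate cell per word. A minimal sketch (the vocabulary and embedding values here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {"the": 0, "cat": 1, "sat": 2}
embed_dim = 4
E = rng.standard_normal((len(vocab), embed_dim))  # embedding matrix, one row per word

sentence = ["the", "cat", "sat"]
ids = [vocab[w] for w in sentence]
x = E[ids]            # shape (timesteps, embed_dim): one vector per timestep
print(x.shape)        # (3, 4)
```

This (timesteps, embed_dim) array is exactly the per-sample shape an LSTM layer expects; the boxes in such diagrams are the recursive iterations over time, as Daniel Möller noted in the comments.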

For more information, please refer to the detailed explanations by OverLordGoldDragon and Daniel Möller linked in the comments above.

Hope this helps. Happy Learning!

  • Thanks for referencing my answer; I'd add one to the formula graphic [here](https://stackoverflow.com/a/60633086/10133797). The key is that each row in the weight matrix is an _independent feature extractor_; it's just a concatenation of differently random-initialized vectors. Each operates the same way on a given input, just with different numbers. This feature extraction can be nicely visualized as shown in EX1 [here](https://github.com/OverLordGoldDragon/see-rnn#examples), for `Bidirectional(LSTM(32))` (two `LSTM(32)` layers, one iterating start-to-end, the other in reverse). – OverLordGoldDragon May 28 '20 at 15:16