
I'm trying to understand LSTMs and how to build them with Keras. I found out that there are principally four modes to run an RNN (the four right ones in the picture).

[Diagram of the five RNN modes: one-to-one, one-to-many, many-to-one, and two many-to-many variants. Image source: Andrej Karpathy]

Now I wonder what a minimal code snippet for each of them would look like in Keras. So something like

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(timesteps, data_dim)))
model.add(Dense(1))

for each of the 4 tasks, maybe with a little bit of explanation.

Luca Thiede
  • For the diagram of the one-to-many architecture, the RNN units to the right of the first X input also require inputs. They can typically be set to the outputs (o or y) from the previous unit, or to a default zero vector. – Vass Jan 12 '23 at 16:15

2 Answers


So:

  1. One-to-one: you could use a Dense layer as you are not processing sequences:

    model.add(Dense(output_size, input_shape=input_shape))
    
  2. One-to-many: this option is not well supported, as chaining models is not easy in Keras, so the following version is the easiest one:

    model.add(RepeatVector(number_of_times, input_shape=input_shape))
    model.add(LSTM(output_size, return_sequences=True))
    
  3. Many-to-one: actually, your code snippet is (almost) an example of this approach:

    model = Sequential()
    model.add(LSTM(1, input_shape=(timesteps, data_dim)))
    
  4. Many-to-many: This is the easiest snippet when the length of the input and output matches the number of recurrent steps:

    model = Sequential()
    model.add(LSTM(1, input_shape=(timesteps, data_dim), return_sequences=True))
    
  5. Many-to-many when the number of steps differs from the input/output length: this is really hard in Keras; there are no easy code snippets for it. For reference, a runnable consolidation of snippets 1-4 is sketched right after this list.
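A minimal, self-contained version of snippets 1-4, assuming the example sizes timesteps = 6, data_dim = 2 and output_size = 1 (these concrete numbers are illustrative, not part of the snippets above):

    from keras.models import Sequential
    from keras.layers import Dense, LSTM, RepeatVector

    timesteps, data_dim, output_size = 6, 2, 1  # assumed example sizes

    # 1. one-to-one: a plain Dense layer, no time axis
    one_to_one = Sequential([Dense(output_size, input_shape=(data_dim,))])

    # 2. one-to-many: repeat the single input vector, unroll an LSTM over the copies
    one_to_many = Sequential([
        RepeatVector(timesteps, input_shape=(data_dim,)),
        LSTM(output_size, return_sequences=True),
    ])

    # 3. many-to-one: the LSTM returns only its last output
    many_to_one = Sequential([LSTM(output_size, input_shape=(timesteps, data_dim))])

    # 4. many-to-many (equal lengths): one output per input step
    many_to_many = Sequential([
        LSTM(output_size, input_shape=(timesteps, data_dim), return_sequences=True),
    ])

    for m in (one_to_one, one_to_many, many_to_one, many_to_many):
        print(m.output_shape)
    # prints (None, 1), (None, 6, 1), (None, 1), (None, 6, 1)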

EDIT: Ad 5

In one of my recent applications, we implemented something which might be similar to the many-to-many from the 4th image. If you want a network with the following architecture (where the input is longer than the output):

                                        O O O
                                        | | |
                                  O O O O O O
                                  | | | | | | 
                                  O O O O O O

You could achieve this in the following manner:

model = Sequential()
model.add(LSTM(1, input_shape=(timesteps, data_dim), return_sequences=True))
model.add(Lambda(lambda x: x[:, -N:, :]))  # select the last N steps of the output

Where N is the number of last steps you want to cover (in the image, N = 3).
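As a sanity check, here is a minimal runnable sketch of that model with assumed example sizes (timesteps = 6, data_dim = 2, N = 3, matching the picture); with the TensorFlow backend, the Lambda layer infers the sliced output shape automatically:

    from keras.models import Sequential
    from keras.layers import LSTM, Lambda

    timesteps, data_dim, N = 6, 2, 3  # assumed example sizes; N = output length

    model = Sequential()
    model.add(LSTM(1, input_shape=(timesteps, data_dim), return_sequences=True))
    model.add(Lambda(lambda x: x[:, -N:, :]))  # keep only the last N timesteps

    print(model.output_shape)  # (None, 3, 1)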

From this point, getting to:

                                        O O O
                                        | | |
                                  O O O O O O
                                  | | | 
                                  O O O 

is as simple as artificially padding the sequence of length N, e.g. with 0 vectors, to adjust it to the appropriate size.
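For example, a short sketch of that zero-padding in NumPy (the batch size and dimensions are assumptions for illustration):

    import numpy as np

    timesteps, data_dim, N = 6, 2, 3            # assumed sizes, as above
    x_short = np.random.rand(32, N, data_dim)   # a batch of 32 length-N sequences

    # left-pad with zero vectors so every sequence reaches length `timesteps`
    pad = np.zeros((x_short.shape[0], timesteps - N, data_dim))
    x_padded = np.concatenate([pad, x_short], axis=1)

    print(x_padded.shape)  # (32, 6, 2)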

Marcin Możejko
  • One clarification: for example, for many-to-one you use LSTM(1, input_shape=(timesteps, data_dim)). I thought the 1 stands for the number of LSTM cells/hidden nodes, but apparently not. How would you code a many-to-one with, let's say, 512 nodes then? (Because I read something similar, I thought it would be done with model.add(LSTM(512, input_shape=...)) followed by model.add(Dense(1)); what is that used for then?) – Luca Thiede Mar 27 '17 at 13:31
  • In this case your code, after correcting a typo, should be ok. – Marcin Możejko Mar 27 '17 at 13:34
  • Why do we use the RepeatVector, and not a vector with the first entry equal to x and all the other entries equal to 0? (According to the picture above, there is no input at all at the later states, and not always the same input, which is what RepeatVector would do in my understanding.) – Luca Thiede Mar 27 '17 at 14:09
  • If you think carefully about this picture, it's only a conceptual presentation of the idea of **one-to-many**. All of these hidden units **must** accept something as an input. So they might accept the same input, or an input with the first entry equal to `x` and the others equal to `0`. But, on the other hand, they might accept the same `x` repeated many times as well. A different approach is to chain models, which is hard in `Keras`. The option I provided is the easiest case of a **one-to-many** architecture in `Keras`. – Marcin Możejko Mar 27 '17 at 14:15
  • Nice! I'm thinking about using LSTM N-to-N in a GAN architecture. I will have an LSTM-based generator. I will give this generator (as the latent variable is used in GANs) the first half of the time series, and the generator will produce the second half of the time series. Then I will combine the two halves (real and generated) to produce the "fake" input for the GAN. Do you think using point 4 of your solution will work? Or, in other words, is this (solution 4) the right way to do this? – rjpg Nov 19 '18 at 20:54
  • @MarcinMożejko In your one-to-many scenario, how are you connecting the RepeatVector with the LSTM layer? How should I set the value of number_of_times in RepeatVector? Doesn't Keras by itself find the number of time steps required to build the model, and then repeat the input vector that number of times? – asn Dec 31 '18 at 09:37
  • @MarcinMożejko So you don't need to explicitly tell the model the length of the output sequence? You just use return_sequences=True and it infers the rest? – user1893354 Jan 09 '19 at 20:32
  • In the many-to-many example, there are 2 cases (in the OP): one with and one without "offset". How do the models compare in a sample implementation? What is the difference when implementing those? – rst Jun 21 '19 at 09:41
  • How do you do the second many-to-many? Do you put a mask on the last timesteps? – 3nomis Apr 07 '20 at 17:50
  • How is your `Many-to-one` different from `Many-to-many`? I mean what difference does adding `return_sequences=True` make? – Parth Nov 14 '20 at 23:42
  • Could you help with how to correctly feed multidimensional data for a many-to-many or autoencoder model? Let's say we have a total data set stored in an array with shape (45000, 100, 6) = (Nsample, Ntimesteps, Nfeatures), i.e. 45000 samples with 100 time steps and 6 features. – Djordje Savic Jan 12 '22 at 21:06
  • In many-to-many with unequal input and output, can we use number 4 with padding? – alex3465 Jun 15 '22 at 10:08

Great answer by @Marcin Możejko.

I would add the following to Nr. 5 (many-to-many with different input/output lengths):

A) as Vanilla LSTM

model = Sequential()
model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))
model.add(Dense(N_OUTPUTS))

B) as Encoder-Decoder LSTM

model = Sequential()
model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))  # encode the input to one vector
model.add(RepeatVector(N_OUTPUTS))                             # repeat it once per output step
model.add(LSTM(N_BLOCKS, return_sequences=True))               # decode into an output sequence
model.add(TimeDistributed(Dense(1)))                           # one value per output step
model.add(Activation('linear'))
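In words: the encoder LSTM compresses the whole input sequence into one vector, RepeatVector copies that vector N_OUTPUTS times so the decoder LSTM has one input per output step, and TimeDistributed(Dense(1)) applies the same Dense layer to every decoder step. A hedged usage sketch with assumed sizes and dummy data (N_INPUTS = 100, N_OUTPUTS = 10, N_FEATURES = 6, N_BLOCKS = 64 are illustrative, not from the answer):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

    N_INPUTS, N_OUTPUTS, N_FEATURES, N_BLOCKS = 100, 10, 6, 64  # assumed sizes

    model = Sequential()
    model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))
    model.add(RepeatVector(N_OUTPUTS))
    model.add(LSTM(N_BLOCKS, return_sequences=True))
    model.add(TimeDistributed(Dense(1)))
    model.add(Activation('linear'))
    model.compile(loss='mse', optimizer='adam')

    x = np.random.rand(128, N_INPUTS, N_FEATURES)  # 128 dummy input sequences
    y = np.random.rand(128, N_OUTPUTS, 1)          # matching dummy target sequences
    model.fit(x, y, epochs=1, batch_size=32)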
gustavz
  • Could you please explain the details of the `B) Encoder-Decoder LSTM` architecture? I'm having issues understanding the roles of the "RepeatVector" / "TimeDistributed" steps. – Marsellus Wallace May 01 '20 at 19:35
  • Could you please help with how to correctly feed multidimensional data for a many-to-many or encoder-decoder model? I'm mostly struggling with the shape. Let's say we have a total data set stored in an array with shape (45000, 100, 6) = (Nsample, Ntimesteps, Nfeatures), i.e. 45000 samples with 100 time steps and 6 features. – Djordje Savic Jan 12 '22 at 21:08