
Here is a very simple example of an LSTM in stateless mode, trained on two very simple sequences: [0->1] and [0->2].

Any idea why it won't converge in stateless mode?

We have one batch of size 2 containing 2 samples, and the network is supposed to keep the state within the batch. When predicting, we would like to receive 1 and then 2.

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM 
import numpy
# define sequences
seq = [0, 1, 0, 2]
# convert sequence into the required data format:
# we extract 2 samples, [0->1] and [0->2], and convert them into one-hot vectors
seqX = numpy.array([[(1., 0., 0.)], [(1., 0., 0.)]])
seqY = numpy.array([(0., 1., 0.), (0., 0., 1.)])

# define LSTM configuration
n_unique = len(set(seq)) 
n_neurons = 20
n_batch = 2
n_features = n_unique  # which is 3
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, n_features)))
model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
# train LSTM
model.fit(seqX, seqY, epochs=300, batch_size=n_batch, verbose=2, shuffle=False)
# evaluate LSTM 
print('Sequence')
result = model.predict_classes(seqX, batch_size=n_batch, verbose=0)
for i in range(2):
    print('X=%.1f y=%.1f, yhat=%.1f' % (0, i+1, result[i]))

Example 2: here I want to clarify a bit what result I want.

Same code example, but in stateful mode (stateful=True). It works perfectly: we feed the network zeros twice and get 1 and then 2. But I want to get the same result in stateless mode, since it is supposed to keep the state within the batch.

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM 
import numpy
# define sequences
seq = [0, 1, 0, 2]
# convert sequences into the required data format
seqX = numpy.array([[(1., 0., 0.)], [(1., 0., 0.)]])
seqY = numpy.array([(0., 1., 0.), (0., 0., 1.)])

# define LSTM configuration
n_unique = len(set(seq))
n_neurons = 20
n_batch = 1
n_features = n_unique
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, 1, n_features), stateful=True))
model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
# train LSTM
for epoch in range(300):
    model.fit(seqX, seqY, epochs=1, batch_size=n_batch, verbose=2, shuffle=False)
    model.reset_states()
# evaluate LSTM 
print('Sequence')
result = model.predict_classes(seqX, batch_size=1, verbose=0)
for i in range(2):
    print('X=%.1f y=%.1f, yhat=%.1f' % (0, i+1, result[i]))

As the correct result, we should get:

Sequence
X=0.0 y=1.0, yhat=1.0
X=0.0 y=2.0, yhat=2.0

  • You have two equal sequences and expect one different output for each sequence? – Daniel Möller Oct 18 '17 at 15:27
  • Looking at `seqX.shape = (2,1,3)` --> we have two sequences, only one time step per sequence (thus, no sequence), three features per step. – Daniel Möller Oct 18 '17 at 15:34
  • We feed the net 2 times with zeros (2 vectors 1,0,0), and expect it to predict first 1 (vector 0,1,0) and then 2 (vector 0,0,1). That is [0->1] and [0->2]. The net is supposed to remember this sequence. – user8588846 Oct 18 '17 at 15:42
  • You must then feed one sequence with two steps, not two sequences with one step. `seqX.shape = (1,2,3)`. – Daniel Möller Oct 18 '17 at 15:44

1 Answer


You must feed one sequence with two steps instead of two sequences with one step:

  • One sequence, two steps: seqX.shape = (1,2,3)
  • Two sequences, one step: seqX.shape = (2,1,3)

The input shape is (numberOfSequences, stepsPerSequence, featuresPerStep)

seqX = [[[1,0,0],[1,0,0]]]
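
For instance, if the data is already laid out as in the question, a plain numpy reshape converts between the two layouts (a small illustration using the question's seqX, not part of the model code):

import numpy
seqX = numpy.array([[(1., 0., 0.)], [(1., 0., 0.)]])
print(seqX.shape)             # (2, 1, 3): two sequences, one step each
seqX = seqX.reshape(1, 2, 3)
print(seqX.shape)             # (1, 2, 3): one sequence, two steps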

If you want to get both steps for y as output, you must use return_sequences=True.

LSTM(n_neurons, input_shape=(2, n_features), return_sequences=True)

The entire working code:

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM 
import numpy

# define sequences
seq = [0, 1, 0, 2]

# convert sequence into the required data format:
# we extract 2 samples, [0->1] and [0->2], and convert them into one-hot vectors
seqX = numpy.array([[[1., 0., 0.], [1., 0., 0.]]])
seqY = numpy.array([[[0., 1., 0.], [0., 0., 1.]]])
    # shapes are (1,2,3) - 1 sequence, 2 steps, 3 features

# define LSTM configuration
n_unique = len(set(seq))
n_neurons = 20
n_features = n_unique  # which is 3
# no need for a batch size

# create LSTM
model = Sequential()

model.add(LSTM(n_neurons, input_shape=(2, n_features), return_sequences=True))
    # the input shape must have two steps

model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')

# train LSTM
model.fit(seqX, seqY, epochs=300, verbose=2)
    # no shuffling and no batch size needed

# evaluate LSTM 
print('Sequence')
result = model.predict_classes(seqX, verbose=0)
print(seqX)
print(result) #all steps are predicted in a single array (with return_sequences=True)
  • What about the second example then? It works fine even though the inputs and outputs have the same shape as in the first example, and there is no need for "return_sequences"; we just explicitly reset the state after each update. Why can't we do the same in stateless mode? – user8588846 Oct 18 '17 at 16:22
  • The second example is not working. It's printing `X=0.0 y=1.0, yhat=1.0` and `X=0.0 y=2.0, yhat=1.0`. Your result is in `yhat` (always 1). In `y` you're printing `(i+1)`. – Daniel Möller Oct 18 '17 at 16:37
  • You don't need return sequences in stateful layers if you predict each step in a different batch, because every prediction will return a value. --- In non stateful layers you must predict all steps at once. – Daniel Möller Oct 18 '17 at 16:39
  • The second example works fine. Just don't forget to set n_batch = 1 and call model.reset_states(). It results in: Sequence X=0.0 y=1.0, yhat=1.0 X=0.0 y=2.0, yhat=2.0 – user8588846 Oct 18 '17 at 16:44
  • I copied exactly your code (again), and really, it does not work. What is your keras version? – Daniel Möller Oct 18 '17 at 16:47
  • Keras 2.0.8, CNTK backend – user8588846 Oct 18 '17 at 16:58
  • Press the spacebar 4 times before model.reset_states(); it doesn't copy correctly. It should be at the same indentation level as model.fit. – user8588846 Oct 18 '17 at 17:02
  • Ok, now it works. It's working because you have "two batches" (since you defined `n_batch=1` and have two sequences in the array, Keras decided to put each sequence in a different batch). In stateful layers, each batch is "additional steps". (That only works because `n_batch = 1` and your data is bigger than one batch.) If you use `n_batch = 2`, you will have the same behavior as the non-stateful layer. Each sample in the batch is a different sequence. – Daniel Möller Oct 18 '17 at 17:08
  • Thanks for clarifying. Each sample in the batch is a different sequence and each batch is "additional steps". Does it mean that we don't get these additional steps within the batch between samples? Then what is the point of stateless mode? – user8588846 Oct 18 '17 at 17:46
  • Your stateless batch must have this shape: `(samples, steps, features)`. This contains everything you need. Actually, it's the contrary. There is very little point in using stateful layers. They're only useful in two cases: 1 - your data is too big and it will blow your memory, you need to divide the sequences in smaller parts; 2 - you want a model that will build a sequence step by step, predicting the next element and using this predicted element as input for the next step. – Daniel Möller Oct 18 '17 at 18:08
  • Each batch is "additional steps" only in stateful layers. /// In normal layers, one batch has "all steps" inside it. – Daniel Möller Oct 18 '17 at 18:09
  • The "stateless" layers are not actually "stateless". They do have states, but the states are reset at the end of each batch, because it considers that every batch has complete sequences, not part of sequences. – Daniel Möller Oct 18 '17 at 18:10
  • Just test the entire code I added to my answer. You'll see all sequence steps in the result. – Daniel Möller Oct 18 '17 at 18:19
  • Thanks. I understood that. We have 1 sequence and 2 timesteps. But is there no connection between sequences if we set 2 sequences (or samples) and 1 timestep in "stateless" layers, when both sequences are within a batch of 2? The documentation says: "stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch." – user8588846 Oct 18 '17 at 21:25
  • ...which implies that, until we reach the end of the batch, a stateless layer should consider both inputs as one sequence before resetting. – user8588846 Oct 18 '17 at 21:36
  • There is no connection at all between two sequences in the same batch (even in stateful layers this is true). Each sequence has its own state; states are not shared between sequences (even in stateful layers). Your example 2 worked because, although you had two sequences, you defined batch size = 1, and Keras internally understood two batches, the second batch being a sequel of the first, with only one sequence. – Daniel Möller Oct 19 '17 at 01:25
  • Suppose you have a stateful layer and 3 sequences divided in two batches. If you define the batch size properly as 3, there will be 3 independent states. The first batch will contain 3 independent, not connected, sequences. And the second batch will contain the same 3 sequences continued (see the sketch after these comments). – Daniel Möller Oct 19 '17 at 01:27
  • See this: https://stackoverflow.com/questions/43882796/when-does-keras-reset-an-lstm-state – Daniel Möller Oct 19 '17 at 01:36
  • Got it. Thanks for your help. – user8588846 Oct 19 '17 at 02:20
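
A minimal sketch of the batch-continuation behaviour described in the comments above, assuming the Keras 2.x Sequential API (the layer sizes and zero-filled data are illustrative, not from the original thread):

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# 3 independent sequences fed in two batches of 3;
# batch_input_shape fixes the batch size, so state slot i
# stays tied to sequence i across batches
model = Sequential()
model.add(LSTM(20, batch_input_shape=(3, 1, 3), stateful=True))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')

batch1 = numpy.zeros((3, 1, 3))   # step 1 of each of the 3 sequences
batch2 = numpy.zeros((3, 1, 3))   # step 2 of the same 3 sequences

model.predict(batch1, batch_size=3)   # states now hold step 1, per sequence
model.predict(batch2, batch_size=3)   # each sequence continues from its own state
model.reset_states()                  # only now are the 3 states cleared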