How to use reset_states(states) function in Keras?

Question

I'm trying to set the LSTM internal state before training on each batch. I'm sharing my test code and findings in the hope of finding an answer and helping others who face similar problems.

In particular, for each sample I have a feature X (which doesn't change over time) and a sequence P = p1, p2, p3, ..., p30. The goal is: given X and p1, p2, p3, predict p4, p5, ..., p30.

To this end, I want to initialize the hidden state of an LSTM with X, as done in several works (e.g., neuraltalk); the LSTM is then fed p1, p2, p3 to predict p4, ..., p30. This initialization is needed before each batch (batch_size=1), so I need control over how the LSTM states are initialized. Following the question "Initializing LSTM hidden state Tensorflow/Keras", I've tested the following code:

First of all I've added some prints in the reset_states() function defined in recurrent.py, in order to understand what exactly happens.

def reset_states(self, states=None):
    if not self.stateful:
        raise AttributeError('Layer must be stateful.')
    batch_size = self.input_spec[0].shape[0]

    if not batch_size:
        raise ValueError('If a RNN is stateful, it needs to know '
                         'its batch size. Specify the batch size '
                         'of your input tensors: \n'
                         '- If using a Sequential model, '
                         'specify the batch size by passing '
                         'a `batch_input_shape` '
                         'argument to your first layer.\n'
                         '- If using the functional API, specify '
                         'the time dimension by passing a '
                         '`batch_shape` argument to your Input layer.')
    # initialize state if None
    if self.states[0] is None:
        self.states = [K.zeros((batch_size, self.units))
                       for _ in self.states]
        print("reset states A (all zeros)")
    elif states is None:
        for state in self.states:
            K.set_value(state, np.zeros((batch_size, self.units)))
        print("reset states B (all zeros)")

    else:
        if not isinstance(states, (list, tuple)):
            states = [states]
            print("reset states C (list or tuple copying)")

        if len(states) != len(self.states):
            raise ValueError('Layer ' + self.name + ' expects ' +
                             str(len(self.states)) + ' states, '
                             'but it received ' + str(len(states)) +
                             ' state values. Input received: ' +
                             str(states))
        for index, (value, state) in enumerate(zip(states, self.states)):
            if value.shape != (batch_size, self.units):
                raise ValueError('State ' + str(index) +
                                 ' is incompatible with layer ' +
                                 self.name + ': expected shape=' +
                                 str((batch_size, self.units)) +
                                 ', found shape=' + str(value.shape))
            K.set_value(state, value)
            print("reset states D (set values)")
            print(value)
            print("\n")

Here is the test code:

import tensorflow as tf
from keras.layers import LSTM
from keras.layers import Input
from keras.models import Model
import numpy as np
import keras.backend as K

input = Input(batch_shape=(1,3,1))
lstm_layer = LSTM(10,stateful=True)(input)
>>> reset states A (all zeros)

As you can see, the first message is printed when the LSTM layer is created.

model = Model(input,lstm_layer)
model.compile(optimizer="adam", loss="mse")

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    h = sess.run(model.layers[1].states[0])
    c = sess.run(model.layers[1].states[1])
print(h)
>>> [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]]
print(c)
>>> [[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]]

The internal states have been set to all zeros. Alternatively, the function reset_states() can be used:

model.layers[1].reset_states()
>>> reset states B (all zeros)

The second message is printed in this case. Everything seems to work correctly. Now I want to set the states to arbitrary values.

new_h = K.variable(value=np.ones((1, 10)))
new_c = K.variable(value=np.ones((1, 10))+1)

model.layers[1].states[0] = new_h
model.layers[1].states[1] = new_c
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    h = sess.run(model.layers[1].states[0])
    c = sess.run(model.layers[1].states[1])

print(h)
>>> [[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]]
print(c)
>>> [[ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.]]

OK, I've successfully set both states to my vectors of all ones and all twos. However, it is worth using the class method reset_states(), which takes the states as input. This method relies on K.set_value(x, value), which expects value to be a NumPy array.

new_h_5 = np.zeros((1,10))+5
new_c_24 = np.zeros((1,10))+24
model.layers[1].reset_states([new_h_5,new_c_24])

It seems to work; indeed, the output is:

>>> reset states D (set values)
>>> [[ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]]
>>> 
>>> 
>>> 
>>> 
>>> reset states D (set values)
>>> [[ 24.  24.  24.  24.  24.  24.  24.  24.  24.  24.]]

However, when I check whether the states have been set, I find the previous initialization values (all ones, all twos).

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    hh = sess.run(model.layers[1].states[0])
    cc = sess.run(model.layers[1].states[1])
print(hh)
>>> [[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]]
print(cc)
>>> [[ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.]]

What exactly is happening here? Why does the function seem to work according to the prints, yet not change the values of the internal states?

Marcin Możejko · Accepted Answer · 2017-11-04T13:26:48.893

The value parameter sets the value with which a variable is initialized. So when you call tf.global_variables_initializer().run(), your states are initialized with the values defined here:

new_h = K.variable(value=np.ones((1, 10)))
new_c = K.variable(value=np.ones((1, 10))+1)

Edit:

It seemed obvious to me, but I will explain once more why reset_states doesn't work.

Variable definition: when you define your inner states as variables initialized with a certain value, that value is set again every time you call the variable initializer.
Reset states: this updates the current value of the variable, but it does not change the initializer's default value. To change that, you need to reassign the states to new variables that have the desired states as their default values.
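
To illustrate the distinction above, here is a minimal toy model in plain Python. It is not TensorFlow or Keras code: the ToyVariable class and its method names are hypothetical stand-ins, assuming only what this answer describes (a variable keeps a fixed initial value, and setting a value changes only the current one). It also ignores sessions entirely.

```python
class ToyVariable:
    """Hypothetical stand-in for a variable with a fixed initial value
    and a separate, mutable current value."""

    def __init__(self, initial_value):
        self.initial_value = list(initial_value)  # what the initializer restores
        self.current = None                       # unset until initialized

    def initialize(self):
        # Plays the role of running the variable initializer.
        self.current = list(self.initial_value)

    def set_value(self, value):
        # Plays the role of K.set_value: only the current value changes.
        self.current = list(value)


h = ToyVariable([1.0] * 10)    # like K.variable(value=np.ones((1, 10)))
h.initialize()

h.set_value([5.0] * 10)        # like reset_states([...]) for this state
after_reset = list(h.current)  # all 5.0: the update did take effect

h.initialize()                 # running the initializer again
after_reinit = list(h.current) # back to all 1.0: the update is overwritten

print(after_reset[0], after_reinit[0])
```

If this reading is right, the check in the question undoes the update by running the initializer again before reading the states. Reading them with K.get_value(model.layers[1].states[0]) instead of initializing in a new tf.Session may avoid that, although this answer does not state it explicitly.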

edited Nov 04 '17 at 13:26

answered Oct 31 '17 at 15:31

OK, but this doesn't explain what happens when I use the reset_states function – user2614596 Nov 02 '17 at 01:14
Hmm, it is still not clear to me how to set the states using the second method, through the reset_states function – user2614596 Nov 04 '17 at 14:40
What is your main goal here? – Marcin Możejko Nov 04 '17 at 14:42
As I explained in the question, I want to reset the states before training each batch with a context vector (i.e., a feature related to the input sequence of that batch) – user2614596 Nov 04 '17 at 14:45
Then you should initialize variables only once - at the beginning and use `reset_states` before each batch. – Marcin Możejko Nov 04 '17 at 18:01
But I need to initialize the states more than one time. For each batch I have an initialization because 1 batch corresponds to one context vector. – user2614596 Nov 04 '17 at 20:01
So use `reset_states` with a given value after each epoch. – Marcin Możejko Nov 04 '17 at 20:09
Indeed, in the code above I wanted to check whether the values changed after calling reset_states. So you're saying the problem is related to the binding with the two Variables. – user2614596 Nov 05 '17 at 07:19
