
I'm referencing the code here https://github.com/martin-gorner/tensorflow-rnn-shakespeare/blob/master/rnn_train.py and am trying to convert the cell from GRUCell to LSTMCell. Here is an excerpt from the code.

# input state
Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Hin')  # [ BATCHSIZE, INTERNALSIZE * NLAYERS]

# using a NLAYERS=3 layers of GRU cells, unrolled SEQLEN=30 times
# dynamic_rnn infers SEQLEN from the size of the inputs Xo

# How to properly apply dropout in RNNs: see README.md
cells = [rnn.GRUCell(INTERNALSIZE) for _ in range(NLAYERS)]

# "naive dropout" implementation
dropcells = [rnn.DropoutWrapper(cell, input_keep_prob=pkeep) for cell in cells]
multicell = rnn.MultiRNNCell(dropcells, state_is_tuple=False)
multicell = rnn.DropoutWrapper(multicell, output_keep_prob=pkeep)  # dropout for the softmax layer

Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=Hin)
# Yr: [ BATCHSIZE, SEQLEN, INTERNALSIZE ]
# H:  [ BATCHSIZE, INTERNALSIZE*NLAYERS ] # this is the last state in the sequence

H = tf.identity(H, name='H')  # just to give it a name

I understand that LSTMCell has two states, the cell state C and the output state H. What I want to do is feed initial_state a tuple of both states. How can I do this properly? I have tried various methods but always run into a TensorFlow error.

EDIT: This is one of the attempts:

# inputs
X = tf.placeholder(tf.uint8, [None, None], name='X')  # [ BATCHSIZE, SEQLEN ]
Xo = tf.one_hot(X, ALPHASIZE, 1.0, 0.0)  # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
# expected outputs = same sequence shifted by 1 since we are trying to predict the next character
Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_')  # [ BATCHSIZE, SEQLEN ]
Yo_ = tf.one_hot(Y_, ALPHASIZE, 1.0, 0.0)  # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
# input state
Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Hin')  # [ BATCHSIZE, INTERNALSIZE * NLAYERS]
Cin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Cin')
initial_state = tf.nn.rnn_cell.LSTMStateTuple(Cin, Hin)
# using a NLAYERS=3 layers of GRU cells, unrolled SEQLEN=30 times
# dynamic_rnn infers SEQLEN from the size of the inputs Xo

# How to properly apply dropout in RNNs: see README.md
cells = [rnn.LSTMCell(INTERNALSIZE) for _ in range(NLAYERS)]

# "naive dropout" implementation
dropcells = [rnn.DropoutWrapper(cell, input_keep_prob=pkeep) for cell in cells]
multicell = rnn.MultiRNNCell(dropcells, state_is_tuple=True)
multicell = rnn.DropoutWrapper(multicell, output_keep_prob=pkeep)  # dropout for the softmax layer

Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=initial_state)

It says "TypeError: 'Tensor' object is not iterable."

Thanks.

lppier

1 Answer


The error is happening because you have to provide a tuple of placeholders, one for each of the layers, when building the graph; then, when you're training, you must feed in the state for each layer.

The error is saying: I need to iterate over a list of (c, m) tuples because you have multiple cells and I need to initialize all of their states, but all I see is a single Tensor and I can't iterate over that.

This snippet shows how to set up the placeholders when building the graph:

state_size = 10
num_layers = 3

X = tf.placeholder(tf.float32, [None, 100, 10])

# the second dimension has size 2 and represents
# c, m (the cell state and the hidden state);
# the batch size is left as None
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2,
                                    None, state_size])
# l is a list of num_layers tensors, each of shape [2, batch_size, state_size]
l = tf.unstack(state_placeholder, axis=0)

Then we create a tuple of LSTMStateTuples, one per layer:
rnn_tuple_state = tuple(
    [rnn.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)

# I had to set reuse=True here: tf.__version__ 1.7.0
cells = [rnn.LSTMCell(state_size, reuse=True)] * num_layers
mc = rnn.MultiRNNCell(cells, state_is_tuple=True)

outputs, state = tf.nn.dynamic_rnn(cell=mc,
                                   inputs=X,
                                   initial_state=rnn_tuple_state,
                                   dtype=tf.float32)

Here is the relevant bit from the docs:

initial_state: (optional) An initial state for the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.

So we ended up creating a tuple of placeholders for each cell (layer), with the requisite shape (batch_size, state_size) where batch_size = None. I expounded on this answer.
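
For completeness, here is a rough sketch (not from the original answer) of feeding that state at training time: start from a zero state, then feed the state returned by dynamic_rnn back into the placeholder on the next step. sess, batches and x_batch are hypothetical names; the other variables come from the snippet above, and a training op would normally be run in the same sess.run call.

import numpy as np

batch_size = 32  # assumed batch size for this sketch

# zero state for the very first batch, matching the placeholder shape
# [num_layers, 2, batch_size, state_size]
current_state = np.zeros((num_layers, 2, batch_size, state_size), np.float32)

for x_batch in batches:  # hypothetical iterator over input batches
    outputs_val, new_state = sess.run(
        [outputs, state],
        feed_dict={X: x_batch, state_placeholder: current_state})
    # `state` comes back as a tuple of LSTMStateTuples; stacking it with
    # numpy restores the [num_layers, 2, batch_size, state_size] layout
    # the placeholder expects, so it can be fed to the next step
    current_state = np.asarray(new_state)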

parsethis
  • thanks @orsonady, I understand your solution. I noticed that the difference between the two pieces of code is that in yours the layers are a separate dimension, while in the original code the layers are absorbed into the state dimension. Why did he do that? i.e. Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Hin')  # [ BATCHSIZE, INTERNALSIZE * NLAYERS ]. Sorry about the delay in replying, was side-tracked by other stuff. – lppier Apr 04 '18 at 15:05
  • Without the extra layers dimension, I wouldn't be able to unstack along the layers like you did. – lppier Apr 04 '18 at 15:16
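
A small sketch of the difference discussed in these comments (the sizes are illustrative): with state_is_tuple=False, as in the original GRU code, the MultiRNNCell state is one flat tensor with all layers concatenated along the last axis, which is why a single [BATCHSIZE, INTERNALSIZE * NLAYERS] placeholder was enough there. With LSTM cells and state_is_tuple=True the state is a tuple of per-layer (c, h) pairs, so the layers have to stay a separate dimension (or get separate placeholders) to build the LSTMStateTuples.

import tensorflow as tf
from tensorflow.contrib import rnn

INTERNALSIZE, NLAYERS = 512, 3

# GRU cells with a flat state: one tensor of width INTERNALSIZE * NLAYERS
gru_multicell = rnn.MultiRNNCell(
    [rnn.GRUCell(INTERNALSIZE) for _ in range(NLAYERS)],
    state_is_tuple=False)
print(gru_multicell.state_size)   # 1536 == INTERNALSIZE * NLAYERS

# LSTM cells with tuple state: one LSTMStateTuple(c, h) per layer
lstm_multicell = rnn.MultiRNNCell(
    [rnn.LSTMCell(INTERNALSIZE) for _ in range(NLAYERS)],
    state_is_tuple=True)
print(lstm_multicell.state_size)  # (LSTMStateTuple(c=512, h=512), ...) x NLAYERS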