3

I am trying to find the best way to pass the LSTM state between batches. I have searched everywhere but could not find a solution that works with the current implementation. Imagine I have something like:

cells = [rnn.LSTMCell(size) for size in [256, 256]]
cells = rnn.MultiRNNCell(cells, state_is_tuple=True)
init_state = cells.zero_state(tf.shape(x_hot)[0], dtype=tf.float32)
net, new_state = tf.nn.dynamic_rnn(cells, x_hot, initial_state=init_state, dtype=tf.float32)

Now I would like to pass new_state to the next batch efficiently, i.e. without fetching it back to host memory and then re-feeding it to TensorFlow via feed_dict. To be more precise, all the solutions I found use sess.run to evaluate new_state and then feed_dict to pass it into init_state. Is there any way to do this without the bottleneck of feed_dict?

I think I should use tf.assign in some way, but the docs are incomplete and I could not find any workaround.

I want to thank in advance everybody who will answer.

Cheers,

Francesco Saverio

All the other answers I found on Stack Overflow either work only for older versions or use the feed_dict method to pass the new state. For instance:

1) TensorFlow: Remember LSTM state for next batch (stateful LSTM). This works by using feed_dict to feed a state placeholder, which is exactly what I want to avoid

2) Tensorflow - LSTM state reuse within batch. This does not work with the state tuple

3) Saving LSTM RNN state between runs in Tensorflow. Same here

Francesco Z

1 Answer

4

LSTMStateTuple is nothing more than a tuple of the cell state and the hidden state. tf.assign creates an operation that, when run, assigns a value stored in a tensor to a variable (if you have specific questions, please ask so that the docs can be improved). You can use the tf.assign solution by retrieving the cell-state tensor from the tuple via its c attribute (and the hidden state via h): new_state.c
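A minimal sketch of this pattern (the variable names, batch/sequence dimensions, and layer sizes are illustrative, not from the original post; it assumes the TF 1.x graph API, reachable via tf.compat.v1 on newer installs): the state lives in non-trainable variables, dynamic_rnn reads them as its initial_state, and one tf.assign per state tensor writes the new state back, so the state never passes through feed_dict:

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x API; on TF 2.x installs via compat.v1

tf.disable_eager_execution()
tf.reset_default_graph()

BATCH_SIZE, SEQ_LEN, N_CLASSES = 4, 5, 8
LAYER_SIZES = [32, 16]  # different sizes per layer work too

x_hot = tf.placeholder(tf.float32, [BATCH_SIZE, SEQ_LEN, N_CLASSES])

cells = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.LSTMCell(size) for size in LAYER_SIZES],
    state_is_tuple=True)

# One pair of non-trainable variables per layer holds the state across runs.
saved_state = tuple(
    tf.nn.rnn_cell.LSTMStateTuple(
        c=tf.Variable(tf.zeros([BATCH_SIZE, size]), trainable=False),
        h=tf.Variable(tf.zeros([BATCH_SIZE, size]), trainable=False))
    for size in LAYER_SIZES)

net, new_state = tf.nn.dynamic_rnn(cells, x_hot, initial_state=saved_state)

# Assign ops copy the fresh state back into the variables, entirely in-graph.
update_state = tf.group(*[
    op for saved, new in zip(saved_state, new_state)
    for op in (tf.assign(saved.c, new.c), tf.assign(saved.h, new.h))])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(BATCH_SIZE, SEQ_LEN, N_CLASSES).astype(np.float32)
    # Running update_state alongside the forward pass persists the state;
    # only the input batch is fed, never the state itself.
    _, h_after = sess.run([update_state, saved_state[-1].h],
                          feed_dict={x_hot: batch})
    print(h_after.shape)
```

Note that update_state can be grouped with a training op so the state is carried forward automatically every optimizer step; the zero_state call from the question would then only be needed (via explicit re-assignment) when you want to reset the state between epochs or sequences.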

Here is a complete self-contained example on a toy problem: https://gist.github.com/iganichev/632b425fed0263d0274ec5b922aa3b2f

iga
  • I tried but I cannot make it work, could you provide me some code? – Francesco Z Apr 14 '18 at 16:07
  • Thank you. Really, does it also work with `dynamic_rnn` ? – Francesco Z Apr 23 '18 at 14:17
  • I don't see any reason why it should not. – iga Apr 23 '18 at 19:48
  • What if I have different hidden sizes? e.g. one layer of 512 units and one of 256? – Francesco Z May 01 '18 at 12:47
  • If I try to pass your `initial_state` to `dynamic_rnn` it does not work, can you provide me an example with `dynamic_rnn`? Also, I am looking for an answer where `feed_dict` is not used! So I only want an assign operation to update the state every time the optimiser is called on the loss function. Can you provide me with an example where the inputs are in the form of [BATCH_SIZE, SEQUENCE_LEN, N_CLASSES], so one-hot encoded, and the LSTM layers have different sizes? – Francesco Z May 01 '18 at 12:49
  • I don't think I will find time to write these examples soon. You should try yourself. If you face some conceptual issue (or a bug) just ask another question. I used the `feed_dict` in this example just to demonstrate the difference. You can just use `saved_h` and `saved_c` directly in place of `initial_*` placeholders. Then, you won't need to feed them. – iga May 01 '18 at 21:12
  • I tried for ages. But, still, I find nobody able to do that, you will be the first :) As soon as I change your code to `dynamic_rnn` it stops working – Francesco Z May 02 '18 at 08:20
  • Any help? With `dynamic_rnn` your code does not work. I tried to fix it but with no luck – Francesco Z Jun 02 '18 at 15:58
  • Please ask another question with a small self-contained example you can't get to work. Also, include the full stack trace of the error you are getting and can't fix. If you want, link it here so that I see it. – iga Jun 06 '18 at 23:38
  • I just tried to make your code work with `dynamic_rnn`. Can I pm you via email? – Francesco Z Jun 15 '18 at 20:11