
I built a graph in TensorFlow that is separated into two parts:

  • part 1 takes a list of length N and turns it into M consecutive windows, which constitute my mini-batch. Minimal example: the input [a,b,c,d,e,f] becomes [[a,b,c],[b,c,d],[c,d,e],[d,e,f]]
  • part 2 operates on each window, so [[a,b,c],[b,c,d],[c,d,e],[d,e,f]] becomes [x1,x2,x3,x4].

No problems here.
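
Part 1 might look something like this minimal sketch (assuming a window size of 3 and a 1-D input of known length; the names are illustrative):

import tensorflow as tf

N = 6           # length of the input list
window = 3      # window size
seq = tf.placeholder(tf.float32, [N])

# stack the M = N - window + 1 overlapping slices into a [M, window] mini-batch
windows = tf.stack([seq[i:i + window] for i in range(N - window + 1)])

with tf.Session() as sess:
    print(sess.run(windows, feed_dict={seq: [1, 2, 3, 4, 5, 6]}))
    # [[1. 2. 3.]
    #  [2. 3. 4.]
    #  [3. 4. 5.]
    #  [4. 5. 6.]]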

I would like to turn this graph into a recursive one by having the second part use the output of a previous window to compute its output, like so: part 2 takes [a,b,c] and a default x0 to produce x1, then [[b,c,d], x1] to output x2, then [[c,d,e], x2] to output x3 and so on.

How do I achieve this?

Ziofil

1 Answer


If you were to treat each 3-letter array as an input step, i.e.:

step 1: [abc]
step 2: [bcd]
step 3: [cde]

then the hidden state will propagate through each timestep on its own. Since the hidden state of a basic LSTM cell is also its output, you have nothing to worry about.


import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

def lstm_cell(hidden_size):
    # one basic LSTM cell; its output at each step is its hidden state
    return tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size)

in_seqlen = 3   # timesteps per example (one window per timestep)
input_dim = 3   # features per timestep (the 3 letters of a window)

# [batch, time, features]
x = tf.placeholder(tf.float32, [None, in_seqlen, input_dim])

# dynamic_rnn unrolls the cell over the time dimension and carries the
# hidden state from one timestep to the next for you
out, state = tf.nn.dynamic_rnn(lstm_cell(input_dim), x, dtype=tf.float32)

...

sess.run(tf.global_variables_initializer())
output, states = sess.run([out, state], feed_dict={x: [[[1, 2, 3], [2, 3, 4], [3, 4, 5]]]})

If instead you mean treating each window as a sequence, i.e.:

step 1: a,x0
step 2: b,x0
step 3: c,x0
output: x1

step 1: b,x1
step 2: c,x1
step 3: d,x1
output: x2

etc...

Then you need to feed the last state back in each time you run the session:

...

in_seqlen = 3
input_dim = 1
hidden_dim = input_dim

# [batch, time, features]
x = tf.placeholder(tf.float32, [None, in_seqlen, input_dim])
# placeholder for the LSTM state; index 0 is c (cell state), index 1 is h (hidden state)
s = tf.placeholder(tf.float32, [2, None, hidden_dim])

state_tuple = tf.nn.rnn_cell.LSTMStateTuple(s[0], s[1])
# start unrolling from the state fed in through s instead of from zeros
out, state = tf.nn.dynamic_rnn(lstm_cell(hidden_dim), x, initial_state=state_tuple, dtype=tf.float32)

...

sess.run(tf.global_variables_initializer())

batch_size = 1
init_state = np.zeros((2, batch_size, hidden_dim))   # zero state for the first window

output, states = sess.run([out, state], feed_dict={x: [[[1], [2], [3]]], s: init_state})
# feed the state of the previous run back in for the next window
output, states = sess.run([out, state], feed_dict={x: [[[1], [2], [3]]], s: states})

You'll need to add in a target placeholder, a loss, etc.
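
For example, something along these lines (the target shape and the MSE/Adam choices are just illustrative):

# illustrative target placeholder and loss, continuing the graph above
y = tf.placeholder(tf.float32, [None, in_seqlen, hidden_dim])

loss = tf.reduce_mean(tf.square(out - y))               # simple MSE
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)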

Useful:

  • TensorFlow: Remember LSTM state for next batch (stateful LSTM)
  • http://colah.github.io/posts/2015-08-Understanding-LSTMs/

C Thomas
  • I'm not sure I understand. My case is the first one that you mention, i.e. the input is the full array (say) [b,c,d] and the output of the previous step, x1. But where do I stick the graph that computes x2 from [b,c,d] and x1? (that I already have, minus a small modification to accept x1 as well) – Ziofil Oct 08 '17 at 19:21
  • If you look at the structure of an LSTM, or GRU, it's already doing what you're trying to do inside the cell. Feeding the output back into the input is completely unnecessary. I'd really recommend reading the 2nd link I posted to see what I mean. – C Thomas Oct 09 '17 at 12:49
  • Okay, I'm reading it (at the moment my confusion is because I already have a recurrent structure in mind, and I am wondering if and how to make it happen). – Ziofil Oct 09 '17 at 12:58
  • This doesn't solve my problem, or I still don't see how. The 2nd part of the graph that processes (say) [b,c,d] is quite complex (it does convolutions, tensor shape manipulations, etc...). Where do I tell TF to use *that* graph recursively? – Ziofil Oct 09 '17 at 21:13
  • If you are doing that then have a look at raw_rnn, where you define your own custom loop function https://www.tensorflow.org/api_docs/python/tf/nn/raw_rnn. – C Thomas Oct 11 '17 at 11:18
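
For reference, a rough sketch of that kind of custom loop using tf.scan rather than raw_rnn: the output computed for the previous window is threaded into the computation for the next one. The step body below is only a stand-in for the actual part-2 graph, and all names are illustrative.

import tensorflow as tf

window_size = 3
out_dim = 1

windows = tf.placeholder(tf.float32, [None, window_size])   # [M, 3] windows from part 1
x0 = tf.zeros([out_dim])                                     # default x0

# illustrative parameters; the real part-2 graph would define its own
w = tf.Variable(tf.random_normal([window_size, out_dim]))
v = tf.Variable(tf.random_normal([out_dim, out_dim]))

def step(prev_x, cur_window):
    # prev_x is the output of the previous window (x0 on the first call),
    # cur_window is the current window; replace this body with the part-2 graph
    h = tf.matmul(tf.expand_dims(cur_window, 0), w) + tf.matmul(tf.expand_dims(prev_x, 0), v)
    return tf.tanh(h)[0]

# tf.scan calls step once per window, feeding each result back in as prev_x
xs = tf.scan(step, windows, initializer=x0)   # [M, out_dim] -> x1, x2, x3, ...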