Context:
- I have a recurrent neural network with LSTM cells
- The input to the network is a batch of size (batch_size, number_of_timesteps, one_hot_encoded_class) in my case (128, 300, 38)
- The different rows of the batch (1-128) are not necessarily related to each other
- The target for one time step is given by the value of the next time step.
My questions: When I train the network using an input batch of (128,300,38) and a target batch of the same size,
does the network always consider only the last time-step t to predict the value of the next timestep t+1?
or does it consider all time steps from the beginning of the sequence up to time step t?
or does the LSTM cell internally remember all previous states?
I am confused about the functioning because the network is trained on multiple time steps simulatenously so I am not sure how the LSTM cell can still have knowledge of the previous states.
I hope somebody can help. Thanks in advance!
Code for dicussion:
cells = []
for i in range(self.n_layers):
cell = tf.contrib.rnn.LSTMCell(self.n_hidden)
cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
init_state = cell.zero_state(self.batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(
cell, inputs=self.inputs, initial_state=init_state)
self.logits = tf.contrib.layers.linear(outputs, self.num_classes)
softmax_ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=labels, logits=self.logits)
self.loss = tf.reduce_mean(softmax_ce)
self.train_step = tf.train.AdamOptimizer(self.lr).minimize(self.loss)