
Based on the link:

https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn

In the examples there, the "initial state" is defined in the first example but not in the second. Could anyone please explain the purpose of the initial state? What's the difference if I don't set it vs. if I set it? Is it only required for a single RNN cell and not for a stacked cell like the one in the linked example?

I'm currently debugging my RNN model, as it seems to classify different questions into the same category, which is strange. I suspect this might be because I'm not setting the initial state of the cell.

Maxxx

1 Answer


Could anyone please explain what is the purpose of initial state?

The state is the vector of hidden activations carried from one timestep to the next. It is propagated through the recurrent weights that connect the hidden neurons of consecutive timesteps, so it holds temporal information from all previous timesteps.
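A minimal plain-NumPy sketch of this recurrence (hypothetical sizes, not TensorFlow's actual implementation) shows how the state `h` threads through every timestep:

```python
import numpy as np

# A vanilla-RNN recurrence with made-up dimensions, showing that the
# hidden state h carries information from one timestep to the next.
rng = np.random.default_rng(0)
input_dim, hidden_dim, timesteps = 3, 4, 5

W_xh = rng.normal(size=(input_dim, hidden_dim)) * 0.1   # input -> hidden weights
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # recurrent (hidden -> hidden) weights
xs = rng.normal(size=(timesteps, input_dim))            # one input vector per timestep

h = np.zeros(hidden_dim)  # the initial state
for x in xs:
    # h_t depends on h_{t-1}: this is where temporal information flows.
    h = np.tanh(x @ W_xh + h @ W_hh)
```

Because each `h` is computed from the previous `h`, the final state is a function of the whole input sequence.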

Providing a previously computed state matrix via the initial_state= argument gives the RNN cell a memory of earlier activations to start from.

What's the difference if I don't set it vs if I set it?

If we set the initial state to one saved from another model or an earlier run, we are restoring the memory of the RNN cell so that it does not have to start from scratch.

In the TF docs, the initial_state is initialized with the cell's zero_state() matrix.

If you don't set the initial_state, dynamic_rnn creates a zero state for you internally (you then pass the dtype argument instead). The initial state is not a trained weight matrix; it is simply the starting value of the cell's memory.
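This equivalence can be illustrated in plain NumPy (a sketch with hypothetical sizes, not the TF API itself): omitting the initial state is the same as passing zeros explicitly, while a restored non-zero state changes the outputs because the cell starts with memory.

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim = 3, 4
W_xh = rng.normal(size=(input_dim, hidden_dim)) * 0.5
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.5
xs = rng.normal(size=(5, input_dim))

def run(h0):
    # Unroll the recurrence starting from the given initial state.
    h = h0
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h

default  = run(np.zeros(hidden_dim))          # what the default zero state gives
explicit = run(np.zeros(hidden_dim))          # explicitly passing a zero state
restored = run(rng.normal(size=hidden_dim))   # a restored ("warm") state

assert np.allclose(default, explicit)      # identical results
assert not np.allclose(default, restored)  # memory changes the output
```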

Is it only required in a single RNN cell and not in a stacked cell like in the example provided in the link?

I don't know exactly why they haven't set the initial_state in the stacked RNN example, but an initial state exists in every type of RNN, as it preserves the temporal features across timesteps.

Maybe the stacked RNN was the point of interest in that example, not the setting of initial_state.

Tip:

In most cases, you will not need to set the initial_state for an RNN; TensorFlow handles it for you by defaulting to zeros. In a seq2seq model, however, this argument is commonly used to pass the encoder's final state to the decoder as its initial state.
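The seq2seq hand-off can be sketched in plain NumPy (hypothetical sizes and a shared toy cell for brevity; a real model would use separate encoder and decoder weights):

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 4
W_xh = rng.normal(size=(dim, dim)) * 0.5
W_hh = rng.normal(size=(dim, dim)) * 0.5

def rnn(xs, h0):
    # Unroll a toy RNN over sequence xs from initial state h0.
    h = h0
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh)
    return h

src = rng.normal(size=(5, dim))  # source sequence
tgt = rng.normal(size=(3, dim))  # target sequence

enc_final = rnn(src, np.zeros(dim))  # encoder starts from a zero state
dec_final = rnn(tgt, enc_final)      # decoder's initial_state = encoder's final state
```

Starting the decoder from `enc_final` rather than zeros is what lets it condition on the encoded source sequence.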

Your RNN may be facing some other issue. It builds up its own memory during the forward pass and doesn't require a warm start.

Shubham Panchal
  • So the reason why my RNN classifies each question, one batch at a time, into the same class is not due to not setting the initial state. Interesting. I certainly didn't include dropout; could that also be a reason for this happening? – Maxxx May 15 '19 at 02:59
  • Setting `initial_state` is rarely needed. There will be some other reason, and as you mentioned, adding dropout may help the model generalise better. – Shubham Panchal May 15 '19 at 03:41
  • @ShubhamPanchal, do you know how to set the initial_state? I am trying to figure that out here: https://stackoverflow.com/questions/61389657/how-to-manually-initialize-a-tf-1-x-lstmcell-and-dynamic-rnn – Joe Apr 23 '20 at 16:50