What is actually num_unit in LSTM cell circuit?

Question

I searched everywhere, but I couldn't find what num_units in TensorFlow actually is. I tried to relate my question to this question, but I couldn't get a clear explanation there.

In TensorFlow, when creating an LSTM-based RNN, we use the following command:

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)

As Colah's blog says, this is a basic LSTM cell:

Now, suppose my data is:

idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hello: hihell -> ihello
x_data = [[0, 1, 0, 2, 3, 3]]   # hihell
x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3

y_data = [[1, 0, 2, 3, 3, 4]]    # ihello

My input is:

x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3

which is of shape [1,6,5], i.e. a batch holding one [6,5] matrix.

In this blog, we have the following picture:

As far as I know, the BasicLSTMCell will unroll for t time steps, where t is my number of rows (please correct me if I am wrong!). For example, in the following figure, the LSTM is unrolled for t = 28 time steps.

In Colah's blog, it's written:

each line carries an entire vector

So, let's see how my [6,5] input matrix will go through this LSTM-based RNN.

If my above diagram is correct, then what exactly is num_units (which we defined in the LSTM cell)? Is it a parameter of an LSTM cell?

If num_units is a parameter of a single LSTM cell, then it should be something like:

If the above diagram is correct, then where are those 5 num_units in the following schematic representation of the LSTM cell (according to Colah's blog)?

If you can give your answer with a figure, that would be really helpful! You can edit or create a new whiteboard diagram here.

user1302884 · Accepted Answer · 2018-03-15T22:11:17.797

Your understanding is largely correct. Unfortunately, the TensorFlow terminology is inconsistent with the literature, so to understand it you need to dig through the TensorFlow implementation code.

A cell in the TensorFlow universe is called an LSTM layer in Colah's universe (i.e., an unrolled version). That is why you always define a single cell, and not a layer, in your TensorFlow architecture. For example,

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)

Check the code here.

https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L90

The definition of cell in this package differs from the definition used in the literature. In the literature, cell refers to an object with a single scalar output. The definition in this package refers to a horizontal array of such units.

Therefore, to understand num_units in TensorFlow, it's best to imagine an unrolled LSTM as below.

In an unrolled version, you have an input X_t which is a tensor. When you specify an input of the shape

[batch_size,time_steps,n_input]

to TensorFlow, it knows from the time_steps dimension how many times to unroll the cell.
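To make the unrolling concrete, here is a minimal NumPy sketch (not TensorFlow code). It builds the [batch_size, time_steps, n_input] tensor from the question's data and applies one shared step function once per time step. The step function is a plain tanh RNN used only as a stand-in for the LSTM, and num_units = 3 and the weight names W_x and W_h are hypothetical choices for illustration.

```python
import numpy as np

# Shapes from the question: one sequence, six time steps, five one-hot features.
batch_size, time_steps, n_input = 1, 6, 5
num_units = 3  # hypothetical hidden size, deliberately different from n_input

x_data = np.array([[0, 1, 0, 2, 3, 3]])   # "hihell" as character indices
x_one_hot = np.eye(n_input)[x_data]       # shape [batch_size, time_steps, n_input]

rng = np.random.default_rng(0)
W_x = rng.normal(size=(n_input, num_units))    # input-to-hidden weights (assumed names)
W_h = rng.normal(size=(num_units, num_units))  # hidden-to-hidden weights

def step(x_t, h_prev):
    """Stand-in recurrent step (plain tanh RNN), reused at every time step."""
    return np.tanh(x_t @ W_x + h_prev @ W_h)

h = np.zeros((batch_size, num_units))
outputs = []
for t in range(time_steps):       # one "copy" of the cell per time step
    x_t = x_one_hot[:, t, :]      # shape [batch_size, n_input]
    h = step(x_t, h)              # same weights at every step
    outputs.append(h)

outputs = np.stack(outputs, axis=1)   # shape [batch_size, time_steps, num_units]
print(x_one_hot.shape, outputs.shape)
```

The loop runs time_steps times, and each iteration sees one row of the [6,5] matrix as a vector of length n_input. The stacked outputs have a last dimension of num_units, not n_input.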

So if X_t is a 1D array in TensorFlow, then in Colah's unrolled version the input x_t of each LSTM cell is a scalar value. (Note the upper-case X for a vector/array and the lower-case x for a scalar, as in Colah's figures.)

If X_t is a 2D array in TensorFlow, then in Colah's unrolled version the input x_t of each LSTM cell is a 1D array/vector (as in your case here), and so on.

Now here comes the most important question.

How would TensorFlow know the output/hidden dimension of Z_t/H_t?

(Please note the difference between H_t and Z_t. I usually prefer to keep them separate, as H_t goes back to the input (the loop) and Z_t is the output, which is not shown in the figure.)

Would it be the same dimension as X_t?

No. It can be of any shape. You need to specify it to TensorFlow, and that is num_units: the output size.

Check here in the code:

https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L298-L300

    @property
    def output_size(self):
        return self._num_units
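The effect of num_units on the shapes can be sketched in NumPy as follows. This is an illustration of the arithmetic, not TensorFlow's code: the single kernel of shape [n_input + num_units, 4 * num_units], the gate order (i, j, f, o), and the forget bias of 1.0 follow my reading of BasicLSTMCell and should be treated as assumptions, as should num_units = 3.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, c_prev, h_prev, kernel, bias, forget_bias=1.0):
    """One LSTM step; all four gates come from a single matrix multiply."""
    gates = np.concatenate([x_t, h_prev], axis=1) @ kernel + bias
    i, j, f, o = np.split(gates, 4, axis=1)   # each is [batch_size, num_units]
    c_new = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(j)
    h_new = sigmoid(o) * np.tanh(c_new)
    return c_new, h_new

batch_size, n_input, num_units = 1, 5, 3   # num_units is a hypothetical choice
rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(n_input + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)

x_t = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])   # one-hot 'h', length n_input
c = np.zeros((batch_size, num_units))         # cell state
h = np.zeros((batch_size, num_units))         # hidden state / output
c, h = lstm_step(x_t, c, h, kernel, bias)
print(kernel.shape, h.shape)
```

The input vector has length 5, but both the cell state and the output have length num_units. Changing num_units changes only the kernel, bias, and state sizes; the input shape stays the same.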

TensorFlow uses the implementation of the LSTM cell as defined in Colah's universe, from the following paper:

https://arxiv.org/pdf/1409.2329.pdf
