
I searched everywhere, but I couldn't find out what num_units in TensorFlow actually is. I tried to relate my question to this question, but I couldn't get a clear explanation there.


In TensorFlow, when creating an LSTM-based RNN, we use the following command

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)

As Colah's blog says, this is a basic LSTM cell:

enter image description here

Now, suppose my data is:

idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hello: hihell -> ihello
x_data = [[0, 1, 0, 2, 3, 3]]   # hihell
x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3

y_data = [[1, 0, 2, 3, 3, 4]]    # ihello

My input is:

x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3

which is of shape [6,5] (or [1,6,5] once the outer batch dimension is counted).
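
To make the shape concrete, here is a minimal pure-Python sketch that rebuilds x_one_hot from x_data and reports its dimensions. The helpers one_hot and shape are hypothetical names introduced only for this illustration; they are not TensorFlow functions.

```python
idx2char = ['h', 'i', 'e', 'l', 'o']
x_data = [[0, 1, 0, 2, 3, 3]]  # "hihell" as indices into idx2char


def one_hot(index, depth):
    """Return a list of length `depth` with a 1 at position `index` (hypothetical helper)."""
    return [1 if i == index else 0 for i in range(depth)]


def shape(nested):
    """Return the shape of a regularly nested list as a tuple (hypothetical helper)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# One batch entry, six time steps, five features per step.
x_one_hot = [[one_hot(i, len(idx2char)) for i in seq] for seq in x_data]

print(shape(x_one_hot))  # (1, 6, 5)
```

The leading 1 is the batch size, 6 is the number of time steps (rows), and 5 is the size of each one-hot vector.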

In this blog, we have the following picture

enter image description here

As far as I know, the BasicLSTMCell will be unrolled for t time steps, where t is the number of rows in my input (please correct me if I am wrong!). For example, in the following figure, the LSTM is unrolled for t = 28 time steps.

enter image description here

Colah's blog says:

each line carries an entire vector

So, let's see how my [6,5] input matrix will go through this LSTM-based RNN.

enter image description here

If my diagram above is correct, then what exactly is num_units (which we defined in the LSTM cell)? Is it a parameter of an LSTM cell?

If num_units is a parameter of a single LSTM cell, then it should be something like:

enter image description here

enter image description here

If the above diagram is correct, then where are those 5 num_units in the following schematic representation of the LSTM cell (according to Colah's blog)?

enter image description here


If you can give your answer with a figure, that would be really helpful! You can edit or create a new whiteboard diagram here.

nbro
Aaditya Ura

1 Answer


Your understanding is largely correct. Unfortunately, the TensorFlow terminology is inconsistent with the literature, so to understand num_units you need to dig through the TensorFlow implementation code.

A cell in the TensorFlow universe is called an LSTM layer in Colah's universe (i.e., an unrolled version). That is why you always define a single cell, and not a layer, in your TensorFlow architecture. For example,

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)

Check the code here.

https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L90

The definition of cell in this package differs from the definition used in the literature. In the literature, cell refers to an object with a single scalar output. The definition in this package refers to a horizontal array of such units.

Therefore, in order to understand num_units in TensorFlow, it's best to imagine an unrolled LSTM as below.

enter image description here

In an unrolled version, you have an input X_t which is a tensor. When you specify an input of the shape

[batch_size,time_steps,n_input]

to TensorFlow, it knows how many times to unroll the cell from the time_steps dimension.

So if you have X_t as a 1D array in TensorFlow, then in Colah's unrolled version the input x_t of each LSTM cell is a scalar value. (Note the upper-case X for a vector/array and the lower-case x for a scalar, as in Colah's figures.)

If you have X_t as a 2D array in TensorFlow, then in Colah's unrolled version the input x_t of each LSTM cell is a 1D array/vector (as in your case here), and so on.
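
As a rough illustration of the unrolling, the sketch below splits an input of shape [batch_size, time_steps, n_input] into one slice per time step using plain Python lists. The function name unstack_time is hypothetical and chosen only for this example; it is not TensorFlow's own unrolling code.

```python
def unstack_time(batch):
    """Split [batch_size, time_steps, n_input] into a list of
    time_steps slices, each of shape [batch_size, n_input]."""
    time_steps = len(batch[0])
    return [[sequence[t] for sequence in batch] for t in range(time_steps)]


# The "hihell" input from the question: 1 sequence, 6 steps, 5 features.
x_one_hot = [[[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0]]]

steps = unstack_time(x_one_hot)

# One entry per unrolled copy of the cell; each x_t is a batch of 5-vectors.
for t, x_t in enumerate(steps):
    print(t, x_t)
```

With this input the cell is applied six times, and each application receives a single 5-dimensional vector per batch entry.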

Now here comes the most important question.

How would TensorFlow know what the output/hidden dimension Z_t/H_t is?

(Please note the difference between H_t and Z_t - I usually prefer to keep them separate as H_t goes back to input (the loop) and Z_t is the output - Not shown in figure)

Would it be the same dimension as X_t?

No. It can be of any shape, so you need to specify it to TensorFlow. That is what num_units is: the output size.

Check here in the code:

https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L298-L300

    @property
    def output_size(self):
        return self._num_units
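
To see that num_units fixes the size of the hidden state and output independently of the input size, here is a minimal from-scratch sketch of an LSTM step in pure Python. TinyLSTMCell, run and param_count are hypothetical names for this illustration only, and this is not TensorFlow's implementation: the random initialisation, the gate ordering inside the kernel, and the absence of a forget-gate bias offset are all simplifying assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyLSTMCell:
    """Hypothetical minimal LSTM cell; a sketch, not TensorFlow's code."""

    def __init__(self, n_input, num_units, seed=0):
        rng = random.Random(seed)
        self.num_units = num_units
        rows = n_input + num_units   # input and previous hidden state, concatenated
        cols = 4 * num_units         # four gates, each num_units wide
        self.kernel = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                       for _ in range(rows)]
        self.bias = [0.0] * cols

    @property
    def output_size(self):
        return self.num_units

    def param_count(self):
        return len(self.kernel) * len(self.kernel[0]) + len(self.bias)

    def step(self, x_t, h_prev, c_prev):
        concat = list(x_t) + list(h_prev)
        n = self.num_units
        pre = [sum(concat[r] * self.kernel[r][col] for r in range(len(concat)))
               + self.bias[col] for col in range(4 * n)]
        i = [sigmoid(v) for v in pre[0:n]]            # input gate
        g = [math.tanh(v) for v in pre[n:2 * n]]      # candidate values
        f = [sigmoid(v) for v in pre[2 * n:3 * n]]    # forget gate
        o = [sigmoid(v) for v in pre[3 * n:4 * n]]    # output gate
        c = [f[k] * c_prev[k] + i[k] * g[k] for k in range(n)]
        h = [o[k] * math.tanh(c[k]) for k in range(n)]
        return h, c


def run(cell, sequence):
    """Apply the same cell at every time step, starting from zero state."""
    h = [0.0] * cell.num_units
    c = [0.0] * cell.num_units
    outputs = []
    for x_t in sequence:
        h, c = cell.step(x_t, h, c)
        outputs.append(h)
    return outputs


hihell = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0],
          [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 1, 0]]

results = {}
for num_units in (5, 3):
    cell = TinyLSTMCell(n_input=5, num_units=num_units)
    outputs = run(cell, hihell)
    results[num_units] = (len(outputs), len(outputs[0]), cell.param_count())
    print(num_units, results[num_units])
# 5 (6, 5, 220)
# 3 (6, 3, 108)
```

In both runs the input vectors have 5 entries and there are 6 outputs, one per time step. Only the length of each output (and the number of weights) changes with num_units, which is why it does not appear as a separate box in a diagram of a single cell: it is the width of every vector flowing along the h and c lines.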

TensorFlow uses the LSTM cell implementation, as depicted in Colah's universe, from the following paper:

https://arxiv.org/pdf/1409.2329.pdf

user1302884