Understanding GRU Architecture - Keras

Question

I am using the Mycroft AI wake word detection and I am trying to understand the dimensions of the network. The following lines show the model in Keras:

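from keras.models import Sequential
from keras.layers import GRU, Dense

# params and pr are configuration objects defined elsewhere (not shown in the question)
model = Sequential()
model.add(GRU(
        params.recurrent_units, activation='linear',
        input_shape=(pr.n_features, pr.feature_size), dropout=params.dropout, name='net'))
model.add(Dense(1, activation='sigmoid'))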

My features have a size of 29*13. The GRU layer has 20 units. My question is now, how can my model have 2040 learnable parameters in the GRU layer? How are the units connected? Maybe my overall understanding of a GRU network is wrong, but I can only find explanations of a single cell, and never of the full network. Is the GRU network fully connected? Thank You!

Network Architecture

sebrockm · Accepted Answer · 2019-03-20T14:23:57.963

First of all, for RNNs in general, the time dimension can be arbitrary, because the same weights are reused at every time step. In your case that means the number 29 plays no role. The number 2040 is determined entirely by the numbers 13 (features) and 20 (units in the GRU).

To understand where the number comes from, have a look at the picture of the GRU cell in the Wikipedia article on gated recurrent units.

That picture shows what the basic GRU cell looks like. To understand the dimensions of the variables, have a look at the formulas in the same Wikipedia article.
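For reference, the fully gated GRU is commonly written as below, with x_t the input, h_t the state, z_t the update gate and r_t the reset gate. This is a sketch of the standard definition rather than a copy of the article's formula: some sources (Keras among them, as far as I know) swap the roles of z_t and 1 - z_t in the last line, and \phi is tanh by default but the identity in the question's model, which uses activation='linear'. Neither choice affects the parameter count.

```latex
\begin{aligned}
z_t       &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t       &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \phi(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t       &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
```

There are three W matrices, three U matrices and three bias vectors b, one set each for the update gate, the reset gate and the candidate state.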

To make sense of this, you only need to know that the input vectors x[t] have dimension 13 in your case and that the inner states and outputs h[t], r[t], z[t], y[t] have dimension 20. As you can see, the dimension parameter is used in several places. For that reason I personally don't like the term "units" for it, because it suggests that there are 20 "things" inside. In fact, it's just the dimension of the inner states, matrices, and biases.

With this knowledge, and knowing that the dimensions in the formulas must match, you can derive that the W matrices must have dimension 20 x 13 and the U matrices must have dimension 20 x 20. The biases b must have dimension 20.
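The shapes above can be checked with a minimal pure-Python sketch of a GRU forward pass. It is an illustration under assumptions, not the Keras implementation: the weights are random, the candidate activation is tanh (the question's model uses a linear activation instead), and the helper names (rand_matrix, matvec, gru_step, run_sequence) are made up for this example. The point is that the same W, U and b are reused at each of the 29 time steps, so the sequence length never enters the weight shapes.

```python
import math
import random

N_FEATURES = 13   # dimension of each input vector x[t]
N_UNITS = 20      # dimension of h[t], r[t], z[t]
N_STEPS = 29      # number of time steps; does not affect any weight shape

random.seed(0)

def rand_matrix(rows, cols):
    """Hypothetical helper: a rows x cols matrix of small random values."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One W (20 x 13), one U (20 x 20) and one b (20) per gate:
# "z" = update gate, "r" = reset gate, "h" = candidate state.
W = {g: rand_matrix(N_UNITS, N_FEATURES) for g in "zrh"}
U = {g: rand_matrix(N_UNITS, N_UNITS) for g in "zrh"}
b = {g: [0.0] * N_UNITS for g in "zrh"}

def gru_step(x, h_prev):
    """One time step: takes x (13 values) and h_prev (20 values), returns h (20 values)."""
    def pre_activation(gate, h_in):
        wx = matvec(W[gate], x)
        uh = matvec(U[gate], h_in)
        return [p + q + c for p, q, c in zip(wx, uh, b[gate])]

    z = [sigmoid(v) for v in pre_activation("z", h_prev)]
    r = [sigmoid(v) for v in pre_activation("r", h_prev)]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_hat = [math.tanh(v) for v in pre_activation("h", gated)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_hat)]

def run_sequence(xs):
    """Feed a whole sequence through the cell, reusing the same weights each step."""
    h = [0.0] * N_UNITS
    for x in xs:
        h = gru_step(x, h)
    return h

sequence = [[random.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(N_STEPS)]
final_state = run_sequence(sequence)
print(len(final_state))  # 20
```

Passing a shorter or longer sequence to run_sequence works without touching W, U or b, which is why the 29 does not show up in the parameter count.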

The total number of parameters is then

#Params = 3 * dim(W)
        + 3 * dim(U)
        + 3 * dim(b)
        = 3 * 20*13 + 3 * 20*20 + 3 * 20
        = 2040
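The same arithmetic as a small Python sketch. The function name gru_param_count and its bias_sets argument are hypothetical, introduced only for this illustration. The bias_sets argument is there because, to my knowledge, newer Keras versions default to reset_after=True, which keeps two bias vectors per gate and would report 2100 parameters for the same layer instead of 2040.

```python
def gru_param_count(input_dim, units, bias_sets=1):
    """Count the learnable parameters of a single GRU layer.

    input_dim: dimension of each input vector x[t] (13 in the question)
    units:     dimension of the hidden state h[t] (20 in the question)
    bias_sets: bias vectors per gate (assumption: 1 for the classic
               formulation, 2 for Keras with reset_after=True)
    """
    w = units * input_dim   # entries in one W matrix
    u = units * units       # entries in one U matrix
    b = bias_sets * units   # entries in the bias vector(s) of one gate
    return 3 * (w + u + b)  # three sets: update gate, reset gate, candidate


print(gru_param_count(13, 20))               # 2040
print(gru_param_count(13, 20, bias_sets=2))  # 2100
```

Note that the number of time steps (29) is not an argument at all.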

To get a deeper understanding of how RNNs work in Keras, I highly recommend the answers to this question. It says it's about LSTMs, but everything said there applies to GRUs as well.
