
I have this Keras code from a YouTube video:

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

model = Sequential()
model.add(Embedding(10000, 32))  # 10,000-word vocabulary, 32-dim embedding vectors
model.add(SimpleRNN(32))         # 32 recurrent units
model.summary()

The output of the summary is this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 32)                2080      
=================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0

First, I don't understand why the number of params in the SimpleRNN layer is 2,080. Second, I don't get why the output shape of the embedding layer is (None, None, 32).

kee
  • Possible duplicate of [Number of parameters for Keras SimpleRNN](https://stackoverflow.com/questions/50134334/number-of-parameters-for-keras-simplernn) – Markus Jun 13 '19 at 11:41

2 Answers


For calculating the number of params of a SimpleRNN, see [Number of parameters for Keras SimpleRNN](https://stackoverflow.com/questions/50134334/number-of-parameters-for-keras-simplernn).

For your second question: the output shape of an embedding layer is (batch_size, input_length, output_dim). Since you didn't specify the input_length argument (the length of the input sequences) of the embedding layer, it takes the default value, which is None (variable).

Also, since the RNN blocks run at each time step, they can sit on top of a layer with a variable time dimension. However, if you want to add Flatten followed by Dense layers, which take the whole previous output as input, you have to specify input_length in the Embedding layer.
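For instance, a minimal sketch of that case (the input_length of 100 and the single sigmoid unit are illustrative choices, not from the question):

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

model = Sequential()
# With input_length fixed, the embedding output shape is (None, 100, 32)
model.add(Embedding(10000, 32, input_length=100))
# Flatten requires a fully known shape: (None, 100, 32) -> (None, 3200)
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.summary()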

meowongac
  • Thanks @meowongac. What I don't get is the difference between batch_size and input_length (a.k.a. sequence_length). To me those two seem to be the same thing (how many words do I want to convert to dense word vectors?). What is the difference between them? – kee Jun 13 '19 at 23:53
  • I guess here batch_size is like the number of sentences and input_length is the number of words in each sentence? So the embedding layer isn't really processing one word at a time but rather takes a list of words at a time? – kee Jun 14 '19 at 00:03
  • You are right. For your example, batch_size is the number of sentences in a batch, and input_length is the **max** number of tokens in one sentence. Each token is represented by its corresponding integer (id), and the embedding layer then converts each token into a vector of 32 dimensions in your case. If you want to know how the embedding layer in Keras works behind the scenes, you may take a look at this: [How does Keras' embedding layer work?](https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work) – meowongac Jun 14 '19 at 03:28
  • Also, since you have to fix the number of tokens in each sentence if you specify the input_length argument, you need to pad them (probably with [pad_sequences in Keras](https://keras.io/preprocessing/sequence/#pad_sequences)). And if you want your RNN blocks or TimeDistributed blocks... to ignore the padded value, you can use the [masking layer in Keras](https://keras.io/layers/core/#masking) – meowongac Jun 14 '19 at 03:36
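A minimal sketch of that padding step (the toy sentences and maxlen=4 are illustrative; with an Embedding layer, the mask_zero=True argument plays the role of the Masking layer):

from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

# Toy integer-encoded sentences of different lengths
sequences = [[5, 22, 7], [13, 2], [9, 41, 6, 8]]

# Pad every sentence to length 4; 0 is the padding id
padded = pad_sequences(sequences, maxlen=4, padding='post')
# [[ 5 22  7  0]
#  [13  2  0  0]
#  [ 9 41  6  8]]

model = Sequential()
# mask_zero=True tells downstream layers (e.g. SimpleRNN) to skip the padded 0s
model.add(Embedding(10000, 32, mask_zero=True))
model.add(SimpleRNN(32))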

At each time step, the input to the SimpleRNN is the output of the Embedding layer. The embedding size is 32. An RNN has two parameter matrices, U and W:

S_t = f(U·X_t + W·S_{t-1} + b)

Since X has shape (None, 32), U has shape (32, 32); the hidden state S also has size 32, so W has shape (32, 32). Finally, the bias b has shape (32,).

So in the RNN layer the number of parameters is (32 + 32 + 1) * 32 = 2080.
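You can check this directly by inspecting the weight shapes of the model from the question:

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32))

# kernel U: (32, 32), recurrent_kernel W: (32, 32), bias b: (32,)
for w in model.layers[1].get_weights():
    print(w.shape)

print((32 + 32 + 1) * 32)  # 2080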

王晓晨