What is the effect of using TimeDistributed layer wrapper?

Question

Consider the following two models:

from tensorflow.keras.layers import Input, GRU, Dense, TimeDistributed
from tensorflow.keras.models import Model

# Variable-length sequences of 100-dimensional vectors
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
# Dense layer wrapped in TimeDistributed
dense = Dense(200, activation='softmax')
decoder_pred = TimeDistributed(dense)(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()

with the output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, None, 100)         0         
_________________________________________________________________
gru (GRU)                    (None, None, 32)          12768     
_________________________________________________________________
time_distributed (TimeDistri (None, None, 200)         6600      
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________

And the second model:

from tensorflow.keras.layers import Input, GRU, Dense
from tensorflow.keras.models import Model

# Same input and GRU as before
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
# Dense layer applied directly to the 3D GRU output, without the wrapper
decoder_pred = Dense(200, activation='softmax')(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()

with the output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, None, 100)         0         
_________________________________________________________________
gru_1 (GRU)                  (None, None, 32)          12768     
_________________________________________________________________
dense_1 (Dense)              (None, None, 200)         6600      
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
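
For reference, the parameter counts in both summaries can be reproduced by hand. The sketch below is plain Python, not a Keras call; the helper names `gru_params` and `dense_params` are made up for illustration, and the GRU formula assumes the variant with a single bias vector per gate (`reset_after=False` in Keras terms), which is what the reported 12,768 corresponds to. A GRU built with `reset_after=True` carries a second bias vector per gate and would report a different number.

```python
def gru_params(input_dim, units):
    # Hypothetical helper: three gates (update, reset, candidate), each with
    # an input kernel, a recurrent kernel and ONE bias vector.
    # Assumes the reset_after=False variant of the GRU.
    return 3 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Hypothetical helper: one kernel of shape (input_dim, units) plus a bias.
    # The count does not depend on the number of timesteps.
    return input_dim * units + units


gru_count = gru_params(input_dim=100, units=32)
dense_count = dense_params(input_dim=32, units=200)
total_count = gru_count + dense_count

print(gru_count, dense_count, total_count)
```

The Dense count depends only on the last axis of its input (32) and its number of units (200), so wrapping the layer in TimeDistributed cannot change it.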

My question is: does the TimeDistributed layer wrapper do anything in the first model? Do the two models differ in any respect, given that their total parameter counts are identical?
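
One way to probe the question is to compare the two computations on a toy input. The sketch below uses plain Python lists as a stand-in for Keras tensors, so it is an illustration under stated assumptions rather than the library's implementation: it assumes that TimeDistributed applies the wrapped layer to each timestep slice, and that Dense on a 3D input contracts only the last axis with its kernel. The sizes and the `dense_softmax` helper are invented for the example.

```python
import math
import random

random.seed(0)
# Toy sizes chosen for the sketch, not the 32 -> 200 of the models above.
BATCH, TIMESTEPS, FEATURES, CLASSES = 2, 3, 4, 5

# One shared kernel and bias, as a single Dense layer would hold.
W = [[random.gauss(0, 1) for _ in range(CLASSES)] for _ in range(FEATURES)]
b = [random.gauss(0, 1) for _ in range(CLASSES)]
# Stand-in for the GRU output, shape (BATCH, TIMESTEPS, FEATURES).
x = [[[random.gauss(0, 1) for _ in range(FEATURES)]
      for _ in range(TIMESTEPS)] for _ in range(BATCH)]


def dense_softmax(vec):
    """Hypothetical helper: affine map plus softmax on one feature vector."""
    logits = [sum(v * W[i][j] for i, v in enumerate(vec)) + b[j]
              for j in range(CLASSES)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# TimeDistributed view: take each timestep slice (BATCH, FEATURES) in turn,
# run the layer on it, then stack the results back along the time axis.
per_step = [[dense_softmax(x[n][t]) for n in range(BATCH)]
            for t in range(TIMESTEPS)]
time_distributed = [[per_step[t][n] for t in range(TIMESTEPS)]
                    for n in range(BATCH)]

# Plain Dense view: apply the same kernel along the last axis of the
# whole 3D input, keeping the batch and time axes as they are.
plain_dense = [[dense_softmax(x[n][t]) for t in range(TIMESTEPS)]
               for n in range(BATCH)]

outputs_match = time_distributed == plain_dense
print(outputs_match)
```

In this sketch both routes use the same weights for every timestep and differ only in the order in which the slices are visited, so the outputs agree. Whether the real Keras layers differ in other respects is not something this toy check can show.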

Duplicate of this question? https://stackoverflow.com/questions/44611006/timedistributeddense-vs-dense-in-keras-same-number-of-parameters — Manoj Mohan, Sep 25 '19 at 05:49
