Consider the following two models:
from tensorflow.keras.layers import Input, GRU, Dense, TimeDistributed
from tensorflow.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
dense = Dense(200, activation='softmax')
decoder_pred = TimeDistributed(dense)(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, None, 100)         0
_________________________________________________________________
gru (GRU)                    (None, None, 32)          12768
_________________________________________________________________
time_distributed (TimeDistri (None, None, 200)         6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
And the second model:
from tensorflow.keras.layers import Input, GRU, Dense
from tensorflow.keras.models import Model
inputs = Input(batch_shape=(None, None, 100))
gru_out = GRU(32, return_sequences=True)(inputs)
decoder_pred = Dense(200, activation='softmax')(gru_out)
model = Model(inputs=inputs, outputs=decoder_pred)
model.summary()
with the output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, None, 100)         0
_________________________________________________________________
gru_1 (GRU)                  (None, None, 32)          12768
_________________________________________________________________
dense_1 (Dense)              (None, None, 200)         6600
=================================================================
Total params: 19,368
Trainable params: 19,368
Non-trainable params: 0
_________________________________________________________________
My question is: does the TimeDistributed layer wrapper do anything in the first model? Do the two models differ in any respect, given that their total parameter counts are identical?
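One way to probe this empirically, rather than by comparing parameter counts, is to apply the same Dense layer to a 3D tensor both with and without the wrapper and compare the results. The snippet below is only a minimal sketch: the random array `x` is a stand-in for the GRU output (batch of 4, 7 timesteps, 32 features, all chosen arbitrarily), and it assumes the public `tensorflow.keras` API in eager mode.

```python
import numpy as np
from tensorflow.keras.layers import Dense, TimeDistributed

# Hypothetical stand-in for the GRU output: (batch, timesteps, features).
x = np.random.rand(4, 7, 32).astype("float32")

# One Dense layer, so both code paths share exactly the same weights.
dense = Dense(200, activation='softmax')
wrapped = TimeDistributed(dense)

# Apply the layer through the wrapper, then directly to the 3D input.
wrapped_pred = np.asarray(wrapped(x))
direct_pred = np.asarray(dense(x))

# Compare shapes and values of the two results.
outputs_match = np.allclose(wrapped_pred, direct_pred, atol=1e-5)
print(wrapped_pred.shape, direct_pred.shape, outputs_match)
```

If `outputs_match` comes out true, that would suggest the wrapper makes no numerical difference for a Dense layer on this kind of input, though it says nothing about other wrapped layer types.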