I am trying to understand how the TimeDistributed() layer works in Keras. I know that when we wrap a Conv2D layer in TimeDistributed(), the same Conv2D layer is applied to every temporal slice of a video (i.e. to each frame in the video sequence), as described here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed.
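For example, this is roughly how I picture it working on a batch of videos (the shapes and filter count here are made-up values, just for illustration):

    import tensorflow as tf

    # A batch of 2 "videos", each with 10 frames of 64x64 RGB images
    # (arbitrary shapes, just for illustration).
    frames = tf.random.normal((2, 10, 64, 64, 3))

    # The same Conv2D layer (and therefore the same weights) is applied
    # to each of the 10 frames independently.
    td_conv = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(8, 3, padding='same'))

    print(td_conv(frames).shape)  # (2, 10, 64, 64, 8)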
For my project I am trying to build an LSTM model, which looks as follows:
    class Lstm_model_1(tf.keras.Model):
        def __init__(self, num_classes):
            super(Lstm_model_1, self).__init__()
            self.Lstm1 = tf.keras.layers.LSTM(32, return_sequences=True)
            self.Lstm2 = tf.keras.layers.LSTM(32, return_sequences=True)
            self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')
            self.TimeDistributed = tf.keras.layers.TimeDistributed(self.classifier)

        def call(self, inputs):
            input_A = inputs
            x = self.Lstm1(input_A)
            x = self.Lstm2(x)
            output = self.TimeDistributed(x)
            return output

    lstm_1 = Lstm_model_1(3)
    lstm_1.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy())
    lstm_1.fit(X_train, Y_train, epochs=3, validation_data=(X_test, Y_test))
    lstm_1.summary()
Model: "lstm_model_1_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_9 (LSTM) multiple 55552
_________________________________________________________________
lstm_10 (LSTM) multiple 8320
_________________________________________________________________
dense_6 (Dense) multiple 99
_________________________________________________________________
time_distributed (TimeDistri multiple 99
=================================================================
Total params: 63,971
Trainable params: 63,971
Non-trainable params: 0
_________________________________________________________________
Here I am getting 99 parameters for the TimeDistributed() layer. When I drop the TimeDistributed() wrapper and apply the Dense layer directly, I get the same number of parameters, i.e. 99.
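For reference, this is a sketch of what I mean by "not using TimeDistributed()": the same model with the Dense layer applied directly to the LSTM output (num_classes is 3, as above):

    class Lstm_model_2(tf.keras.Model):
        def __init__(self, num_classes):
            super(Lstm_model_2, self).__init__()
            self.Lstm1 = tf.keras.layers.LSTM(32, return_sequences=True)
            self.Lstm2 = tf.keras.layers.LSTM(32, return_sequences=True)
            # Dense applied directly to the 3D LSTM output, no TimeDistributed wrapper
            self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')

        def call(self, inputs):
            x = self.Lstm1(inputs)
            x = self.Lstm2(x)
            return self.classifier(x)

    lstm_2 = Lstm_model_2(3)
    lstm_2.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy())
    lstm_2.fit(X_train, Y_train, epochs=3, validation_data=(X_test, Y_test))
    lstm_2.summary()  # the Dense layer again shows 99 parameters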
I have read in other posts that:

"If return_sequences=True, then the Dense layer is used to apply at every timestep just like TimeDistributedDense."

and

"As a side note: this makes TimeDistributed(Dense(...)) and Dense(...) equivalent to each other. Another side note: be aware that this has the effect of shared weights."
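To convince myself of that equivalence, I ran a quick check along these lines (a sketch with arbitrary shapes; I copy the weights across so both layers compute the same function):

    import numpy as np
    import tensorflow as tf

    x = tf.random.normal((4, 7, 32))   # (batch, timesteps, features), arbitrary shapes

    dense = tf.keras.layers.Dense(3)
    td = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(3))

    # Build both layers, then copy the weights from one to the other
    y_dense = dense(x)                 # Dense applied directly to the 3D tensor -> (4, 7, 3)
    _ = td(x)                          # builds the wrapped Dense
    td.layer.set_weights(dense.get_weights())
    y_td = td(x)                       # -> (4, 7, 3)

    print(np.allclose(y_dense.numpy(), y_td.numpy()))   # True
    print(dense.count_params(), td.count_params())      # 99 99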
Now it makes sense to me that a Dense layer applied to an LSTM output with return_sequences=True should use the same weights at every timestep. But I still have a few questions, listed below.
- Is wrapping Dense() in TimeDistributed() redundant, i.e. can we just use Dense() directly?
- If I don't want shared weights across the sequence outputs, what should I do? I want my network to learn a different set of weights for each timestep's output when return_sequences=True. (The sketch after this list illustrates what I mean.)
- Why do we still wrap the Dense() layer in a TimeDistributed() layer if both share weights across the time sequence? I have seen TimeDistributed() used together with a RepeatVector() layer here: https://datascience.stackexchange.com/questions/46491/what-is-the-job-of-repeatvector-and-timedistributed
- Is TimeDistributed() redundant only in the case of Dense(), or is the same true for a Conv2D layer?
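To make the second question concrete, this is the kind of behaviour I am after (a purely hypothetical sketch, not something I claim is the right approach): a separate Dense layer per timestep instead of one shared Dense.

    # Hypothetical sketch: one Dense per timestep instead of a single shared Dense.
    # `timesteps` and num_classes=3 are assumptions for illustration only.
    class PerTimestepDense(tf.keras.layers.Layer):
        def __init__(self, timesteps, num_classes):
            super().__init__()
            self.denses = [tf.keras.layers.Dense(num_classes, activation='softmax')
                           for _ in range(timesteps)]

        def call(self, inputs):
            # inputs: (batch, timesteps, features); apply a different Dense to each step
            outputs = [self.denses[t](inputs[:, t, :]) for t in range(len(self.denses))]
            return tf.stack(outputs, axis=1)   # (batch, timesteps, num_classes)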