I am trying to understand how the TimeDistributed() layer works in Keras. I know that when we wrap a Conv2D layer in TimeDistributed(), the same Conv2D layer is applied to every temporal slice of a video (i.e. to each frame in the video sequence), as described here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed.
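For example, this is roughly how I picture it working on a batch of videos (the shapes and filter count here are made-up values, just for illustration):

    import tensorflow as tf

    # A batch of 2 "videos", each with 10 frames of 64x64 RGB images
    # (arbitrary shapes, just for illustration).
    frames = tf.random.normal((2, 10, 64, 64, 3))

    # The same Conv2D layer (and therefore the same weights) is applied
    # to each of the 10 frames independently.
    td_conv = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(8, 3, padding='same'))

    print(td_conv(frames).shape)  # (2, 10, 64, 64, 8)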
For my project I am trying to build an LSTM model, which looks as follows:
    class Lstm_model_1(tf.keras.Model):
        def __init__(self, num_classes):
            super(Lstm_model_1, self).__init__()
            self.Lstm1 = tf.keras.layers.LSTM(32, return_sequences=True)
            self.Lstm2 = tf.keras.layers.LSTM(32, return_sequences=True)
            self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')
            self.TimeDistributed = tf.keras.layers.TimeDistributed(self.classifier)

        def call(self, inputs):
            input_A = inputs
            x = self.Lstm1(input_A)
            x = self.Lstm2(x)
            output = self.TimeDistributed(x)
            return output

    lstm_1 = Lstm_model_1(3)
    lstm_1.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy())
    lstm_1.fit(X_train, Y_train, epochs=3, validation_data=(X_test, Y_test))
    lstm_1.summary()
Model: "lstm_model_1_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_9 (LSTM) multiple 55552
_________________________________________________________________
lstm_10 (LSTM) multiple 8320
_________________________________________________________________
dense_6 (Dense) multiple 99
_________________________________________________________________
time_distributed (TimeDistri multiple 99
=================================================================
Total params: 63,971
Trainable params: 63,971
Non-trainable params: 0
_________________________________________________________________
Here I am getting 99 parameters for the TimeDistributed() layer. When I drop the TimeDistributed() wrapper and apply the Dense layer directly, I get the same number of parameters, i.e. 99.
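For reference, this is a sketch of what I mean by "not using TimeDistributed()": the same model with the Dense layer applied directly to the LSTM output (num_classes is 3, as above):

    class Lstm_model_2(tf.keras.Model):
        def __init__(self, num_classes):
            super(Lstm_model_2, self).__init__()
            self.Lstm1 = tf.keras.layers.LSTM(32, return_sequences=True)
            self.Lstm2 = tf.keras.layers.LSTM(32, return_sequences=True)
            # Dense applied directly to the 3D LSTM output, no TimeDistributed wrapper
            self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')

        def call(self, inputs):
            x = self.Lstm1(inputs)
            x = self.Lstm2(x)
            return self.classifier(x)

    lstm_2 = Lstm_model_2(3)
    lstm_2.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy())
    lstm_2.fit(X_train, Y_train, epochs=3, validation_data=(X_test, Y_test))
    lstm_2.summary()  # the Dense layer again shows 99 parameters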
I have read in other posts that:

"If return_sequences=True, then the Dense layer is used to apply at every timestep just like TimeDistributedDense."

and

"As a side note: this makes TimeDistributed(Dense(...)) and Dense(...) equivalent to each other. Another side note: be aware that this has the effect of shared weights."
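To convince myself of that equivalence, I ran a quick check along these lines (a sketch with arbitrary shapes; I copy the weights across so both layers compute the same function):

    import numpy as np
    import tensorflow as tf

    x = tf.random.normal((4, 7, 32))   # (batch, timesteps, features), arbitrary shapes

    dense = tf.keras.layers.Dense(3)
    td = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(3))

    # Build both layers, then copy the weights from one to the other
    y_dense = dense(x)                 # Dense applied directly to the 3D tensor -> (4, 7, 3)
    _ = td(x)                          # builds the wrapped Dense
    td.layer.set_weights(dense.get_weights())
    y_td = td(x)                       # -> (4, 7, 3)

    print(np.allclose(y_dense.numpy(), y_td.numpy()))   # True
    print(dense.count_params(), td.count_params())      # 99 99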
Now it makes sense to me that a Dense layer applied to an LSTM output with return_sequences=True should use the same weights at every timestep. But I still have a few questions, listed below.
- Is wrapping Dense() in TimeDistributed() redundant, i.e. can we just use Dense() directly?
- If I don't want shared weights across the sequence outputs, what should I do? I want my network to learn a different set of weights for each timestep's output when return_sequences=True. (The sketch after this list illustrates what I mean.)
- Why do we still wrap the Dense() layer in a TimeDistributed() layer if both share weights across the time sequence? I have seen TimeDistributed() used together with a RepeatVector() layer here: https://datascience.stackexchange.com/questions/46491/what-is-the-job-of-repeatvector-and-timedistributed
- Is TimeDistributed() redundant only in the case of Dense(), or is the same true for a Conv2D layer?
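To make the second question concrete, this is the kind of behaviour I am after (a purely hypothetical sketch, not something I claim is the right approach): a separate Dense layer per timestep instead of one shared Dense.

    # Hypothetical sketch: one Dense per timestep instead of a single shared Dense.
    # `timesteps` and num_classes=3 are assumptions for illustration only.
    class PerTimestepDense(tf.keras.layers.Layer):
        def __init__(self, timesteps, num_classes):
            super().__init__()
            self.denses = [tf.keras.layers.Dense(num_classes, activation='softmax')
                           for _ in range(timesteps)]

        def call(self, inputs):
            # inputs: (batch, timesteps, features); apply a different Dense to each step
            outputs = [self.denses[t](inputs[:, t, :]) for t in range(len(self.denses))]
            return tf.stack(outputs, axis=1)   # (batch, timesteps, num_classes)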