
I am trying to understand how the TimeDistributed() layer works in Keras. I know that when we wrap a Conv2D layer in TimeDistributed(), the same Conv2D layer is applied to every time step of a video (i.e. to each frame in the video sequence), as described here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed.
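
For example, this is my understanding as a minimal sketch (the shapes here are made up by me for illustration, not from my real data):

import tensorflow as tf

frames = tf.random.normal((2, 10, 64, 64, 3))   # (batch, time, height, width, channels)
td_conv = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(16, kernel_size=3, padding='same'))
print(td_conv(frames).shape)                    # (2, 10, 64, 64, 16): the same Conv2D weights are used for every frame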

For my project I am trying to build an LSTM model, which looks as follows:

import tensorflow as tf

class Lstm_model_1(tf.keras.Model):

    def __init__(self, num_classes):
        super(Lstm_model_1, self).__init__()
        self.Lstm1 = tf.keras.layers.LSTM(32, return_sequences=True)
        self.Lstm2 = tf.keras.layers.LSTM(32, return_sequences=True)
        self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')
        # wrap the Dense classifier so it is applied at every timestep
        self.TimeDistributed = tf.keras.layers.TimeDistributed(self.classifier)

    def call(self, inputs):
        x = self.Lstm1(inputs)
        x = self.Lstm2(x)
        output = self.TimeDistributed(x)
        return output

lstm_1 = Lstm_model_1(3)
lstm_1.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy())
lstm_1.fit(X_train, Y_train, epochs=3, validation_data=(X_test, Y_test))
lstm_1.summary()
Model: "lstm_model_1_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_9 (LSTM)                multiple                  55552     
_________________________________________________________________
lstm_10 (LSTM)               multiple                  8320      
_________________________________________________________________
dense_6 (Dense)              multiple                  99        
_________________________________________________________________
time_distributed (TimeDistri multiple                  99        
=================================================================
Total params: 63,971
Trainable params: 63,971
Non-trainable params: 0
_________________________________________________________________

Here I am getting 99 parameters in the TimeDistributed() layer.

Now, when I do not use the TimeDistributed() layer and apply the Dense() layer directly, I get the same number of parameters, i.e. 99.
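
What I mean by "not using TimeDistributed()" is applying the Dense() layer directly to the 3D LSTM output, roughly like this sketch (the dummy input shape is only an assumption for illustration, not my real data):

import tensorflow as tf

class Lstm_model_2(tf.keras.Model):

    def __init__(self, num_classes):
        super(Lstm_model_2, self).__init__()
        self.Lstm1 = tf.keras.layers.LSTM(32, return_sequences=True)
        self.Lstm2 = tf.keras.layers.LSTM(32, return_sequences=True)
        self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')

    def call(self, inputs):
        x = self.Lstm1(inputs)
        x = self.Lstm2(x)
        return self.classifier(x)            # Dense applied directly to (batch, timesteps, 32)

lstm_2 = Lstm_model_2(3)
lstm_2(tf.random.normal((1, 5, 8)))          # dummy (batch, timesteps, features) input, just to build the model
lstm_2.summary()                             # the Dense layer again shows 32*3 + 3 = 99 parameters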

I have read in the following posts (linked below) that:

If return_sequences=True, then the Dense layer is applied at every timestep, just like TimeDistributedDense.

and

As a side note: this makes TimeDistributed(Dense(...)) and Dense(...) equivalent to each other.

Another side note: be aware that this has the effect of shared weights.


  1. TimeDistributed(Dense) vs Dense in seq2seq
  2. Keras Dense layer's input is not flattened
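
Here is a small check I did to convince myself what the shared weights mean in practice (toy shapes, not my real data): whether wrapped in TimeDistributed() or not, the Dense layer holds a single (features, num_classes) kernel that is reused at every timestep.

import tensorflow as tf

seq = tf.random.normal((1, 5, 32))        # (batch, timesteps, features)

dense = tf.keras.layers.Dense(3)
_ = dense(seq)                            # build by calling on a 3D input
print([w.shape for w in dense.weights])   # kernel (32, 3) and bias (3,): one kernel shared by all 5 timesteps

td = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(3))
_ = td(seq)
print([w.shape for w in td.weights])      # same shapes: (32, 3) and (3,)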

Now, it makes sense to me that a Dense layer applied to the output of an LSTM with return_sequences=True should use the same weights at all timesteps (the check above shows this as well). But I have a few questions, listed below.

  1. Is wrapping Dense() in TimeDistributed() redundant, and can we use Dense() directly instead?
  2. If, say, I don't want shared weights across the sequence outputs, what should I do? I want my network to learn a different set of weights for each timestep's output when return_sequences=True.
  3. Why do we still wrap the Dense() layer in TimeDistributed() if both share weights across timesteps? I have seen TimeDistributed() used together with the RepeatVector() layer here: https://datascience.stackexchange.com/questions/46491/what-is-the-job-of-repeatvector-and-timedistributed
  4. Is TimeDistributed() redundant only in the case of Dense(), or is the same true for a Conv2D layer?