
Is there any difference between these two ways of using a Dense layer? The output shape and the number of parameters appear to be the same.

  1. Will the output be the same if we use fixed weights?
  2. Will the result be the same during training?
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

def test_rnn_output_v1():

    max_seq_length = 10
    n_features = 8
    rnn_dim = 64
    dense_dim = 16

    input = Input(shape=(max_seq_length, n_features))
    out = LSTM(rnn_dim, return_sequences=True)(input)
    out = Dense(dense_dim)(out)

    model = Model(inputs=[input], outputs=out)

    print(model.summary())

    # model.summary() reports:
    # input shape:  (None, max_seq_length, n_features)
    # output shape: (None, max_seq_length, dense_dim)

def test_rnn_output_v2():

    max_seq_length = 10
    n_features = 8
    rnn_dim = 64
    dense_dim = 16

    input = Input(shape=(max_seq_length, n_features))
    out = LSTM(rnn_dim, return_sequences=True)(input)
    out = TimeDistributed(Dense(dense_dim))(out)

    model = Model(inputs=[input], outputs=out)

    print(model.summary())

    # model.summary() reports:
    # input shape:  (None, max_seq_length, n_features)
    # output shape: (None, max_seq_length, dense_dim)

1 Answer

There is no difference between TimeDistributed(Dense(...)) and Dense(...): they have exactly the same output shape, number of parameters, and connectivity. That's because the Dense layer is applied to the last axis of its input, so whether or not it is wrapped in a TimeDistributed layer makes no difference. This answer explains the workings of the Dense layer in more detail.
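As a sanity check for point 1, here is a minimal sketch (assuming TensorFlow 2.x Keras; the helper name build_model and the shapes are just illustrative) that builds both variants, copies the weights from one model into the other, and verifies the outputs match on random input:

import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

max_seq_length, n_features, rnn_dim, dense_dim = 10, 8, 64, 16

def build_model(wrap_dense):
    inp = Input(shape=(max_seq_length, n_features))
    x = LSTM(rnn_dim, return_sequences=True)(inp)
    dense = Dense(dense_dim)
    out = TimeDistributed(dense)(x) if wrap_dense else dense(x)
    return Model(inputs=inp, outputs=out)

model_plain = build_model(wrap_dense=False)
model_td = build_model(wrap_dense=True)

# Both variants expose identical weight shapes (the LSTM kernels plus one
# Dense kernel/bias), so the weights transfer directly.
model_td.set_weights(model_plain.get_weights())

x = np.random.rand(4, max_seq_length, n_features).astype("float32")
print(np.allclose(model_plain.predict(x), model_td.predict(x)))  # expected: True

Since the trainable parameters and the forward pass are identical, gradient updates are identical as well, so training behaves the same in both variants.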
