
I am trying to understand a model developed for time series forecasting. It uses a Conv1D layer, two LSTM layers, and after that a Dense layer. My question is, should it use Flatten() between the LSTM and the Dense layer? In my mind, the output should have just one value, with a shape of (None, 1), and that can be achieved by using Flatten() between the LSTM and the Dense layer. Without the Flatten(), the output shape would be (None, 30, 1). Alternatively, I can remove return_sequences=True from the second LSTM layer, which I think has the same effect as the Flatten(). Which one is the more appropriate way? Do they affect the loss? Here is the model.

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=32, kernel_size=3, strides=1, padding="causal", activation="relu", input_shape=(30, 1)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.LSTM(32, return_sequences=True),
    # tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
    ])

Here is the model summary without Flatten()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d (Conv1D)              (None, 30, 32)            128       
_________________________________________________________________
lstm (LSTM)                  (None, 30, 32)            8320      
_________________________________________________________________
lstm_1 (LSTM)                (None, 30, 32)            8320      
_________________________________________________________________
dense (Dense)                (None, 30, 1)             33        
=================================================================
Total params: 16,801
Trainable params: 16,801
Non-trainable params: 0
_________________________________________________________________
snxmx
2 Answers


Well, it depends on what you want to achieve. I'll try to give you some hints, because it is not 100% clear to me what you want to obtain.

If your LSTM uses return_sequences=True, then you are returning the output of each LSTM cell, i.e., an output for each timestep. If you then add a Dense layer, it is applied on top of each timestep's output, which is why the summary above shows a Dense output of (None, 30, 1).
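For example, here is a minimal sketch (the batch size of 4 is a made-up value) showing that Dense acts only on the last axis, so the time dimension survives:

import tensorflow as tf

x = tf.random.normal((4, 30, 32))    # (batch, timesteps, LSTM units)
y = tf.keras.layers.Dense(1)(x)      # Dense is applied independently at each timestep
print(y.shape)                       # (4, 30, 1)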

If you use a Flatten layer together with return_sequences=True, then you are removing the temporal dimension, giving (None, 30 * 32) = (None, 960) in your case. Then you can add a Dense layer, or whatever else you need.
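As a sketch (again with a made-up batch size of 4), flattening the sequence output merges the timestep and feature axes before the Dense layer:

import tensorflow as tf

x = tf.random.normal((4, 30, 32))      # (batch, timesteps, units)
flat = tf.keras.layers.Flatten()(x)    # (4, 960) = (4, 30 * 32)
y = tf.keras.layers.Dense(1)(flat)     # (4, 1)
print(flat.shape, y.shape)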

If you set return_sequences=False, you get only the output at the very end of your LSTM (which, due to how the LSTM works, still depends on the computation at all previous timesteps), and the output will have shape (None, dim), where dim equals the number of hidden units in your LSTM (i.e., 32). Here, again, you can simply add a Dense layer with one unit to get the (None, 1) output you are looking for.
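Applied to the model in the question, a minimal sketch of this option changes only the second LSTM line:

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(filters=32, kernel_size=3, strides=1, padding="causal", activation="relu", input_shape=(30, 1)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.LSTM(32),    # return_sequences=False (the default): output (None, 32)
    tf.keras.layers.Dense(1),    # final output (None, 1), matching a single-value label
    ])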

nsacco
  • Thanks, @nsacco. Your explanation helps a lot. So the next question for me is, when should I use each of the options you described? You mentioned that it depends on what I want to achieve. Can you provide an example of when I would want to use Flatten()? My label is one value, so Flatten() will give me an output shape of (None, 1) at the last layer, which corresponds to the label dimension. If unflattened, the output shape is (None, 30, 1) and is not consistent with the labels. Given my case, does it make more sense to use Flatten()? – snxmx Jul 08 '20 at 15:15
  • Let's suppose you want to do a task similar to sentiment analysis, i.e., classify the sentiment of your sentence in a binary classification fashion. What I would do is use the LSTM layer to process the sentence(s), with `return_sequences=False`, and add a dense layer with 1 unit after the LSTM to manage the classification part. In this setting, your NN output would be (None, 1). I would proceed this way rather than using a flatten layer (a sketch of this setup follows these comments). – nsacco Jul 08 '20 at 16:36
  • That makes sense. I have been thinking about this and think one other reason Flatten is not used in this case (time series forecasting or word predicting) might be that we want to keep the time step as one of the dimensions. Flattening it would remove the time dimension. Thanks for the help! – snxmx Jul 09 '20 at 03:18
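A hypothetical sketch of the binary sentiment-classification setup described in the comment above; the vocabulary size, embedding width, and unit count are made-up values, not from the original answer:

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),    # token ids -> dense vectors
    tf.keras.layers.LSTM(32),                                     # return_sequences=False: only the final state
    tf.keras.layers.Dense(1, activation="sigmoid"),               # binary sentiment, output shape (None, 1)
    ])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])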

Please refer to this similar question.

Flatten() is generally used before the output layer. It is better to apply Flatten() over the full sequence output of the LSTM layer. Can it be used after the Dense layer rather than after the LSTM layers?

I would like to learn from the counterpoints in other answers and comments here.

TulakHord
  • Thanks, @Neerajan. So given my example, your suggestion would be using flatten and add that after the last Dense layer, correct? – snxmx Jul 08 '20 at 15:21
  • Is this a question? Or an answer? It's difficult to tell at the moment. – TylerH Oct 20 '20 at 18:39