
I am trying to understand the role of the Flatten function in Keras. Below is my code, which is a simple two-layer network. It takes in 2-dimensional data of shape (3, 2), and outputs 1-dimensional data of shape (1, 4):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten

model = Sequential()
model.add(Dense(16, input_shape=(3, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(4))
model.compile(loss='mean_squared_error', optimizer='SGD')

x = np.array([[[1, 2], [3, 4], [5, 6]]])

y = model.predict(x)

print(y.shape)

This prints out that y has shape (1, 4). However, if I remove the Flatten line, then it prints out that y has shape (1, 3, 4).

I don't understand this. From my understanding of neural networks, the model.add(Dense(16, input_shape=(3, 2))) function is creating a hidden fully-connected layer, with 16 nodes. Each of these nodes is connected to each of the 3x2 input elements. Therefore, the 16 nodes at the output of this first layer are already "flat". So, the output shape of the first layer should be (1, 16). Then, the second layer takes this as an input, and outputs data of shape (1, 4).

So if the output of the first layer is already "flat" and of shape (1, 16), why do I need to further flatten it?
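In fact, inspecting the first layer's weights (a minimal check using get_weights()) seems to contradict my mental model: the kernel has shape (2, 16) rather than (6, 16), as if the layer were not connected to all 3x2 inputs at once:

weights, biases = model.layers[0].get_weights()
print(weights.shape)  # (2, 16) -- only the last input axis is connected to the 16 units
print(biases.shape)   # (16,)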

Karnivaurus

10 Answers


If you read the Keras documentation entry for Dense, you will see that this call:

Dense(16, input_shape=(5,3))

would result in a Dense network with 3 inputs and 16 outputs, applied independently to each of the 5 steps. So, if D(x) transforms a 3-dimensional vector into a 16-dimensional vector, the output of your layer is a sequence of vectors: [D(x[0,:]), D(x[1,:]), ..., D(x[4,:])], with shape (5, 16). To get the behavior you specify, you can first Flatten your input to a 15-dimensional vector and then apply Dense:

model = Sequential()
model.add(Flatten(input_shape=(3, 2)))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(4))
model.compile(loss='mean_squared_error', optimizer='SGD')
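A minimal sketch of this per-step behavior, using get_weights() and a random input: the layer stores a single (3, 16) kernel that is shared across all 5 steps (this also answers the weight-sharing question in the comments below):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Dense(16) on input_shape=(5, 3) stores ONE (3, 16) kernel plus a (16,) bias,
# applied independently (with shared weights) to each of the 5 rows.
model = Sequential()
model.add(Dense(16, input_shape=(5, 3)))

kernel, bias = model.layers[0].get_weights()
print(kernel.shape)  # (3, 16) -- shared across all 5 steps
print(bias.shape)    # (16,)

x = np.random.rand(1, 5, 3)
y = model.predict(x)
print(y.shape)       # (1, 5, 16)

# Row i of the output is D(x[i,:]) = x[0, i, :] @ kernel + bias:
print(np.allclose(y[0], x[0] @ kernel + bias, atol=1e-5))  # True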

EDIT: As some people struggled to understand, here is an explanatory image:

[Image: diagram illustrating Dense applied per step, with and without Flatten]

Marcin Możejko
  • Thanks for your explanation. Just to clarify though: with `Dense(16, input_shape=(5,3))`, will each output neuron from the set of 16 (and, for all 5 sets of these neurons), be connected to all (3 x 5 = 15) input neurons? Or will each neuron in the first set of 16 only be connected to the 3 neurons in the first set of 5 input neurons, and then each neuron in the second set of 16 is only connected to the 3 neurons in the second set of 5 input neurons, etc.... I'm confused as to which it is! – Karnivaurus Apr 06 '17 at 12:49
  • You have one Dense layer which gets 3 neurons and outputs 16, and it is applied to each of the 5 sets of 3 neurons. – Marcin Możejko Apr 06 '17 at 12:55
  • But don't the output neurons have connections to all 5 sets of input neurons? The reason I thought this might be the case is that in convolutional networks, each feature map in the first layer takes as input all three channels (R,G,B) from the input. – Karnivaurus Apr 06 '17 at 12:59
  • There are 5 sets of 16 neurons and each has connections to only one set of 3 input neurons. – Marcin Możejko Apr 06 '17 at 13:01
  • Ah ok. What I am trying to do is take a list of 5 colour pixels as input, and I want them to pass through a fully-connected layer. So `input_shape=(5,3)` means that there are 5 pixels, and each pixel has three channels (R,G,B). But according to what you are saying, each pixel would be processed individually, whereas I want all pixels to be processed by all neurons in the first layer. So would applying the `Flatten` layer immediately at the start give me what I want? – Karnivaurus Apr 06 '17 at 13:08
  • The docs you linked to say "Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel." Assuming that rank includes the batch dimension, I would interpret that to mean that if the input is (None, 5, 3), it would be flattened to (None, 15). The actual behavior seems to be what you're describing, though. Not sure if that's a documentation issue or if I'm interpreting it incorrectly. – ssfrr Jun 02 '17 at 14:20
  • A little drawing with and without `Flatten` may help to understand. – Xvolks Aug 28 '17 at 14:52
  • @Xvolks can you share a drawing? I would like to understand this. – Gásten Sep 07 '17 at 10:03
  • Actually it was a request I asked to @MarcinMożejko as I am also struggling to understand his answer. – Xvolks Sep 07 '17 at 10:20
  • Ok, guys - I provided you an image. Now you could delete your downvotes. – Marcin Możejko Sep 13 '17 at 21:34
  • @MarcinMożejko no downvote on my side. Actually I’ve just upvoted it. – Xvolks Sep 20 '17 at 13:37
  • Will the weights and biases be shared across the timesteps( or steps )? – Nagabhushan Baddi Oct 16 '18 at 09:35
  • Thanks for the graphical explanation, but I don't think "plus" is the correct sign for a neural network diagram. How about an "arrow", like --- input(5,3) ---> Flatten --- output (15, ) --->? – Cloud Cho Mar 01 '21 at 21:57
  • @MarcinMożejko: I do not understand the picture at all. It is quite confusing for me. Why does the input vector (5,3) have exactly the same shape in the figure as the output vector (5, 16)? So there is no difference between the 3 and the 16? – PeterBe Oct 27 '21 at 12:31

[Image: a matrix being flattened into a single one-dimensional array]

This is how Flatten works: it converts a matrix into a single array.

Mahesh Kembhavi
  • Yes, but _why_ is it needed? This is the actual question, I think. – Helen Dec 01 '19 at 12:00
  • A picture is worth a thousand words. – Hom Bahrani Sep 09 '22 at 11:34
  • To answer @Helen in my understanding flattening is used to reduce the dimensionality of the input to a layer. A dense layer expects a row vector (which again, mathematically is a multidimensional object still), where each column corresponds to a feature input of the dense layer, so basically a convenient equivalent of Numpy's `reshape` : ). Actually, flattening is a pretty generic thing. For instance on hardware you might want to flatten a `struct` into a logically continuous string of bits to pass it through the network. – Balázs Börcsök Nov 16 '22 at 15:39
  • @HomBahrani Not *this* picture, which "flattens" a 1D array into a 1D array. – endolith Nov 26 '22 at 04:03
  • @endolith I think is flattening a 2D array into 1D – imatiasmb Dec 10 '22 at 18:22
  • @imatiasmb But both the before and after are 1D... – endolith Dec 12 '22 at 15:28

short read:

Flattening a tensor means removing all of the dimensions except one. This is exactly what the Flatten layer does.

long read:

If we take into consideration the original model (with the Flatten layer), we get the following model summary:

Layer (type)                 Output Shape              Param #   
=================================================================
D16 (Dense)                  (None, 3, 16)             48        
_________________________________________________________________
A (Activation)               (None, 3, 16)             0         
_________________________________________________________________
F (Flatten)                  (None, 48)                0         
_________________________________________________________________
D4 (Dense)                   (None, 4)                 196       
=================================================================
Total params: 244
Trainable params: 244
Non-trainable params: 0

For this summary, the next image will hopefully provide a little more sense of the input and output sizes for each layer.

As you can read, the output shape of the Flatten layer is (None, 48). Here is the tip: you should read it as (1, 48) or (2, 48) or ... or (16, 48) ... or (32, 48), etc.

In fact, None in that position means any batch size. To recall: for the inputs, the first dimension is the batch size and the second is the number of input features.
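A minimal sketch of what that None means in practice (the same model from the question accepts any batch size):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten

# The original model from the question.
model = Sequential([
    Dense(16, input_shape=(3, 2)),
    Activation('relu'),
    Flatten(),
    Dense(4),
])

# None in (None, 48) means any batch size fits:
print(model.predict(np.random.rand(1, 3, 2)).shape)   # (1, 4)
print(model.predict(np.random.rand(32, 3, 2)).shape)  # (32, 4)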

The role of the Flatten layer in Keras is super simple:

A flatten operation on a tensor reshapes the tensor to have a shape equal to the number of elements contained in the tensor, not including the batch dimension.

[Image: model diagram annotated with the input and output shapes of each layer]


Note: I used the model.summary() method to provide the output shape and parameter details.
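The parameter counts follow from the same picture: the first Dense layer acts only on the last input axis, so D16 has (2 + 1) x 16 = 48 parameters; Flatten merely reshapes 3 x 16 = 48 features and has no parameters; and D4 has (48 + 1) x 4 = 196 parameters, for a total of 244.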

prosti
  • You said `None` means any batch size, but why does the output shape of `D16` also have `None`? Isn't `3` the batch size here? – Ray Jasson Jan 27 '22 at 19:12
  • No, it isn't; you can choose any batch size, in my understanding. How did you arrive at the result that the batch size *must* be 3? – Balázs Börcsök Nov 16 '22 at 15:46

I came across this recently; it certainly helped me understand: https://www.cs.ryerson.ca/~aharley/vis/conv/

There's an input, a Conv2D, MaxPooling2D, etc. The Flatten layers are at the end, and the visualization shows exactly how they are formed and how they go on to define the final classifications (0-9).

AEngineer

It is a rule of thumb that the first layer in your network should be the same shape as your data. For example, our data consists of 28x28 images, and 28 layers of 28 neurons would be infeasible, so it makes more sense to 'flatten' that 28x28 into a 784x1. Instead of writing all the code to handle that ourselves, we add the Flatten() layer at the beginning, and when the arrays are loaded into the model later, they'll automatically be flattened for us.
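A minimal sketch of that pattern (the layer sizes here are illustrative):

from keras.models import Sequential
from keras.layers import Flatten, Dense

# Flatten 28x28 inputs to 784 features before the first Dense layer,
# instead of reshaping the arrays ourselves.
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))  # (28, 28) -> (784,)
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))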


Flattening converts the data into a 1-dimensional array for input to the next layer. We flatten the output of the convolutional layers to create a single long feature vector. In some architectures, e.g. a CNN, an image is better processed by the network if it is in 1D form rather than 2D.

[Image: flattening convolutional feature maps into a single feature vector]
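For example, a minimal sketch of where Flatten typically sits in a small CNN (the layer sizes are illustrative):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())  # collapses the (13, 13, 32) feature maps into one 5408-element vector
model.add(Dense(10, activation='softmax'))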

Hom Bahrani

Flatten makes explicit how you serialize a multidimensional tensor (typically the input one). This allows the mapping between the (flattened) input tensor and the first hidden layer. If the first hidden layer is "dense", each element of the (serialized) input tensor will be connected with each element of the hidden array. If you do not use Flatten, the way the input tensor is mapped onto the first hidden layer would be ambiguous.
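For instance, a minimal sketch of that fixed serialization order, using NumPy's reshape (which uses the same row-major order that Flatten uses by default):

import numpy as np

x = np.arange(6).reshape(1, 3, 2)  # one sample of shape (3, 2)
print(x.reshape(x.shape[0], -1))   # [[0 1 2 3 4 5]] -- one fixed, unambiguous ordering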

roberto

Here I would like to present an alternative to the Flatten function. This may help to understand what is going on internally. The alternative method adds three more lines of code. Instead of using

#========================================== Build a model
import tensorflow as tf

model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Flatten(input_shape=(28, 28, 3)))  # reshapes to (2352,) = 28x28x3
model.add(tf.keras.layers.experimental.preprocessing.Rescaling(1./255))  # normalize
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(2, activation=tf.nn.softmax))

model.build()
model.summary()  # print a summary of the model

we can use

#========================================== Build a model
tensor = tf.keras.backend.placeholder(dtype=tf.float32, shape=(None, 28, 28, 3))

model = tf.keras.models.Sequential()

model.add(tf.keras.layers.InputLayer(input_tensor=tensor))
model.add(tf.keras.layers.Reshape([2352]))  # flatten: 28x28x3 = 2352
model.add(tf.keras.layers.experimental.preprocessing.Rescaling(1./255))  # normalize
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(2, activation=tf.nn.softmax))

model.build()
model.summary()  # print a summary of the model

In the second case, we first create a tensor (using a placeholder) and then create an input layer. Afterwards, we reshape the tensor into flat form. So basically,

Create tensor->Create InputLayer->Reshape == Flatten

Flatten is a convenience function that does all of this automatically. Of course, both ways have their specific use cases. Keras provides enough flexibility to manipulate the way you want to create a model.

Matt Allen

The Keras Flatten class is very important when you have to deal with multi-dimensional inputs such as image datasets. The keras.layers.Flatten layer flattens multi-dimensional input tensors into a single dimension, so you can model your input layer and build your neural network model, then pass that data into every single neuron of the model effectively.

You can understand this easily with the Fashion-MNIST dataset. The images in this dataset are 28 x 28 pixels. Hence, if you print the first image in Python, you can see a multi-dimensional array, which we really can't feed into the input layer of our deep neural network.

print(train_images[0])

[Image: the first Fashion-MNIST image printed as a multi-dimensional pixel array]

To tackle this problem, we can flatten the image data when feeding it into the neural network, by turning this multidimensional tensor into a one-dimensional array. The flattened array then has 784 elements (28 x 28), and we can create our input layer with 784 neurons to handle each element of the incoming data.

We can do all of this using a single line of code, sort of...

keras.layers.Flatten(input_shape=(28, 28))
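As the comment below suggests, this is equivalent to reshaping the arrays yourself before feeding them in; a minimal sketch, with random data standing in for the Fashion-MNIST images:

import numpy as np

train_images = np.random.rand(60000, 28, 28)            # stand-in for the Fashion-MNIST images
flat = train_images.reshape(train_images.shape[0], -1)  # what Flatten produces inside the model
print(flat.shape)  # (60000, 784)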
Ryan M
  • Do you mean that this layer is typically equivalent to these two lines of reshaping inputs: `xTrain = xTrain.reshape(xTrain.shape[0], -1)` and `xTest = xTest.reshape(xTest.shape[0], -1)`? – Osama El-Ghonimy Sep 14 '21 at 00:27

As the name suggests, it just flattens out the input tensor. A very good visual to understand this is given below. Please let me know if there is any confusion.

[Image: flattening an input tensor]