
I am unable to understand the logic behind how the output shape of the first hidden layer is determined. I have taken some arbitrary examples as follows:

Example 1:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(units=4, activation='linear', input_shape=(784,)))
model.add(Dense(units=10, activation='softmax'))
model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_7 (Dense)              (None, 4)                 3140      
_________________________________________________________________
dense_8 (Dense)              (None, 10)                50        
=================================================================
Total params: 3,190
Trainable params: 3,190
Non-trainable params: 0

Example 2:

model = Sequential()
model.add(Dense(units=4, activation='linear', input_shape=(784, 1)))
model.add(Dense(units=10, activation='softmax'))
model.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_11 (Dense)             (None, 784, 4)            8         
_________________________________________________________________
dense_12 (Dense)             (None, 784, 10)           50        
=================================================================
Total params: 58
Trainable params: 58
Non-trainable params: 0

Example 3:

model = Sequential()
model.add(Dense(units=4, activation='linear', input_shape=(32, 28)))
model.add(Dense(units=10, activation='softmax'))
model.summary()
Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_15 (Dense)             (None, 32, 4)             116       
_________________________________________________________________
dense_16 (Dense)             (None, 32, 10)            50        
=================================================================
Total params: 166
Trainable params: 166
Non-trainable params: 0

Example 4:

model = Sequential()
model.add(Dense(units=4, activation='linear', input_shape=(32, 28, 1)))
model.add(Dense(units=10, activation='softmax'))
model.summary()
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_17 (Dense)             (None, 32, 28, 4)         8         
_________________________________________________________________
dense_18 (Dense)             (None, 32, 28, 10)        50        
=================================================================
Total params: 58
Trainable params: 58
Non-trainable params: 0

Please help me in understanding the logic.

Also, I thought the rank of input_shape=(784,) and input_shape=(784,1) was the same, so why are their output shapes different?

desertnaut
Navdeep
    This question has already been asked [here](https://stackoverflow.com/q/52089601/2099607) (though with a different wording). – today May 02 '20 at 19:52

6 Answers


According to the official documentation of Keras, for a Dense layer, when you give the input as input_shape=(input_units,), the model takes as input arrays of shape (*, input_units) and outputs arrays of shape (*, output_units). In your case, input_shape=(784,) is treated as input shape (*, 784), and the output shape is (*, 4).

In general, for an input of shape (batch_size, ..., input_dim), the model gives an output of shape (batch_size, ..., units).

So when you give the input as input_shape=(784,), the model takes as input arrays of shape (*, 784), where * is the batch size and 784 is input_dim, giving an output shape of (*, 4).

When the input is (784,1), the model takes it as (*, 784, 1), where * is the batch size, 784 fills the "..." and 1 is input_dim => (batch_size, ..., input_dim), and the output is (*, 784, 4) => (batch_size, ..., units).

The same goes for input_shape=(32,28) => (*, 32, 28), giving output (*, 32, 4), and for input_shape=(32,28,1) => (*, 32, 28, 1), where again * is the batch size, 32, 28 fill the "..." and 1 is input_dim => (batch_size, ..., input_dim).
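As a quick way to see this rule, here is a minimal sketch (assuming TensorFlow 2.x with tf.keras) that prints the output shape of a Dense(4) layer for each of the four input shapes in the question:

import tensorflow as tf

for shape in [(784,), (784, 1), (32, 28), (32, 28, 1)]:
    x = tf.keras.Input(shape=shape)   # Keras prepends the batch axis: (None, *shape)
    y = tf.keras.layers.Dense(4)(x)   # only the last axis is transformed
    print(shape, '->', y.shape)
# (784,)      -> (None, 4)
# (784, 1)    -> (None, 784, 4)
# (32, 28)    -> (None, 32, 4)
# (32, 28, 1) -> (None, 32, 28, 4)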

For what None means, please check What is the meaning of the "None" in model.summary of KERAS?

Charul Giri

The logic is very simple: the dense layer is applied independently to the last dimension of the previous layer's output. Therefore, an input of shape (d1, ..., dn, d) through a dense layer with m units results in an output of shape (d1, ..., dn, m), and the layer has d*m + m parameters (d*m weights plus m biases).

Note that the same weights are applied independently at each position, so your example 4 works as follows:

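# Example 4: the input has shape (32, 28, 1); the same (1, 4) weight matrix
# and (4,) bias are reused at every (i, j) position of the first two axes.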
for i in range(32):
    for j in range(28):
        output[i, j, :] = input[i, j, :] @ layer.weights + layer.bias

Where @ is matrix multiplication. input[i, j, :] is a vector of shape (1,), layer.weights has shape (1, 4), and layer.bias is a vector of shape (4,).

This also explains why (784,) and (784,1) give different results: their last dimensions are different, 784 and 1.
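As a quick check (a sketch assuming tf.keras), the kernel shapes confirm that only the last input dimension determines the weight matrix:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

m1 = Sequential([Dense(4, input_shape=(784,))])
m2 = Sequential([Dense(4, input_shape=(784, 1))])
print(m1.layers[0].kernel.shape)  # (784, 4): 784*4 + 4 = 3140 params
print(m2.layers[0].kernel.shape)  # (1, 4):   1*4 + 4 = 8 params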

BlackBear

A Dense layer requires the input as (batch_size, input_size); most of the time we skip batch_size and define it during training.

If your input shape is one-dimensional, as in your first case (784,), the model will take as input arrays of shape (~, 784) and output arrays of shape (~, 4). By default it will add a bias for each of the 4 units, so the total parameters will be

parameters -> 784*4 + 4 = 3140

If your input shape is two-dimensional, as in the second case (784,1), the model will take as input arrays of shape (None, 784, 1) and output arrays of shape (None, 784, 4), where None is the batch dimension. The weights act only on the last dimension (size 1), and by default a bias is added for each of the 4 units, so the total parameters will be

parameters -> 1*4 (weights) + 4 (bias) = 8
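
These counts can also be verified programmatically; a minimal sketch assuming tf.keras:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

print(Sequential([Dense(4, input_shape=(784,))]).count_params())    # 3140
print(Sequential([Dense(4, input_shape=(784, 1))]).count_params())  # 8
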
Rajith Thennakoon

The output shape of a layer depends on the type of layer used. For example, the output shape of a Dense layer is based on the units defined in the layer, whereas the output shape of a Conv layer depends on filters.

Another thing to remember is that, by default, the last dimension of any input is considered the number of channels. In the process of output shape estimation, the number of channels is replaced by the units defined in the layer. For one-dimensional input such as input_shape=(784,), it is important to use the trailing ,.

Example 1 is one-dimensional, example 2 is two-dimensional (channels=1), example 3 is two-dimensional (channels=28), and example 4 is three-dimensional (channels=1). As mentioned above, the last dimension is replaced by the units defined in the Dense layer.

More details on dimension, axis, channel, input_dim, etc. are explained very clearly in this stackoverflow answer.
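To contrast the two layer types mentioned above, here is a short sketch (assuming tf.keras): the Dense layer replaces only the channel dimension, while a Conv layer (here with filters=4, kernel_size=3, and the default 'valid' padding) changes the spatial dimensions too:

import tensorflow as tf

x = tf.keras.Input(shape=(32, 28, 1))
print(tf.keras.layers.Dense(4)(x).shape)                  # (None, 32, 28, 4)
print(tf.keras.layers.Conv2D(4, kernel_size=3)(x).shape)  # (None, 30, 26, 4)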

Vishnuvardhan Janapati

Keras is a high-level API which takes care of a lot of abstraction. The following example might help you understand better; it is the closest possible raw TensorFlow equivalent of the Keras abstraction in your question:

import tensorflow as tf
from pprint import pprint

# Placeholders require graph mode, so disable eager execution under TF 2.x
tf.compat.v1.disable_eager_execution()

for shape in [(None, 784), (None, 784, 1), (None, 32, 28), (None, 32, 28, 1)]:
    shapes_list = []
    shapes_list = []

    input_layer_1 = tf.compat.v1.placeholder(dtype=tf.float32, shape=shape, name=None)
    shapes_list.append(input_layer_1.shape)
    d1 = tf.compat.v1.layers.dense(
        inputs=input_layer_1, units=4, activation=None, use_bias=True, kernel_initializer=None,
        bias_initializer=tf.zeros_initializer(), kernel_regularizer=None,
        bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
        bias_constraint=None, trainable=True, name=None, reuse=None
    )
    shapes_list.append(d1.shape)
    d2 = tf.compat.v1.layers.dense(
        inputs=d1, units=10, activation=tf.compat.v1.nn.softmax, use_bias=True, kernel_initializer=None,
        bias_initializer=tf.zeros_initializer(), kernel_regularizer=None,
        bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
        bias_constraint=None, trainable=True, name=None, reuse=None
    )
    shapes_list.append(d2.shape)
    print('++++++++++++++++++++++++++')
    pprint(shapes_list)
    print('++++++++++++++++++++++++++')

The Dense function is used for making a densely connected layer, or perceptron.

As per your code snippet, it seems you have created a multi-layer perceptron (with linear activation function f(x)=x), with hidden layer 1 having 4 neurons and the output layer customised for the 10 classes/labels to be predicted.

The number of neurons in each layer is determined by the units argument, and the input shape of each neuron in layer L is determined by the output of the previous layer L-1.

If the input to a Dense layer is (BATCH_SIZE, N, l), then the shape of the output will be (BATCH_SIZE, N, value_passed_to_argument_units_in_Dense),

and if the input is (BATCH_SIZE, N, M, l), then the output shape is (BATCH_SIZE, N, M, value_passed_to_argument_units_in_Dense), and so on.

NOTE:

This happens only in the case of a Dense layer, because it doesn't alter the intermediate dimensions between batch_size and the last (channel) dimension.

However, in the case of other layers like Conv2D or (Max/Avg)Pooling, the intermediate dimensions might also change (depending on the arguments passed), because these layers act on those dimensions too.

Pratik Kumar

According to Keras:

Dense layer is applied on the last axis independently. [1]

[1] https://github.com/keras-team/keras/issues/10736#issuecomment-406589140

First Example:

input_shape=(784,)
model.add(Dense(units=4,activation='linear',input_shape=(784,)))

It says that the input has 784 rows only, and the first layer of the model has 4 units. Each unit in the dense layer is connected to all 784 rows.

That is why

Output shape=  (None, 4) 

None represents the batch_size, which is not known here.

Second Example

Here a tensor of rank 2 is the input:

input_shape=(784,1)
Units = 4

So now the input is 784 rows and 1 column. Now each unit of the dense layer is connected to the 1 element in each of the 784 rows, giving Output Shape = (None, 784, 4),
with None for the batch size.

Third Example

input_shape=(32,28)

Now each unit of the dense layer is connected to the 28 elements in each of the 32 rows. So

output_shape=(None,32,4)

Last Example

model.add(Dense(units=4,activation='linear',input_shape=(32,28,1)))   

again the dense layer is applied to the last axis, and the output shape becomes

Output Shape =(None,32,28,4)

Note

The rank of (784,) is 1; the trailing comma does not represent another dimension. The rank of (784,1) is 2, which is why their output shapes differ.
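A quick sketch of the rank difference using NumPy:

import numpy as np

print(np.zeros((784,)).ndim)    # 1: Dense sees 784 input features
print(np.zeros((784, 1)).ndim)  # 2: Dense sees one feature at each of 784 positions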

The diagram in this stackoverflow post may help you further.