I was reading a book on deep learning and I can't understand this part about Conv2D

Question

        Layer (type)                 Output Shape              Param #
        ================================================================
        conv2d_4 (Conv2D)            (None, 26, 26, 32)        320
        ________________________________________________________________
        conv2d_5 (Conv2D)            (None, 24, 24, 64)        18496
        ________________________________________________________________
        conv2d_6 (Conv2D)            (None, 22, 22, 64)        36928
        ================================================================
        Total params: 55,744
        Trainable params: 55,744
        Non-trainable params: 0

The author says:

The 3 × 3 windows in the third layer will only contain information coming from 7 × 7 windows in the initial input. The high-level patterns learned by the convnet will still be very small with regard to the initial input, which may not be enough to learn to classify digits (try recognizing a digit by only looking at it through windows that are 7 × 7 pixels!). We need the features from the last convolution layer to contain information about the totality of the input.

Now, where did this 7 × 7 window come from? Isn't the window in the first layer also 3 × 3? What am I missing?

score 1 · Answer 1 · answered Feb 09 '22 at 12:18

Let us consider this example:

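        import tensorflow as tf

        # Three stacked 3x3 convolutions (default stride 1, no padding) on a 28x28 RGB input
        model = tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 3)),
            tf.keras.layers.Conv2D(20, (3, 3), activation='relu'),
            tf.keras.layers.Conv2D(10, (3, 3), activation='relu'),
        ])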

Here is the output of model.summary():

        Model: "sequential_1"
        _________________________________________________________________
        Layer (type)                Output Shape              Param #   
        =================================================================
        conv2d_3 (Conv2D)           (None, 26, 26, 32)        896       
                                                                        
        conv2d_4 (Conv2D)           (None, 24, 24, 20)        5780      
                                                                        
        conv2d_5 (Conv2D)           (None, 22, 22, 10)        1810      
                                                                        
        =================================================================
        Total params: 8,486
        Trainable params: 8,486
        Non-trainable params: 0
        _________________________________________________________________

To calculate the output shape you could use this formula:

O = (W - K + 2P)/S + 1

where O is the output height/width, W is the input height/width, K is the filter size, P is the padding and S is the stride size.
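As a quick check of the formula, here is a minimal sketch that applies it three times to a 28 × 28 input. The helper name `conv_output_size` is made up for illustration, and the integer division assumes the result is rounded down when the stride does not divide evenly, which is how Keras behaves with its default 'valid' padding.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output height/width for input size w, kernel size k, padding p, stride s."""
    # O = (W - K + 2P) / S + 1, rounded down (assumption: 'valid'-style behaviour)
    return (w - k + 2 * p) // s + 1


size = 28  # input height/width used in the example model
sizes = []
for _ in range(3):  # three Conv2D layers with 3x3 kernels, no padding, stride 1
    size = conv_output_size(size, k=3)
    sizes.append(size)

print(sizes)  # [26, 24, 22]
```

These values match the height and width in the Output Shape column of the summary.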

To calculate the number of parameters in a convolutional layer, you could use this formula (the + 1 accounts for the bias of each filter):

param_number_for_conv_layer = output_channel_number * (input_channel_number * kernel_height * kernel_width + 1)
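The formula can be checked against both summaries with a short sketch. The function name `conv2d_param_count` is hypothetical, and the single input channel for the question's model is inferred from its first layer having 320 parameters rather than stated in the question.

```python
def conv2d_param_count(in_channels, out_channels, kernel_h, kernel_w):
    """Weights plus one bias per output channel for a Conv2D layer."""
    return out_channels * (in_channels * kernel_h * kernel_w + 1)


# (input channels, output channels) per layer; all kernels are 3x3
answer_model = [(3, 32), (32, 20), (20, 10)]    # input_shape=(28, 28, 3)
question_model = [(1, 32), (32, 64), (64, 64)]  # assumed 1-channel input

answer_params = [conv2d_param_count(i, o, 3, 3) for i, o in answer_model]
question_params = [conv2d_param_count(i, o, 3, 3) for i, o in question_model]

print(answer_params, sum(answer_params))      # [896, 5780, 1810] 8486
print(question_params, sum(question_params))  # [320, 18496, 36928] 55744
```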

The 'None' in the output shape stands for the batch size, which is not fixed when the model is built.
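As for the 7 × 7 window in the question: each layer does use a 3 × 3 window, but from the second layer on that window slides over the previous layer's output, not over the input image. Each position in the first layer's output already summarizes a 3 × 3 patch of the input, so a 3 × 3 window in the second layer covers a 5 × 5 patch of the input, and a 3 × 3 window in the third layer covers 7 × 7. The sketch below uses the usual receptive-field recurrence to show this; it is an illustration added here, not code from the book, and `receptive_field` is a made-up helper name.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) after each layer.

    layers is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent outputs of the current layer
    fields = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        fields.append(rf)
    return fields


# Three 3x3 convolutions with stride 1, as in the summaries above
fields = receptive_field([(3, 1), (3, 1), (3, 1)])
print(fields)  # [3, 5, 7]
```

With stride 1 everywhere, each extra 3 × 3 layer widens the window by only 2 pixels, which is why the author considers 7 × 7 too small relative to a 28 × 28 digit.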
