
I built a CNN model for MNIST in Keras. The code and its printed summary are below. Code for the CNN:

    import keras
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(63, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, name='dense', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])

model summary:

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 63)        18207     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 63)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 63)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 9072)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               1161344   
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,181,161
Trainable params: 1,181,161
Non-trainable params: 0
_________________________________________________________________

The kernel_size of both the first and the second Conv2D layers is (3, 3).

I don't understand why there are 18207 parameters in the second Conv2D layer. Shouldn't it be computed as (3*3+1)*63 = 630?

david

1 Answer


To get the number of parameters, you apply the following formula:

(F × F × C_in + 1) × C_out

where F × F is the kernel size, C_out is the number of output channels, and C_in is the number of input channels. In your calculation you are simply forgetting the input-channels factor:

18207 = 63 * (3 * 3 * 32 + 1)
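The formula above can be checked with a small helper (a quick sanity check in plain Python; the function name `conv2d_params` is my own, not part of Keras):

```python
# Parameter count for a Conv2D layer: (F*F*C_in + 1) * C_out
# The +1 is the single bias term each filter adds.
def conv2d_params(kernel_size, c_in, c_out):
    f_h, f_w = kernel_size
    return (f_h * f_w * c_in + 1) * c_out

# First layer: 3x3 kernel, 1 input channel (grayscale MNIST), 32 filters
print(conv2d_params((3, 3), 1, 32))    # 320
# Second layer: 3x3 kernel, 32 input channels, 63 filters
print(conv2d_params((3, 3), 32, 63))   # 18207
```

Both numbers match the `model.summary()` output above.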

Edit to answer comments:

When you have the output of the first layer, you obtain an "image" of shape (None, 26, 26, 32) (None being the batch size). Intuitively, the next layer needs a kernel slice for every input channel, which it then maps to the output dimension. The output depth depends not on the kernel size but on the number of kernels: convolutions are computed per channel and then summed. For example, if you have a (28, 28, 3) picture and a single filter of shape (5, 5, 3), i.e. one kernel per input channel, the three per-channel convolutions are summed and your output is a (24, 24) picture (one output channel).

But you can also have multiple convolutions:

You still have the same (28, 28, 3) picture, but now a convolutional layer of weight shape (5, 5, 3, 4), meaning you have 4 of the filters described above. To get an output of size (24, 24, 4) you don't sum the four results; you stack them to get a picture with multiple channels. You learn multiple independent convolutions at the same time. So you see where the calculation comes from, and why the input channels matter just as much as the output ones, even though they represent very different parameters. (See this for a more detailed and visual explanation.)
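The sum-over-channels-then-stack behaviour can be sketched in plain NumPy (a naive loop-based sketch for clarity, with "valid" padding and stride 1, not an efficient or official implementation):

```python
import numpy as np

def conv2d(image, kernels):
    """image: (H, W, C_in); kernels: (F, F, C_in, C_out) -> (H-F+1, W-F+1, C_out)."""
    h, w, c_in = image.shape
    f, _, _, c_out = kernels.shape
    out = np.zeros((h - f + 1, w - f + 1, c_out))
    for k in range(c_out):              # one independent filter per output channel
        for i in range(h - f + 1):
            for j in range(w - f + 1):
                # multiply the (F, F, C_in) patch element-wise and sum over
                # all positions AND all input channels -> one scalar
                out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * kernels[:, :, :, k])
    return out

rng = np.random.default_rng(0)
pic = rng.normal(size=(28, 28, 3))          # the (28, 28, 3) picture from the example
kernels = rng.normal(size=(5, 5, 3, 4))     # 4 filters, each (5, 5, 3)
print(conv2d(pic, kernels).shape)           # (24, 24, 4)
```

Each filter carries 5*5*3 weights because it must cover every input channel, which is exactly why C_in appears in the parameter formula.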

Frayal
  • Thank you, but I am a little confused. Why do I need to multiply by the input-channels parameter 32? Does it mean the filter in the second layer needs different kernel parameters for each channel coming from the previous layer? – david Mar 11 '19 at 10:24
  • updated answer @david (tell me if anything is still unclear) – Frayal Mar 11 '19 at 10:38
  • @david In other words, the number of kernels is like the number of nodes in a hidden layer: it determines the output shape, not the number of trainable variables per connection. So more inputs need more variables to connect to the hidden layer. – anishtain4 Mar 11 '19 at 14:51
  • @anishtain4 Thank you! – david Mar 11 '19 at 16:54