In the CS231n course on Convolutional Neural Networks, the ConvNet notes say:
INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.
CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in volume such as [32x32x12] if we decided to use 12 filters.
From the notes, I understand that the INPUT holds images of size 32 (width) x 32 (height) x 3 (depth). But the output of the CONV layer is then given as [32x32x12] if we decide to use 12 filters.
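To make the shapes concrete, here is a minimal sketch in plain Python of how I read the CONV layer description. The 5x5 filter size, stride 1, and zero padding of 2 are my assumptions (the quoted passage does not give them), as are the names `conv_forward`, `image`, and `filters`. Each filter is assumed to span the full input depth of 3.

```python
import random


def conv_forward(image, filters, pad):
    """Naive stride-1 convolution with zero padding.

    image is indexed as image[y][x][channel];
    filters is indexed as filters[k][dy][dx][channel].
    """
    height, width, depth = len(image), len(image[0]), len(image[0][0])
    size = len(filters[0])  # assumed square filters
    out_h = height + 2 * pad - size + 1
    out_w = width + 2 * pad - size + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            values = []
            for f in filters:
                # Dot product between one filter and one local region.
                # The sum runs over the spatial window AND all input channels.
                total = 0.0
                for dy in range(size):
                    for dx in range(size):
                        iy, ix = y + dy - pad, x + dx - pad
                        # Positions outside the image count as zero padding.
                        if 0 <= iy < height and 0 <= ix < width:
                            for c in range(depth):
                                total += image[iy][ix][c] * f[dy][dx][c]
                values.append(total)  # one number per filter
            row.append(values)
        output.append(row)
    return output


random.seed(0)
# INPUT volume: 32 x 32 x 3 (random values stand in for pixels).
image = [[[random.random() for _ in range(3)]
          for _ in range(32)] for _ in range(32)]
# 12 filters, each 5 x 5 x 3 (assumed size).
filters = [[[[random.random() for _ in range(3)]
             for _ in range(5)] for _ in range(5)] for _ in range(12)]

output = conv_forward(image, filters, pad=2)
out_shape = (len(output), len(output[0]), len(output[0][0]))
print(out_shape)  # (32, 32, 12)
```

With these assumed settings, the output holds one 32x32 map per filter, which gives the [32x32x12] volume from the notes.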
Where did the 3, the depth of the input image, go?
Please help me out here. Thank you in advance.