
I am new to deep learning and am trying to understand how a CNN performs image classification.

I have gone through multiple YouTube videos, blogs, and papers, and they all describe roughly the same pipeline:

  1. apply filters to get feature maps
  2. perform pooling
  3. introduce non-linearity using ReLU
  4. send the result to a fully connected network.
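To make the pipeline concrete, here is a minimal NumPy sketch of steps 1–3 on a toy image (a hypothetical example; the 2×2 filter's signs are chosen so the edge response is positive and survives ReLU):

```python
import numpy as np

# Toy 4x4 image: bright (10) on the left half, dark (0) on the right
img = np.array([[10, 10, 0, 0]] * 4, dtype=float)

# A 2x2 vertical-edge filter (sign chosen so the response is positive)
kern = np.array([[1, -1],
                 [1, -1]], dtype=float)

# 1. slide the filter over the image to get a 3x3 feature map
fmap = np.array([[(img[i:i+2, j:j+2] * kern).sum() for j in range(3)]
                 for i in range(3)])

# 2./3. ReLU, then max-pool over the whole map
pooled = np.maximum(fmap, 0).max()
print(pooled)  # 20.0 -> the filter fires strongly at the light/dark boundary
```

The feature map is nonzero only where the filter's window straddles the light-to-dark boundary, which is the whole point of a convolutional filter.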

While this is all fine and dandy, I don't really understand how convolution works in essence. Take edge detection, for example.

For example, [[-1, 1], [-1, 1]] is said to detect a vertical edge.

How? Why? How do we know for sure that this will detect a vertical edge?

Similarly, for the matrices used for blurring/sharpening: how do we actually know that they will do what they are designed to do?

Do I simply take people's word for it?

Please help. I feel stuck because I am not able to understand convolution and how these matrices detect edges or shapes.

user1906450

1 Answer


Filters detect spatial patterns, such as edges, in an image by responding to changes in its intensity values.

A quick recap: in terms of an image, a high-frequency image is one where the intensity of the pixels changes by a large amount, while a low-frequency image is one where the intensity is almost uniform. An image has both high- and low-frequency components. The high-frequency components correspond to the edges of objects, because at an edge the rate of change of pixel intensity is high.

High pass filters are used to enhance the high-frequency parts of an image.

Let's take an example: suppose a part of your image has pixel values [[10, 10, 0], [10, 10, 0], [10, 10, 0]], meaning the intensity decreases toward the right, i.e. the image changes from light on the left to dark on the right. The filter used here is [[1, 0, -1], [1, 0, -1], [1, 0, -1]].

Now, we take the element-wise product of these two matrices, which gives [[10, 0, 0], [10, 0, 0], [10, 0, 0]]. Summing these values gives an output pixel value of 30, which measures the variation in pixel values as we move from left to right. The subsequent output pixels are computed the same way as the filter slides across the image.
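To make the arithmetic concrete, here is that single convolution step in NumPy (just the element-wise product and sum described above):

```python
import numpy as np

# The 3x3 image patch: light on the left, dark on the right
patch = np.array([[10, 10, 0],
                  [10, 10, 0],
                  [10, 10, 0]])

# The vertical-edge filter from the text
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

product = patch * kernel       # [[10, 0, 0], [10, 0, 0], [10, 0, 0]]
response = product.sum()       # 30: large -> strong left-to-right change
print(response)
```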

[image: vertical edge detection]

Here, you will notice that the rate of change of pixel values varies a lot from left to right, so a vertical edge has been detected. Had you used the filter [[1, 1, 1], [0, 0, 0], [-1, -1, -1]], the convolution output would consist of 0s only, i.e. no horizontal edge is present. In the same way, [[-1, 1], [-1, 1]] detects a vertical edge.
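You can verify both claims numerically (again, just element-wise multiply-and-sum on the same patch):

```python
import numpy as np

patch = np.array([[10, 10, 0],
                  [10, 10, 0],
                  [10, 10, 0]])

# Horizontal-edge filter: nothing changes from top to bottom here
h_kernel = np.array([[ 1,  1,  1],
                     [ 0,  0,  0],
                     [-1, -1, -1]])
print((patch * h_kernel).sum())  # 0 -> no horizontal edge

# The 2x2 filter from the question, applied to a 2x2 slice of the edge
v_kernel = np.array([[-1, 1],
                     [-1, 1]])
print((patch[:2, 1:3] * v_kernel).sum())  # -20 -> nonzero, vertical edge
```

Note the sign of the response just tells you which side is brighter; the magnitude tells you how strong the edge is.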

You can learn more in this lecture by Andrew Ng.

Edit: Usually, a vertical edge detection filter has positive weights on the left and negative weights on the right (or vice versa), matching bright pixels on one side and dark pixels on the other. The values of the filter should sum to 0, otherwise the resulting image becomes uniformly brighter or darker. Also, in convolutional neural networks the filters are not hand-designed: they are learned as parameters through backpropagation during training.
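The zero-sum property is easy to check. For instance, with the standard Sobel vertical-edge filter, a uniform (edge-free) patch produces a response of exactly 0:

```python
import numpy as np

# Sobel filter for vertical edges; its values sum to 0
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
print(sobel_x.sum())  # 0

# On a uniform patch the response is 0, so flat regions of the image
# come out neither brighter nor darker after filtering
flat = np.full((3, 3), 10)
print((flat * sobel_x).sum())  # 0
```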

kHarshit
  • Thank you for your response. One question though, and this is where I run into trouble: [-1,2,-1],[-1,2,-1],[-1,2,-1] is also a vertical detector. How come? Going by the same logic, can we also have [-1,200,-9],[-1,200,-9],[-1,200,-9] as a vertical edge detector? – user1906450 Dec 09 '18 at 16:05
  • One important thing I forgot to mention is that the sum of the values in the filter should be 0. I have updated the answer! The second example could act as a vertical edge detector if you replaced -9 with 9 or -1 with 1, but it wouldn't be as good because the sum isn't 0. The first example you showed is a line detection filter; I tried it on an image and it gave poor results as an edge detector. A good example of an edge detection filter is the Sobel filter. Read more here: https://stackoverflow.com/questions/50359485/what-is-the-difference-between-a-line-and-an-edge-in-image-detection – kHarshit Dec 09 '18 at 17:27