I'm trying to create a custom convolution layer in Python that uses a 3x3 kernel. I want to slide this kernel over a 32x32 image and, at each position, take the dot product between the kernel and the pixel values underneath it.

I wrote two nested for loops that cut a 3x3 patch out of the image at each position, and then used torch.matmul and torch.sum to filter the pixel values, with stride = 1.

for i in range(x.shape[0] - 3 + 1):
    for j in range(x.shape[1] - 3 + 1):
        x_out[i][j] = torch.sum(torch.matmul(kernel, x[i:i+3, j:j+3]))

The shape of x is torch.Size([32, 32]) and the shape of x_out will be torch.Size([30, 30]), since 32 - 3 + 1 = 30. The kernel is a 3x3 tensor.

The problem is the nested for loops. If I run this inside a neural network, it will be too slow. To speed it up, I need to figure out how to do this without for loops. How can I do that?
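For illustration, here is a minimal sketch of a few loop-free ways to get the same 30x30 output. It is not a definitive implementation: `x` and `kernel` are random stand-ins for the real image and kernel, and the names `x_conv`, `x_win`, and `x_mm` are made up for this example. One assumption worth flagging: "dot product between the pixel values and the kernel" is read here as the sum of an elementwise product, which is what `torch.nn.functional.conv2d` computes. `torch.sum(torch.matmul(kernel, patch))` is a different quantity in general, so the sketch vectorizes that expression separately.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(32, 32)     # stand-in for the 32x32 image (assumption)
kernel = torch.randn(3, 3)  # stand-in for the 3x3 kernel (assumption)

# Reference results from the nested loops, kept only for comparison.
x_loop = torch.zeros(30, 30)     # elementwise product, then sum
x_mm_loop = torch.zeros(30, 30)  # matmul, then sum (as in the question)
for i in range(x.shape[0] - 3 + 1):
    for j in range(x.shape[1] - 3 + 1):
        patch = x[i:i+3, j:j+3]
        x_loop[i, j] = torch.sum(kernel * patch)
        x_mm_loop[i, j] = torch.sum(torch.matmul(kernel, patch))

# Option 1: conv2d expects input of shape (batch, channels, H, W)
# and weights of shape (out_channels, in_channels, kH, kW).
x_conv = F.conv2d(x.view(1, 1, 32, 32), kernel.view(1, 1, 3, 3)).view(30, 30)

# Option 2: build all 3x3 windows at once with Tensor.unfold.
# windows[i, j] is the patch x[i:i+3, j:j+3]; shape (30, 30, 3, 3).
windows = x.unfold(0, 3, 1).unfold(1, 3, 1)
x_win = (windows * kernel).sum(dim=(-2, -1))

# The question's matmul-then-sum expression, vectorized over all windows.
# matmul broadcasts the (3, 3) kernel against the (30, 30, 3, 3) windows.
x_mm = torch.matmul(kernel, windows).sum(dim=(-2, -1))

print(torch.allclose(x_conv, x_loop, atol=1e-4))
print(torch.allclose(x_win, x_loop, atol=1e-4))
print(torch.allclose(x_mm, x_mm_loop, atol=1e-4))
```

The `conv2d` call is likely the better fit inside a network, since it also handles batches and multiple channels. The `unfold` version is mainly useful when the per-window operation is not a plain convolution.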

Josh Susa
  • Did you consider using `map`? – razimbres Mar 07 '19 at 23:47
  • I'm not really sure what that is but I'll look into it. – Josh Susa Mar 08 '19 at 00:38
  • Read [this](https://stackoverflow.com/questions/46213531/how-is-using-im2col-operation-in-convolutional-nets-more-efficient), especially [the link referred to there](https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/) – lincr Mar 08 '19 at 06:49
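
The im2col idea from the links in the last comment can be sketched roughly as follows, assuming `torch.nn.functional.unfold` is used to build the patch matrix. The image `x` and the Sobel-style `kernel` are hypothetical example values, not taken from the question. Each 3x3 patch becomes one column of 9 values, so the whole filter pass reduces to a single matrix product.

```python
import torch
import torch.nn.functional as F

# Hypothetical example data: a 32x32 ramp image and a Sobel-style kernel.
x = torch.arange(32 * 32, dtype=torch.float32).reshape(32, 32)
kernel = torch.tensor([[1., 0., -1.],
                       [2., 0., -2.],
                       [1., 0., -1.]])

# im2col: F.unfold takes (batch, channels, H, W) and returns
# (batch, channels * kH * kW, num_patches) = (1, 9, 900) here.
cols = F.unfold(x.view(1, 1, 32, 32), kernel_size=3)

# One matrix product: flattened kernel (1, 9) times patch matrix (9, 900).
out = (kernel.reshape(1, 9) @ cols).view(30, 30)

print(out.shape)
```

This is the same reduction to a matrix multiply (GEMM) that the linked article describes. In practice `F.conv2d` would normally be called directly instead of building the patch matrix by hand.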
