I'm trying to create a custom convolution layer in Python that uses a 3x3 kernel. The idea is to slide this kernel across a 32x32 image and, at each position, take the dot product between the kernel and the pixel values underneath it.
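My approach was to write two nested for loops that cut out each 3x3 area of the image in turn, multiply it elementwise with the kernel, and add up the result with `torch.sum`, using stride = 1:

```python
import torch
x_out = torch.zeros(x.shape[0] - 2, x.shape[1] - 2)  # 30x30 output for a 32x32 input
for i in range(x.shape[0] - 3 + 1):
    for j in range(x.shape[1] - 3 + 1):
        # dot product of the 3x3 window with the kernel: elementwise product, then sum
        x_out[i, j] = torch.sum(kernel * x[i:i+3, j:j+3])
```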
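To see what a single step of the loop computes, here is a small sketch with a made-up kernel and window (the values are hypothetical, chosen only for illustration). The dot product between a window and the kernel is the elementwise product summed, which equals `torch.dot` on the flattened tensors. By contrast, `torch.matmul` followed by `torch.sum` performs a matrix product first, so it generally gives a different number and is not the window/kernel dot product.

```python
import torch

# Hypothetical 3x3 kernel and a single 3x3 image window (illustrative values only)
kernel = torch.arange(9, dtype=torch.float32).reshape(3, 3)
window = torch.ones(3, 3)

# Dot product: multiply each pixel by its kernel weight, then add everything up
dot = torch.sum(kernel * window)
# Same value via flattening both tensors to length-9 vectors
dot_flat = torch.dot(kernel.flatten(), window.flatten())
# Matrix product then sum: a different quantity, not the window/kernel dot product
matmul_sum = torch.sum(torch.matmul(kernel, window))

print(dot.item(), dot_flat.item(), matmul_sum.item())  # 36.0 36.0 108.0
```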
The shape of `x` is `torch.Size([32, 32])`, so the shape of `x_out` is `torch.Size([30, 30])`. The kernel is a 3x3 tensor.
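Given these shapes, one loop-free option is PyTorch's built-in `torch.nn.functional.conv2d`, which computes the same sliding dot product. It does not flip the kernel (it computes cross-correlation), so it matches the multiply-and-sum loop. It expects a 4-D input of shape (batch, channels, height, width) and a 4-D weight of shape (out_channels, in_channels, kH, kW), so this sketch adds two leading singleton dimensions and removes them afterwards. The random `x` and `kernel` are placeholders, and the loop is included only to check the result.

```python
import torch
import torch.nn.functional as F

# Placeholder data with the shapes from the question
torch.manual_seed(0)
x = torch.randn(32, 32)
kernel = torch.randn(3, 3)

# conv2d wants (N, C, H, W) input and (out_C, in_C, kH, kW) weight
out = F.conv2d(x[None, None], kernel[None, None], stride=1)  # shape (1, 1, 30, 30)
x_out = out[0, 0]  # back to (30, 30)

# Reference: an elementwise multiply-and-sum loop, for comparison only
x_ref = torch.zeros(30, 30)
for i in range(x.shape[0] - 3 + 1):
    for j in range(x.shape[1] - 3 + 1):
        x_ref[i, j] = torch.sum(kernel * x[i:i+3, j:j+3])

print(x_out.shape)                              # torch.Size([30, 30])
print(torch.allclose(x_out, x_ref, atol=1e-5))  # True
```

If the kernel should be learned, it can be stored as an `nn.Parameter` and passed as the weight. Gradients flow through `F.conv2d` like any other PyTorch op.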
The problem is the nested for loops: running this inside a neural network will be too slow. To speed it up, I need to figure out how to do this without for loops. How can I do that?
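If the custom layer needs a per-window operation that `conv2d` cannot express, another option is to extract all 3x3 windows at once with `Tensor.unfold` and then vectorize the operation over them. Below is a minimal sketch, assuming a single 2-D image and stride 1. The helper name `sliding_dot` is hypothetical.

```python
import torch

def sliding_dot(x: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: stride-1 sliding dot product of a 2-D image with a kernel."""
    kh, kw = kernel.shape
    # unfold(dim, size, step) along rows, then columns -> (H-kh+1, W-kw+1, kh, kw),
    # where windows[i, j] is the patch x[i:i+kh, j:j+kw]
    windows = x.unfold(0, kh, 1).unfold(1, kw, 1)
    # Broadcast the kernel over every window, multiply, and sum the last two dims
    return (windows * kernel).sum(dim=(-2, -1))

# Placeholder data with the shapes from the question
torch.manual_seed(0)
x = torch.randn(32, 32)
kernel = torch.randn(3, 3)
x_out = sliding_dot(x, kernel)
print(x_out.shape)  # torch.Size([30, 30])
```

The multiply-and-sum line is where a different per-window operation would go. For batched (N, C, H, W) input, `torch.nn.functional.unfold` performs the same kind of window extraction.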