
I know that, in the 1D case, the convolution of two vectors a and b can be computed as conv(a, b), but also as the matrix-vector product T_a b, where T_a is the Toeplitz matrix corresponding to a.

Is it possible to extend this idea to 2D?

Given a = [5 1 3; 1 1 2; 2 1 3] and b = [4 3; 1 2], is it possible to convert a into a Toeplitz matrix and compute the matrix-matrix product between T_a and b, as in the 1-D case?
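The 1-D identity I'm referring to can be checked numerically; a minimal sketch with made-up vectors (not the a and b above):

```python
import numpy as np

a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])

# Toeplitz matrix of a for the *full* 1-D convolution: T_a[i, j] = a[i - j]
n, m = len(a), len(b)
T_a = np.zeros((n + m - 1, m))
for i in range(n + m - 1):
    for j in range(m):
        if 0 <= i - j < n:
            T_a[i, j] = a[i - j]

print(T_a @ b)            # [ 4. 13. 28. 27. 18.]
print(np.convolve(a, b))  # same result
```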

nbro
no_name
  • See [How can the convolution operation be implemented as a matrix-vector multiplication?](https://ai.stackexchange.com/q/11172/2444). – nbro Jun 14 '20 at 13:59
  • I'm voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `deep-learning` [tag info](https://stackoverflow.com/tags/deep-learning/info). – desertnaut Jun 28 '21 at 16:01

4 Answers


Yes, it is possible, and you should also use a doubly block circulant matrix (which is a special case of a Toeplitz matrix). I will give you an example with a small kernel and input, but it is possible to construct a Toeplitz matrix for any kernel. So you have a 2D input x and a 2D kernel k, and you want to calculate the convolution x * k. Also, let's assume that k is already flipped, that x is of size n×n, and that k is m×m.

You unroll k into a sparse matrix of size (n-m+1)^2 × n^2, and unroll x into a long vector of size n^2 × 1. You multiply this sparse matrix by the vector and convert the resulting vector (which will have a size of (n-m+1)^2 × 1) into an (n-m+1)×(n-m+1) matrix.

I am pretty sure this is hard to understand just from reading, so here is an example for a 2×2 kernel and a 3×3 input.

[image: the 2×2 kernel] * [image: the 3×3 input]

Here is the constructed matrix multiplied by the vectorized input:

[image: unrolled kernel matrix times vectorized input]

which is equal to [image: the resulting vector].

And this is the same result you would have got by doing a sliding window of k over x.
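A numerical sketch of this construction (my own code, not part of the original answer; the matrix is built densely for clarity, and k is assumed to be already flipped, as stated above):

```python
import numpy as np

def unroll_kernel(k, n):
    # Rows index output positions (i, j); columns index input positions.
    # Entry [(i, j), (i + p, j + q)] holds k[p, q], so a matrix-vector
    # product reproduces the sliding window of k over x.
    m = k.shape[0]
    out = n - m + 1
    T = np.zeros((out * out, n * n))
    for i in range(out):
        for j in range(out):
            for p in range(m):
                for q in range(m):
                    T[i * out + j, (i + p) * n + (j + q)] = k[p, q]
    return T

x = np.arange(9, dtype=float).reshape(3, 3)
k = np.array([[1., 2.], [3., 4.]])   # assumed already flipped
y = (unroll_kernel(k, 3) @ x.flatten()).reshape(2, 2)

# same result as sliding k over x:
expected = np.array([[(k * x[i:i+2, j:j+2]).sum() for j in range(2)]
                     for i in range(2)])
assert np.allclose(y, expected)
```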

Salvador Dali
  • There would have to be some sort of reshaping at the end, correct? That last vector is 4 x 1 but the result of the convolution would be 2 x 2 – jvans Jun 12 '17 at 17:11
  • @jvans yes, in the end you should reshape your vector. It is written here: **convert the resulting vector (which will have a size (n-m+1)^2 X 1) into a n-m+1 square matrix** – Salvador Dali Jun 12 '17 at 18:14
  • In your example this is not a Toeplitz matrix. So your answer is only partially correct, isn't it? – user2682877 Mar 14 '18 at 17:18
  • What do you mean by `Also let's assume that k is already flipped`? Is it because we want to perform correlation instead of convolution? What is `flipped` in terms of numpy operations? – mrgloom Sep 09 '18 at 16:47
  • @mrgloom Yes, the operation above is a correlation, which is why he flips the filter vertically (upside down) first, so it then becomes equivalent to a convolution. The numpy is `flip(m, 0)`, which is equivalent to `flipud(m)`. – Daniel Arnett Apr 27 '19 at 14:11
  • Just a short question: if I’ve multiple e.g. 2 channels, the rhs of the GEMM becomes a matrix with two columns, right? – Bastian Aug 05 '19 at 12:46
  • So what are the details? How should `k` be unrolled into a sparse matrix? – user76284 Sep 25 '19 at 00:50
  • @SalvadorDali Shouldn't the dimensions of the resulting convolution be (m + n - 1) x (m + n - 1) rather than (n-m+1) x (n-m+1)? – calveeen Sep 07 '20 at 05:45
  • Am I correct that I should be able to replicate the effect of zero padding the input image by adding proper rows to the convolution matrix? Any insight into efficiently generating those rows? – milez Oct 15 '20 at 10:23

1- Define Input and Filter

Let I be the input signal and F be the filter or kernel.

[image: the 2-D input signal I and the filter F]

2- Calculate the final output size

If I is m1 × n1 and F is m2 × n2, the size of the output will be:

(m1 + m2 − 1) × (n1 + n2 − 1)

3- Zero-pad the filter matrix

Zero pad the filter to make it the same size as the output.

[image: the filter zero-padded to the output size]

4- Create Toeplitz matrix for each row of the zero-padded filter

[image: one Toeplitz matrix per row of the zero-padded filter]

5- Create a doubly blocked Toeplitz matrix

Now all these small Toeplitz matrices are arranged in a big doubly blocked Toeplitz matrix:

[image: arrangement of the Toeplitz blocks]

[image: the doubly blocked Toeplitz matrix]

6- Convert the input matrix to a column vector

[image: the input matrix converted to a column vector]

7- Multiply the doubly blocked Toeplitz matrix by the vectorized input signal

This multiplication gives the convolution result.

8- Last step: reshape the result to a matrix form

[image: the result reshaped to matrix form]

For more details and Python code, take a look at my GitHub repository:

Step by step explanation of 2D convolution implemented as matrix multiplication using toeplitz matrices in python
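The eight steps can be sketched in code roughly as follows. This is my own dense implementation, checked against a direct full convolution; it is not the repository's code:

```python
import numpy as np

def conv2d_toeplitz(x, F):
    """Full 2-D convolution via a doubly blocked Toeplitz matrix (steps 2-8)."""
    m1, n1 = x.shape
    m2, n2 = F.shape
    out_h, out_w = m1 + m2 - 1, n1 + n2 - 1   # step 2: output size
    Fp = np.zeros((out_h, out_w))             # step 3: zero-pad the filter
    Fp[:m2, :n2] = F
    # steps 4-5: one Toeplitz block per row of the padded filter, arranged
    # in a doubly blocked Toeplitz matrix W (block (i, a) uses row i - a of Fp)
    W = np.zeros((out_h * out_w, m1 * n1))
    for i in range(out_h):
        for a in range(m1):
            if i - a < 0:
                continue
            row = Fp[i - a]
            for j in range(out_w):
                for b in range(n1):
                    if j >= b:
                        W[i * out_w + j, a * n1 + b] = row[j - b]
    # steps 6-8: vectorize the input, multiply, reshape
    return (W @ x.flatten()).reshape(out_h, out_w)

x = np.array([[1., 2.], [3., 4.]])
F = np.array([[10., 20.], [30., 40.]])
# check against a direct full convolution
direct = np.zeros((3, 3))
for i in range(2):
    for j in range(2):
        direct[i:i+2, j:j+2] += x[i, j] * F
assert np.allclose(conv2d_toeplitz(x, F), direct)
```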

Ali Salehi
  • I think there is an error. The first element of the result should be 10*0 + 20*0 + 30*0 + 40*1 = 40. The element in position (2,2) should be 1*10 + 2*20 + 4*30 + 5*40 = 370. I think your result is correct for a matrix F equal to [40 30; 20 10], which is exactly F with both rows and columns flipped. There is therefore an error in the procedure. – Luca Jun 13 '19 at 16:10
  • It is doing convolution (mathematical convolution, not cross-correlation), so if you are doing it by hand, you need to flip the filter both vertically and horizontally. You can find more information on my GitHub repo. – Ali Salehi Jun 13 '19 at 16:15
  • This is a great explanation of 2D convolution as a matrix operation. Is there a way to represent "mode='same'" also? (i.e. keeping the output shape the same as the image)? – ajl123 Apr 14 '21 at 21:33
  • @ajl123 I think it should be. I'll work on it if I get time. Please feel free to dig into the code and math and send me a pull request on Github if you get the answer. – Ali Salehi Apr 14 '21 at 22:09
  • shouldn't the dimension of the resulting matrix decrease? – gevaraweb Apr 27 '21 at 12:53
  • Is there a way to rewrite the operations to make both arguments symmetric in terms of operations? Currently you convert the filter to a doubly blocked Toeplitz matrix and the image to a vector. Is there a way to make this more symmetric (since convolution is commutative after all), so both argument would undergo the exact same operations? – lightxbulb Nov 25 '21 at 12:50

If you unravel k into an m^2 vector and unroll X, you then get:

  • an m**2 vector k
  • a ((n - m)**2, m**2) matrix for unrolled_X

where unrolled_X could be obtained by the following Python code:

from numpy import zeros


def unroll_matrix(X, m):
  flat_X = X.flatten()
  n = X.shape[0]
  unrolled_X = zeros(((n - m) ** 2, m**2))
  skipped = 0
  for i in range(n ** 2):
      if (i % n) < n - m and ((i // n) % n) < n - m:  # integer division for the row index
          for j in range(m):
              for l in range(m):
                  unrolled_X[i - skipped, j * m + l] = flat_X[i + j * n + l]
      else:
          skipped += 1
  return unrolled_X

Unrolling X rather than k gives a more compact representation (smaller matrices) than the other way around - but you need to unroll each new X. Depending on what you want to do, you may prefer to unroll k instead.

Here, the unrolled_X is not sparse, whereas unrolled_k would be sparse, but of size ((n-m+1)^2,n^2) as @Salvador Dali mentioned.

Unrolling k could be done like this:

from scipy.sparse import lil_matrix
from numpy import zeros


def unroll_kernel(kernel, n, sparse=True):

    m = kernel.shape[0]
    if sparse:
         unrolled_K = lil_matrix(((n - m)**2, n**2))
    else:
         unrolled_K = zeros(((n - m)**2, n**2))

    skipped = 0
    for i in range(n ** 2):
         if (i % n) < n - m and ((i // n) % n) < n - m:  # integer division for the row index
             for j in range(m):
                 for l in range(m):
                    unrolled_K[i - skipped, i + j * n + l] = kernel[j, l]
         else:
             skipped += 1
    return unrolled_K

Geoffrey Negiar

The code shown above doesn't produce the unrolled matrix of the right dimensions. The dimensions should be ((n - k + 1) * (m - k + 1), k**2), where k is the filter dimension, n the number of rows, and m the number of columns of the input matrix.

from numpy import zeros


def unfold_matrix(X, k):
    n, m = X.shape[0:2]
    xx = zeros(((n - k + 1) * (m - k + 1), k**2))
    row_num = 0
    def make_row(x):
        return x.flatten()

    for i in range(n - k + 1):
        for j in range(m - k + 1):
            # collect a block of k*k elements and convert it to a row
            xx[row_num, :] = make_row(X[i:i+k, j:j+k])
            row_num = row_num + 1

    return xx
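Once X is unfolded this way, a valid cross-correlation (or a true convolution, after flipping the filter) is just a matrix-vector product. A self-contained sketch (restating the unfolding inline so it runs standalone; the example values are mine):

```python
import numpy as np

def unfold_matrix(X, k):
    # each row of xx is one k x k patch of X, flattened
    n, m = X.shape[0:2]
    xx = np.zeros(((n - k + 1) * (m - k + 1), k ** 2))
    row_num = 0
    for i in range(n - k + 1):
        for j in range(m - k + 1):
            xx[row_num, :] = X[i:i + k, j:j + k].flatten()
            row_num += 1
    return xx

X = np.arange(9, dtype=float).reshape(3, 3)
w = np.array([[1., 2.], [3., 4.]])
xx = unfold_matrix(X, 2)
corr = (xx @ w.flatten()).reshape(2, 2)              # cross-correlation
conv = (xx @ w[::-1, ::-1].flatten()).reshape(2, 2)  # true convolution
```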

For more details, see my blog post:

http://www.telesens.co/2018/04/09/initializing-weights-for-the-convolutional-and-fully-connected-layers/