How and why is FFT convolution faster than direct convolution?

Question

I read that convolutions are faster when computed in the frequency domain because it is "just" a matrix multiplication (in 2D), while in the time domain it is a lot of small matrix multiplications.

So I wrote this code, and it shows FFT convolution to be more complex than "normal" convolution. Clearly something is wrong in my assumptions.

What is wrong?

from sympy import log, symbols, init_printing, lambdify
init_printing(use_latex='matplotlib')

import numpy as np
import matplotlib.pyplot as plt

def _complex_mult(n):
  """Complexity of a MatMul of two matrices of size (n, n)"""
  # see https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
  return n**2.5

def _complex_fft(n):
  """Complexity of fft and ifft"""
  # see https://en.wikipedia.org/wiki/Fast_Fourier_transform
  return n*log(n)

def fft_mult_fft(n, m):
  """Complexity of a convolution in the freq space.
  fft -> mult between M and kernel -> ifft
  """
  return _complex_fft(n) * 2 + _complex_mult(n)

def conv(n, m):
  """Complexity of a convolution in the time space.
  for every n of M, we execute a MatMul of 2 (m, m) matrices
  """
  return n*_complex_mult(m)


n = symbols('n') # size of M = (n, n)
m = symbols('m') # size of kernel = (m, m)

M = np.linspace(1, 1e3+1, 10)  # num must be an integer, not the float 1e1
kernel_size = np.linspace(2, 7, 7-2+1)**2

fft = fft_mult_fft(n, m)
discrete = conv(n, m)

f1 = lambdify(n, fft, 'numpy')
f2 = lambdify([n, m], discrete, 'numpy')

fig, ax = plt.subplots(1, len(kernel_size), figsize=(30, 10))

f1_computed = f1(M) # independent of m, so compute it only once

for i, size in enumerate(kernel_size):
  ax[i].plot(M, f1_computed, c='red', label='freq domain (fft)')
  ax[i].plot(M, f2(M, size), c='blue', label='time domain (normal)')
  ax[i].legend(loc='upper left')
  ax[i].set_title("kernel size = {}".format(size))
  ax[i].set_xlabel("Matrix size")
  ax[i].set_ylabel("Complexity")

And here is the output:

Really interesting question! I don't know the full answer, but just looking at the complexity of a 2D FFT here: https://stackoverflow.com/questions/6514861/computational-complexity-of-the-fft-in-n-dimensions, shouldn't `_complex_fft` be n^2 log(n)? — user545424, Jul 23 '19 at 16:16
Also, in `_conv` you have to multiply an m x m matrix n^2 times I think, since it's 2D. — user545424, Jul 23 '19 at 16:24
As written in the code `n = symbols('n') # size of M = (n, n)` ; kernel size is m where m = row*col (for square filter, it's why I tried with power of 2) — politinsa, Jul 23 '19 at 20:25

score 4 · Answer 1 · answered Jul 23 '19 at 17:53

You are observing two well-known facts:

- for small kernel sizes, the spatial approach is faster;
- for large kernel sizes, the frequency approach can be faster.

Your kernels and images are too small to show the benefits of the FFT.
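A rough operation count makes the crossover concrete. The sketch below is only a model under stated assumptions: direct convolution of an n x n image with a k x k kernel costs about 2·n²·k² flops (one multiply and one add per kernel tap per pixel), and the FFT route costs three 2D transforms on a zero-padded N x N grid with N = n + k - 1, plus N² element-wise complex products. The constant 5 in the FFT estimate and the function names are assumptions chosen for illustration, not measurements of any library.

```python
import math

def direct_cost(n, k):
    """Approximate flops for direct 2D convolution: one multiply and one add
    per kernel tap, for each of the n*n output pixels."""
    return 2 * n * n * k * k

def fft_cost(n, k, c=5.0):
    """Approximate flops for FFT-based 2D convolution.

    Assumes three 2D FFTs (image, kernel, inverse) on an N x N grid with
    N = n + k - 1, each costing about c * N^2 * log2(N^2) flops, plus
    N^2 complex multiplications at 6 flops each. c is an assumed constant.
    """
    N = n + k - 1
    points = N * N
    transforms = 3 * c * points * math.log2(points)
    pointwise = 6 * points
    return transforms + pointwise

def crossover_kernel_size(n):
    """Smallest kernel side k for which the FFT estimate beats the direct one,
    or None if that never happens for k <= n."""
    for k in range(1, n + 1):
        if fft_cost(n, k) < direct_cost(n, k):
            return k
    return None

if __name__ == "__main__":
    for n in (64, 512, 2048):
        print(n, crossover_kernel_size(n))
```

Under these assumptions a 3 x 3 kernel is far cheaper in the spatial domain, while a 31 x 31 kernel is cheaper through the FFT. Real timings depend heavily on the library, cache behaviour, and hardware, so the exact crossover should be measured rather than trusted from the model.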

score 1 · Accepted Answer · answered Jul 23 '19 at 16:45

As @user545424 pointed out, the problem was that I was computing n*complexity(MatMul(kernel)) instead of n²*complexity(MatMul(kernel)) for a "normal" convolution.

I finally get this: (where n is the size of the input and m the size of the kernel)
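Read directly off `fft_mult_fft` and `conv` in the final code, the two cost models being compared are as follows. This is a reconstruction from the code; the labels C_fft and C_direct are introduced here, and log is the natural logarithm because that is what sympy's `log` denotes.

```latex
C_{\mathrm{fft}}(n) = 2 \cdot 4\, n^{2} \log n + n^{2.5}
\qquad
C_{\mathrm{direct}}(n, m) = n^{2} \, m^{2.5}
```

The first term of C_fft is two calls to `_complex_fft`, the second is one call to `_complex_mult(n)`; C_direct is one `_complex_mult(m)` per cell of the n x n input.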

Here is the final code and the new charts.

from sympy import log, symbols, init_printing, lambdify
init_printing(use_latex='matplotlib')

import numpy as np
import matplotlib.pyplot as plt

def _complex_mult(n):
  """Complexity of a MatMul of two matrices of size (n, n)"""
  # see https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
  return n**2.5

def _complex_fft(n):
  """Complexity of fft and ifft"""
  # see https://stackoverflow.com/questions/6514861/computational-complexity-of-the-fft-in-n-dimensions#comment37078975_6516856
  return 4*(n**2)*log(n)

def fft_mult_fft(n, m):
  """Complexity of a convolution in the freq space.
  fft -> mult between M and kernel -> ifft
  """
  return _complex_fft(n) * 2 + _complex_mult(n)

def conv(n, m):
  """Complexity of a convolution in the time space.
  for every n*n cell of M, we execute a MatMul of 2 (m, m) matrices
  """
  return n*n*_complex_mult(m)


n = symbols('n') # size of M = (n, n)
m = symbols('m') # size of kernel = (m, m)

M = np.linspace(1, 1e3+1, 10)  # num must be an integer, not the float 1e1
kernel_size = np.linspace(2, 7, 7-2+1)**2

fft_symb = fft_mult_fft(n, m)
discrete_symb = conv(n, m)

fft_func = lambdify(n, fft_symb, 'numpy')
discrete_func = lambdify([n, m], discrete_symb, 'numpy')


fig, ax = plt.subplots(1, len(kernel_size), figsize=(30, 10))
fig.patch.set_facecolor('grey')

for i, size in enumerate(kernel_size):
  ax[i].plot(M, fft_func(M), c='red', label='freq domain (fft)')
  ax[i].plot(M, discrete_func(M, size), c='blue', label='time domain (normal)')
  ax[i].legend(loc='upper left')
  ax[i].set_title("kernel size = {}".format(size))
  ax[i].set_xlabel("Matrix size")
  ax[i].set_ylabel("Complexity")
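The complexity model above treats the frequency-domain step as a matrix product. In the convolution theorem that step is an element-wise product of the two spectra, which costs about n² operations, not n^2.5. The sketch below checks this numerically with NumPy only, comparing a direct full 2D convolution against the FFT route. The function names and the test sizes are assumptions chosen for illustration.

```python
import numpy as np

def conv2d_direct(image, kernel):
    """Full 2D linear convolution by explicit summation over kernel taps."""
    n_r, n_c = image.shape
    k_r, k_c = kernel.shape
    out = np.zeros((n_r + k_r - 1, n_c + k_c - 1))
    for i in range(k_r):
        for j in range(k_c):
            # Each kernel tap contributes a shifted, scaled copy of the image.
            out[i:i + n_r, j:j + n_c] += kernel[i, j] * image
    return out

def conv2d_fft(image, kernel):
    """Same convolution via the FFT: zero-pad both inputs to the full output
    size, transform, multiply the spectra element-wise, and invert."""
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    image_spectrum = np.fft.rfft2(image, s=shape)
    kernel_spectrum = np.fft.rfft2(kernel, s=shape)
    # Element-wise product, not a matrix product.
    return np.fft.irfft2(image_spectrum * kernel_spectrum, s=shape)

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((5, 5))

direct_result = conv2d_direct(image, kernel)
fft_result = conv2d_fft(image, kernel)
max_abs_diff = np.max(np.abs(direct_result - fft_result))
print("max abs difference:", max_abs_diff)
```

The zero-padding to the full output size matters: without it the FFT product yields a circular convolution, which wraps around at the borders and does not match the direct result.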
