
I know that, generally speaking, FFT plus multiplication is usually faster than a direct convolution when the arrays are relatively large. However, I'm convolving a very long signal (say 10 million points) with a very short response (say 1 thousand points). In this case fftconvolve doesn't seem to make much sense, since it forces an FFT of the second array up to the same size as the first array. Is it faster to just do a direct convolution in this case?

LWZ
  • Is there a reason you can't just time both approaches, e.g. with `timeit`? – Danica Feb 22 '13 at 06:57
  • I didn't know about this function. I'll try it. I'd also like to understand the underlying theory, though. – LWZ Feb 22 '13 at 07:22
  • Note: As of v0.19, convolve automatically chooses fftconvolve or the direct method based on an estimation of which is faster. – Dr Xorile Feb 28 '20 at 00:08
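Following up on that comment, here is a minimal sketch of letting SciPy make the choice itself (requires SciPy >= 0.19; the array sizes are illustrative stand-ins for the question's arrays):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
sig = rng.standard_normal(2**20)   # long signal
resp = rng.standard_normal(2**10)  # short response

# Ask SciPy which method its size-based cost estimate would pick.
method = signal.choose_conv_method(sig, resp, mode="full")
print(method)  # 'direct' or 'fft', depending on the estimate

# convolve(method="auto") applies the same choice automatically.
out = signal.convolve(sig, resp, mode="full", method="auto")
```

`choose_conv_method` can also be called with `measure=True` to time both methods on the actual inputs rather than estimate.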

3 Answers


Take a look at the comparison I did here:

http://scipy-cookbook.readthedocs.io/items/ApplyFIRFilter.html

Your case might be near the transition between using a plain convolution and using the FFT-based convolution, so your best bet (as suggested by @Dougal in a comment) is to time it yourself.

(Note that I didn't do overlap-add or overlap-save in that comparison.)
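As a concrete way of timing it yourself, here is a minimal pure-NumPy sketch (sizes are scaled-down stand-ins for the question's arrays; the single full-length FFT mirrors what a one-shot fftconvolve does):

```python
import timeit

import numpy as np

rng = np.random.default_rng(1)
sig = rng.standard_normal(2**16)   # stand-in for the long signal
resp = rng.standard_normal(2**4)   # stand-in for the short response

n = len(sig) + len(resp) - 1  # full linear-convolution length

def direct():
    return np.convolve(sig, resp)

def fft_based():
    # Zero-pad both arrays to the full output length, multiply spectra,
    # and invert -- the approach the question is asking about.
    return np.fft.irfft(np.fft.rfft(sig, n) * np.fft.rfft(resp, n), n)

# Same result either way, up to floating-point error.
assert np.allclose(direct(), fft_based())

for name, fn in [("direct", direct), ("fft", fft_based)]:
    t = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name}: {t * 1e3:.2f} ms")
```

Which one wins flips as you vary the two lengths, which is why timing with your actual sizes is the reliable answer.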

Warren Weckesser

Thank you for your help. I ran the test myself, convolving two arrays of sizes 2^20 and 2^4, and here is the result:

numpy.convolve: 110 ms
scipy.signal.convolve: 1.0 s
scipy.signal.fftconvolve: 2.5 s

So we have a winner: numpy.convolve is much faster than the others. I still don't know why, though.


I then tried two longer arrays, of sizes 2^22 and 2^10. The result:

numpy.convolve: 6.7 s
scipy.signal.convolve: 221 s
scipy.signal.fftconvolve: MemoryError

The difference just gets bigger.

LWZ
  • Thanks for the feedback. Maybe [related](https://github.com/scipy/scipy/issues/8154). – mins Apr 11 '21 at 16:08

FFT fast convolution via the overlap-add or overlap-save algorithm can be done in limited memory by using an FFT that is only a small multiple (such as 2×) of the impulse-response length. It breaks the one long FFT up into properly overlapped, shorter, zero-padded FFTs.

Even with the overlap overhead, O(N log N) will beat O(M·N) in efficiency for large enough N and M.
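To make that concrete, here is a minimal NumPy sketch of FFT overlap-add (the function name and block-size choice are mine, not from this thread):

```python
import numpy as np

def overlap_add_convolve(x, h, nfft=None):
    """Full linear convolution of x with h via FFT overlap-add."""
    m = len(h)
    if nfft is None:
        # FFT size ~2x the impulse-response length, as described above.
        nfft = 1 << int(np.ceil(np.log2(2 * m)))
    step = nfft - (m - 1)        # input samples consumed per block
    H = np.fft.rfft(h, nfft)     # transform the kernel once
    out = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), step):
        chunk = x[start:start + step]
        # Each block's linear convolution (len(chunk) + m - 1 points)
        # fits inside nfft points, so the circular FFT convolution
        # equals the linear one -- no wraparound corruption.
        y = np.fft.irfft(np.fft.rfft(chunk, nfft) * H, nfft)
        out[start:start + len(chunk) + m - 1] += y[:len(chunk) + m - 1]
    return out

# Sanity check against the direct method on small arrays.
x = np.random.default_rng(0).standard_normal(2**14)
h = np.random.default_rng(1).standard_normal(2**6)
assert np.allclose(overlap_add_convolve(x, h), np.convolve(x, h))
```

Peak memory scales with the FFT size rather than the signal length, which is what avoids the MemoryError above. SciPy has since added `scipy.signal.oaconvolve`, which implements overlap-add directly.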

hotpaw2
  • Thanks for your answer! Do you mean even with the `fftconvolve` function, it will automatically break down long FFT into short FFTs and I do not need to worry about it? – LWZ Feb 22 '13 at 08:56
  • 1
    @LWZ: scipy's fftconvolve does not do that, no. hotpaw, do you have a reference/implementation for that method? – endolith Aug 13 '13 at 18:06