Fast 2D convolution implementation?

Question

I've made a CUDA program for 2D convolution and now want to compare it to some non-CUDA implementation to measure the speedup.

I could compare to my own implementation in plain C using the classical multiple-loop approach, or to MATLAB's conv2, but that doesn't feel like a legitimate/fair comparison, since they're not the fastest implementations out there.

I was also thinking of trying OpenCV, and I've been looking for a SIMD-optimized version with no luck. Any advice? Should I go with OpenCV?

NOTE: I've read other questions, including this one, but the answer is basically the same as my plain C code or a discussion of the various methods available.

score 5 · Accepted Answer · answered Jun 03 '11 at 03:48

The fastest general 2D convolution algorithm, at least for larger kernels, is FFT-based: take the FFT of both the source and the zero-padded kernel, multiply the two spectra pointwise, then inverse-FFT the product to get the result (which, as far as I know, is what conv2 does in MATLAB). So your multiple-loop approach probably isn't the best.
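To make the FFT route concrete, here is a minimal pure-Python sketch of it next to the multiple-loop version. It is only an illustration of the steps, not a benchmark baseline: the function names (`fft_conv2`, `direct_conv2`, and the helpers) are made up for this example, the radix-2 FFT is a textbook one that assumes power-of-two lengths, and a real comparison would use an optimized FFT library instead.

```python
import cmath


def fft(x, inverse=False):
    """Textbook recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(a, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p


def pad(a, h, w):
    """Copy a into the top-left corner of an h x w array of zeros."""
    out = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            out[i][j] = v
    return out


def fft_conv2(image, kernel):
    """Full 2D linear convolution via FFT, pointwise product, inverse FFT."""
    oh = len(image) + len(kernel) - 1
    ow = len(image[0]) + len(kernel[0]) - 1
    # Pad to at least the full output size so the circular convolution
    # computed by the FFT does not wrap around.
    ph, pw = next_pow2(oh), next_pow2(ow)
    fa = fft2(pad(image, ph, pw))
    fb = fft2(pad(kernel, ph, pw))
    prod = [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(fa, fb)]
    res = fft2(prod, inverse=True)
    scale = ph * pw  # the inverse transform above is unnormalized
    return [[res[i][j].real / scale for j in range(ow)] for i in range(oh)]


def direct_conv2(image, kernel):
    """Reference: full 2D convolution with the classical nested loops."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for i in range(ih):
        for j in range(iw):
            for m in range(kh):
                for n in range(kw):
                    out[i + m][j + n] += image[i][j] * kernel[m][n]
    return out
```

Both functions return the "full" result of size (image + kernel - 1) in each dimension, so their outputs can be compared element by element up to floating-point rounding.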

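The GSL gives you a standard and fast implementation of the FFT if you want to go that route. Note that its FFT routines are one-dimensional, so a 2D transform means applying them along the rows and then along the columns.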

Also, if the kernel is separable you may be able to do the convolution as two 1D convolutions.
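As a rough sketch of the separable case: if the 2D kernel is the outer product of a column vector and a row vector, convolving every row with the row vector and then every column with the column vector gives the same result as the full 2D convolution. The names `conv1d_full` and `separable_conv2` are hypothetical, and the code is plain Python for readability rather than speed.

```python
def conv1d_full(signal, taps):
    """Full 1D convolution of two sequences."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for k, t in enumerate(taps):
            out[i + k] += s * t
    return out


def separable_conv2(image, col_taps, row_taps):
    """Full 2D convolution with the kernel outer(col_taps, row_taps),
    computed as two 1D passes."""
    # Pass 1: convolve each row with the horizontal taps.
    rows = [conv1d_full(r, row_taps) for r in image]
    # Pass 2: convolve each column of the intermediate result
    # with the vertical taps, then transpose back to row-major.
    cols = [conv1d_full(list(c), col_taps) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

For example, `separable_conv2(image, [1, 2, 1], [1, 0, -1])` applies a 3x3 Sobel-style kernel. For a k x k kernel this costs roughly 2k multiplies per pixel instead of k*k, which is why it is worth checking for separability before reaching for the FFT.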

OpenCV is a great choice too if it works for you; it should be widely accepted as a fast implementation.

Pace

Nice! I had no idea about GSL; trying its FFT implementation sounds like a good idea. I think if I try my program against both OpenCV and GSL I'll have a pretty good comparison, since I've also tested against MATLAB. Thanks for your help, Pace! – kirbuchi Jun 03 '11 at 16:27
Depending on kernel size, FFT is not always advantageous (just imagine a 1x1 kernel: clearly FFT/iFFT is a big overhead then). Otherwise I fully agree, but I would rather recommend the FFTW library for the FFT. It's what MATLAB uses, it's the most widespread non-vendor lib (I think), and it should be under the restrictive GNU GPL as well, iirc ;) – zerm Jun 10 '11 at 15:49
If anyone is interested, I found a really fast implementation. It uses IPP, so it'll only work on Intel processors, but maybe it helps somebody (http://www.matthewzeiler.com/software/) – kirbuchi Jul 09 '11 at 00:06
