Fast 2D convolution for DSP

Question

I want to implement some image-processing algorithms which are intended to run on a beagleboard. These algorithms use convolutions extensively. I'm trying to find a good C implementation for 2D convolution (probably using the Fast Fourier Transform). I also want the algorithm to be able to run on the beagleboard's DSP, because I've heard that the DSP is optimized for these kinds of operations (with its multiply-accumulate instruction).

I have no background in the field so I think it won't be a good idea to implement the convolution myself (I probably won't do it as good as someone who understands all the math behind it). I believe a good C convolution implementation for DSP exists somewhere but I wasn't able find it?

Could someone help?

EDIT: Turns out the kernel is pretty small. Its dimensions are either 2X2 or 3X3. So I guess I'm not looking for an FFT-based implementation. I was searching for convolution on the web to see its definition so I can implement it in a straight forward way (I don't really know what convolution is). All I've found is something with multiplied integrals and I have no idea how to do it with matrices. Could somebody give me a piece of code (or pseudo code) for the 2X2 kernel case?

http://en.wikipedia.org/wiki/Convolution#Discrete_convolution — Peter G., Oct 28 '10 at 11:51
@Peter: Thanks, but they're talking about the 1D case in there, nothing about 2D convolution — snakile, Oct 28 '10 at 12:05
2d convolutions (in image-processing) are often separable, so can be run as 2 1-d convolutions in sequence. This makes the processing requirement much smaller. Can you give examples of the kinds of kernels you want to use? — Martin Thompson, Mar 20 '13 at 12:46

Paul R · Accepted Answer · 2010-10-28T12:53:29.227

15

What are the dimensions of the image and the kernel ? If the kernel is large then you can use FFT-based convolution, otherwise for small kernels just use direct convolution.

The DSP might not be the best way to do this though - just because it has a MAC instruction doesn't mean that it will be more efficient. Does the ARM CPU on the Beagle Board have NEON SIMD ? If so then that might be the way to go (and more fun too).

For a small kernel, you can do direct convolution like this:

// in, out are m x n images (integer data)
// K is the kernel size (KxK) - currently needs to be an odd number, e.g. 3
// coeffs[K][K] is a 2D array of integer coefficients
// scale is a scaling factor to normalise the filter gain

for (i = K / 2; i < m - K / 2; ++i) // iterate through image
{
  for (j = K / 2; j < n - K / 2; ++j)
  {
    int sum = 0; // sum will be the sum of input data * coeff terms

    for (ii = - K / 2; ii <= K / 2; ++ii) // iterate over kernel
    {
      for (jj = - K / 2; jj <= K / 2; ++jj)
      {
        int data = in[i + ii][j +jj];
        int coeff = coeffs[ii + K / 2][jj + K / 2];

        sum += data * coeff;
      }
    }
    out[i][j] = sum / scale; // scale sum of convolution products and store in output
  }
}

You can modify this to support even values of K - it just takes a little care with the upper/lower limits on the two inner loops.

edited Oct 28 '10 at 12:53

answered Oct 20 '10 at 21:42

Paul R

208,748
37
389
560

@Paul: I'd imagine that for straight-up convolutions, the TI would beat the ARM, given modulo addressing modes and zero-overhead loops and so on. – Oliver Charlesworth Oct 20 '10 at 21:45
@Oli: you may be right - I'm not sure that modulo addressing helps, but there may be other advantages. – Paul R Oct 20 '10 at 21:48
@MPaul: Modulo addressing helps if you're zipping through a circular buffer. I have no numbers to back this up, but if the TI can't beat the host processor at the canonical thing it's designed for, then it's a pretty pointless combination! – Oliver Charlesworth Oct 20 '10 at 21:50
1

@Oli: it would certainly be interesting to try a routine such as this on both the ARM and the DSP and see which comes out faster. One advantage of doing it on the DSP of course is that it leaves the ARM free for other tasks. I guess the pragmatic solution though would be to see which, if any, has existing library code for this that you might be able to leverage for this task and then go with that. Most DSPs have 1D and 2D signal processing libraries so that would probably be the first thing to look at. – Paul R Oct 20 '10 at 21:59
1

@Paul: Good point with "doing it on the DSP leaves the ARM free". – Oliver Charlesworth Oct 20 '10 at 22:09
@Paul: Thanks for the things you said. Do you know where can I find these DSP signal processing libraries you mentioned? – snakile Oct 20 '10 at 22:19
@Paul: I can't give an exact answer for the dimensions question. Th algorithms I plan to implement come from an academic paper which doesn't say anything about the image dimentions. But I guess I'll want it to operate on 512X512 images. – snakile Oct 20 '10 at 22:22
@snakikle: What's more important is the kernel, i.e. the thing you'll be convolving the image with. As Paul says, if this is "small", then it's more efficient to use standard convolution. – Oliver Charlesworth Oct 20 '10 at 22:33
@snakile: I believe the DSP on the Beagleboard is from the TI C64X family, so you might want to look at the TI web site or any other TI DSP-related sites to see what ready-rolled routines are available. You'll need a C64X toolchain to build this code of course (see http://elinux.org/BeagleBoard/DSP_Howto), and it's probably going to be a lot harder to test/debug than regular ARM CPU code, but that's half the fun of course. – Paul R Oct 20 '10 at 22:43
@Oli, @Paul: Thank you both. Your remarks were helpful. Now it turns out the kernel is small, so as Paul said it doesn't make sense to use FFT. I edited my question. Could you please help me a bit more? – snakile Oct 28 '10 at 11:40
@snakile: no problem - I've edited my answer to show how to implement a convolution with a KxK kernel with integer data and coefficients. – Paul R Oct 28 '10 at 12:56
I have implemented something similar to your algorithm in JavaScript. You can check my blog post for details: http://ec2-54-232-84-48.sa-east-1.compute.amazonaws.com/two-dimensional-convolution-algorithm-with-arrays-in-javascript/ – Renato Gama Mar 19 '13 at 22:06

score 1 · Answer 2 · answered Mar 19 '13 at 22:09

I know it might be off topic but due to the similarity between C and JavaScript I believe it could still be helpful. PS.: Inspired by @Paul R answer.

Two dimensions 2D convolution algorithm in JavaScript using arrays

function newArray(size){
    var result = new Array(size);
    for (var i = 0; i < size; i++) {
        result[i] = new Array(size);
    }

    return result;
}

function convolveArrays(filter, image){
    var result = newArray(image.length - filter.length + 1);

    for (var i = 0; i < image.length; i++) {
        var imageRow = image[i];
        for (var j = 0; j <= imageRow.length; j++) {

            var sum = 0;
            for (var w = 0; w < filter.length; w++) {
                if(image.length - i < filter.length) break;

                var filterRow = filter[w];
                for (var z = 0; z < filter.length; z++) {
                    if(imageRow.length - j < filterRow.length) break;
                    sum += image[w + i][z + j] * filter[w][z];
                }
            }

            if(i < result.length && j < result.length)
                result[i][j] = sum;
        }   
    }

    return result;
}

You can check the full blog post at http://ec2-54-232-84-48.sa-east-1.compute.amazonaws.com/two-dimensional-convolution-algorithm-with-arrays-in-javascript/

I beleive this should be the correct answer as the accepted answer provides too many additional assumptions. E.g coeficient multiplacation. Which could easily be outside and often better to do after the kernel application. — Jamie Nicholl-Shelley, Oct 28 '21 at 06:03

Fast 2D convolution for DSP

2 Answers2

Linked