0

The aim is to do correlation/convolutions(flip) of two 2D arrays using ios Accelerate framework for gaining speed.

  1. My first attempt was with vImageConvolve_PlanarF/vdsp_imgfir which was good for lower sized arrays. But as array size increased, performance dropped drastically as it was an O(n2) implementation as mentioned by Accelerate developers themselves here(1).

  2. I moved to FFT implementation(2) for reducing complexity to O(nlog2n). Using vDSP_fft2d_zip, gained speed to an extent. But using vDSP_fft2d_zip on 2D arrays at non powers of 2, we need to pad zeros. For e.g. on a 2D array of size 640 * 480, we need to pad zeros to make it 1024 * 512. Other FFT implementations like FFTW or OpenCV's DFT allow sizes which could be expressed as size = 2p * 3p * 5r. That allows, FFTW/OpenCV to do FFT of 640 * 480 2D array at the same size.

So for 2D arrays at size 640*480, in an Accelerate vs FFTW/OpenCV comparison, it is effectively between 1024*512 and 640*480 dimensions. So what ever performance gains I get from Accelerate's FFT on 2D arrays is nullified by it's inability to performs FFT at reasonable dimensions like size = 2p * 3p * 5r

2 Queries.

  1. Am I missing any Accelerate functionality to perform this easily ? For e.g. any Accelerate function which could perform 2D array FFT at size = 2p * 3p * 5r. I assume vDSP_DFT_Execute performs only 1D FFT.
  2. Better approaches to 2D FFT or correlation. Like in this answer(3), which asks to split arrays like 480 = 256 + 128 + 64 + 32 with repeated 1D FFT's over rows and then over columns. But this will need too many functions calls and I assume, will not help with performance.

Of lesser importance: I am doing correlation in tiles as one of the 2D arrays is far bigger then another. Say like 1920*1024 vs 150*100.

Community
  • 1
  • 1
kiranpradeep
  • 10,859
  • 4
  • 50
  • 82
  • 1
    You might also want to try asking on the `PerfOptimization-dev` list on http://lists.apple.com - some of Apple's Accelerate developers hang out there and they can be very helpful. – Paul R Apr 22 '15 at 11:59

1 Answers1

0

Linear convolution or correlation requires zero padding anyway, otherwise the result will be circular convolution or correlation.

1d iOS vDSP/Accelerate FFTs do allow N to be the product of small primes, not just 2^M. Not sure about 2d, but one can build a 2d FFT out of a 1d FFT.

hotpaw2
  • 70,107
  • 14
  • 90
  • 153
  • Yes. `vDSP_DFT_Execute` helps for 1D at non power of 2 ( or even some powers of 2 ). But I am afraid if, for a 640 * 480 2d array, the number of 1D FFT function calls + build up code, will take away speed gains by Accelerate. – kiranpradeep Apr 22 '15 at 12:59