
//EDIT...

I'm editing my question slightly to address the issue of working specifically with non-power-of-two images. I've got a basic structure that works with square grayscale images with sizes like 256x256 or 1024x1024, but I can't see how to generalize it to arbitrarily sized images. The FFT functions seem to want you to pass in the log2 of the width and height, but then it's unclear how to unpack the resulting data, or whether the data is simply getting scrambled. I suppose the obvious thing to do would be to center the NPOT image within a larger, all-black image and then ignore the values in those positions when looking at the data, but I'm wondering if there's a less awkward way to work with NPOT data.
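For concreteness, the pad-into-a-larger-image fallback I describe would look something like this sketch (plain C, no Accelerate; the helper names `next_pow2` and `pad_to_pow2` are mine, not from any framework). It copies the source into the top-left corner and leaves the rest zeroed:

```c
#include <stdlib.h>

/* Smallest power of two >= n (assumes n > 0). */
static unsigned next_pow2(unsigned n) {
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* Copy an NPOT grayscale image into the top-left corner of a
 * zeroed power-of-two float buffer. Caller frees the result. */
static float *pad_to_pow2(const unsigned char *src, unsigned w, unsigned h,
                          unsigned *pw, unsigned *ph) {
    *pw = next_pow2(w);
    *ph = next_pow2(h);
    float *dst = calloc((size_t)*pw * *ph, sizeof(float)); /* zero padding */
    if (!dst) return NULL;
    for (unsigned y = 0; y < h; y++)
        for (unsigned x = 0; x < w; x++)
            dst[y * *pw + x] = src[y * w + x] / 255.0f;
    return dst;
}
```

The same bookkeeping (`pw`, `ph`, and the original `w`, `h`) is then what you'd use to ignore the padded positions afterwards.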

//...END EDIT

I'm having a bit of trouble with the Accelerate Framework documentation. I would normally use FFTW3, but I'm having trouble getting that to compile on an actual iOS device (see this question). Can anybody point me to a super-simple implementation using Accelerate that does something like the following:

1) Turns image data into an appropriate data structure that can be passed to Accelerate's FFT methods.
In FFTW3, at its simplest, using a grayscale image, this involves placing the unsigned bytes into a "fftw_complex" array, which is simply a struct of two floats, one holding the real value and the other the imaginary (and where the imaginary is initialized to zero for each pixel).

2) Takes this data structure and performs an FFT on it.

3) Prints out the magnitude and phase.

4) Performs an IFFT on it.

5) Recreates the original image from the data resulting from the IFFT.
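For reference, the FFTW3-style packing described in step 1 looks roughly like this sketch (plain C; `complex_px` and `pack_grayscale` are stand-in names of mine, not the real FFTW types):

```c
#include <stdlib.h>

/* Stand-in for an FFTW-style complex element: real part, then imaginary. */
typedef struct { float re, im; } complex_px;

/* Pack grayscale bytes into a complex array, imaginary parts zeroed. */
static complex_px *pack_grayscale(const unsigned char *pixels, size_t count) {
    complex_px *out = malloc(count * sizeof *out);
    if (!out) return NULL;
    for (size_t i = 0; i < count; i++) {
        out[i].re = pixels[i] / 255.0f; /* normalized real value */
        out[i].im = 0.0f;               /* imaginary initialized to zero */
    }
    return out;
}
```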

Although this is a very basic example, I am having trouble using the documentation from Apple's site. The SO answer by Pi here is very helpful, but I am still somewhat confused about how to use Accelerate to do this basic functionality using a grayscale (or color) 2D image.

Anyhow, any pointers or especially some simple working code that processes a 2D image would be extremely helpful!

//EDIT...

Okay, after taking some time to dive into the documentation and some very helpful code on SO, as well as on pkmital's github repo, I've got some working code that I thought I'd post, since 1) it took me a while to figure out and 2) I have a couple of remaining questions...

Initialize FFT "plan". Assuming a square power-of-two image:

#include <Accelerate/Accelerate.h>
...
// length is the width (== height) of the square power-of-two image
UInt32 log2nr = (UInt32) log2f ( (float) length );
UInt32 log2nc = (UInt32) log2f ( (float) length );
UInt32 numElements = 1 << ( log2nr + log2nc );   // total pixel count
float SCALE = 1.0 / numElements;
SInt32 rowStride = 1;
SInt32 columnStride = 0;   // 0 = default column stride (numColumns * rowStride)
FFTSetup setup = vDSP_create_fftsetup ( MAX ( log2nr, log2nc ), kFFTRadix2 );
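One sanity check on the log2 bookkeeping above: for a 2^k x 2^k image, both log2 parameters are simply k, which can be computed without floating point (a sketch; `ilog2` is a hypothetical helper of mine):

```c
/* Integer log2 for an exact power of two (assumes n > 0 and n is a power of 2). */
static unsigned ilog2(unsigned n) {
    unsigned k = 0;
    while (n >>= 1) k++;
    return k;
}
```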

Pass in a byte array for a square power-of-two grayscale image and turn it into a COMPLEX_SPLIT:

COMPLEX_SPLIT in_fft;
in_fft.realp = ( float* ) malloc ( numElements * sizeof ( float ) );
in_fft.imagp = ( float* ) malloc ( numElements * sizeof ( float ) );

for ( UInt32 i = 0; i < numElements; i++ ) {
    if ( i < t->width * t->height ) {
        in_fft.realp[i] = t->data[i] / 255.0;
        in_fft.imagp[i] = 0.0;
    } else {
        // zero-pad any remaining elements so the buffer is fully initialized
        in_fft.realp[i] = 0.0;
        in_fft.imagp[i] = 0.0;
    }
}

Run the FFT on the transformed image data, then grab the magnitude and phase:

COMPLEX_SPLIT out_fft;
out_fft.realp = ( float* ) malloc ( numElements * sizeof ( float ) );
out_fft.imagp = ( float* ) malloc ( numElements * sizeof ( float ) );

vDSP_fft2d_zop ( setup, &in_fft, rowStride, columnStride, &out_fft, rowStride, columnStride, log2nc, log2nr, kFFTDirection_Forward );

magnitude = (float *) malloc(numElements * sizeof(float));
phase = (float *) malloc(numElements * sizeof(float));

for ( UInt32 i = 0; i < numElements; i++ ) {
    magnitude[i] = sqrtf ( out_fft.realp[i] * out_fft.realp[i] + out_fft.imagp[i] * out_fft.imagp[i] );
    phase[i] = atan2f ( out_fft.imagp[i], out_fft.realp[i] );
}

Now you can run an IFFT on the out_fft data to get the original image...

COMPLEX_SPLIT out_ifft;
out_ifft.realp = ( float* ) malloc ( numElements * sizeof ( float ) );
out_ifft.imagp = ( float* ) malloc ( numElements * sizeof ( float ) );
vDSP_fft2d_zop ( setup, &out_fft, rowStride, columnStride, &out_ifft, rowStride, columnStride, log2nc, log2nr, kFFTDirection_Inverse );

vDSP_vsmul ( out_ifft.realp, 1, &SCALE, out_ifft.realp, 1, numElements );
vDSP_vsmul ( out_ifft.imagp, 1, &SCALE, out_ifft.imagp, 1, numElements );

Or you can run an IFFT on the power spectrum (the squared magnitude) to get an autocorrelation, per the Wiener–Khinchin theorem...

COMPLEX_SPLIT in_ifft;
in_ifft.realp = ( float* ) malloc ( numElements * sizeof ( float ) );
in_ifft.imagp = ( float* ) malloc ( numElements * sizeof ( float ) );
for ( UInt32 i = 0; i < numElements; i++ ) {
    in_ifft.realp[i] = magnitude[i] * magnitude[i];   // |F|^2, not |F|
    in_ifft.imagp[i] = 0.0;
}

vDSP_fft2d_zop ( setup, &in_ifft, rowStride, columnStride, &out_ifft, rowStride, columnStride, log2nc, log2nr, kFFTDirection_Inverse );

vDSP_vsmul ( out_ifft.realp, 1, &SCALE, out_ifft.realp, 1, numElements );
vDSP_vsmul ( out_ifft.imagp, 1, &SCALE, out_ifft.imagp, 1, numElements );

Finally, you can put the ifft results back into an image array:

for ( UInt32 i = 0; i < t->width * t->height; i++ ) {
    float v = out_ifft.realp[i] * 255.0;
    t->data[i] = (int) ( v < 0.0 ? 0.0 : ( v > 255.0 ? 255.0 : v ) );   // clamp to byte range
}

I haven't figured out how to use the Accelerate framework to handle non-power-of-two images. If I allocate enough memory in the setup, then I can do an FFT followed by an IFFT and get my original image back. But if I try to do an autocorrelation (using the magnitude of the FFT), then my image gets wonky results. I'm not sure of the best way to pad the image appropriately, so hopefully someone has an idea of how to do this. (Or can share a working version of the vDSP_conv method!)

Angus Forbes
  • It looks like you're just doing autocorrelation here? I thought Accelerate/vDSP already had autocorrelation functions so that you didn't have to roll your own with FFT/IFFT etc? – Paul R May 22 '12 at 22:40
  • Hi, the vDSP_acor doesn't seem to actually exist in the Accelerate library. vDSP_conv exists, but it's giving me strange results... perhaps I'm using it incorrectly for doing an image autocorrelation. If you (or anyone) has a working example for using vDSP_conv for autocorrelation it would be great to see it. Part of the issue is just that it is confusing as to what data in expects and outputs, etc. – Angus Forbes May 23 '12 at 01:47

1 Answer


I would say that in order to work with arbitrary image sizes, all you have to do is pad your input array up to the next power of 2.

The hard part is deciding where to put your original image data and what to fill the rest of the array with. What you are actually trying to do to the image, or to mine from it, is crucial.

In the linked PDF below, pay particular attention to the paragraph just above 12.4.2 http://www.mathcs.org/java/programs/FFT/FFTInfo/c12-4.pdf

While the above speaks about manipulation along the two axes, we could potentially apply a similar idea along the first dimension and then carry it on to the second. If I'm right, then this example could apply (and this is by no means an exact algorithm yet):

Say we have an image that is 900 by 900. First we could split the image into vertical strips of 512, 256, 128, and 4 pixels. We would then perform 4 1D FFTs for each row: one for the first 512 pixels, the next for the following 256 pixels, the next for the following 128, and the last for the remaining 4. Since the output of the FFT is essentially the prevalence of each frequency, these could simply be added (from the frequency-only perspective, ignoring the angular offset). We could then push this same technique into the 2nd dimension. At that point we would have taken every input pixel into consideration without actually having to pad.

This is really just food for thought; I have not tried this myself, and indeed I should research it further. If you are truly doing this kind of work right now, you may have more time than I do at this point.

trumpetlicks
  • This isn't like processing a 20 second audio signal into 5 second chunks. It's more like processing an audio signal by splitting apart 0-10khz and 10khz-20khz. You're getting different bands from each, so you'd be adding 2khz with 12khz, or something along those lines... – Glenn Maynard Jan 23 '15 at 15:31
  • This is more of a question than anything, because I'm still developing an intuition for all of this, but doesn't the separability of FFT allow for doing strips of the image 512 at a time, etc? I watched a vid of someone doing a 2D FFT by doing nested loops of 1D FFTs 1 row of pixels at a time. – God of Biscuits Mar 27 '20 at 22:44