5

I've seen the various FFT questions on here but I'm confused on part of the implementation. Instead of performing the FFT in real time, I want to do it offline. Lets say I have the raw data in float[] audio. The sampling rate is 44100 and so audio[0] to audio[44099] will contain 1 seconds worth of audio. If my FFT function handles the windowing (e.g. Hanning), do I simply put the entire audio buffer into the function in one go? Or, do I have to cut the audio into chunks of 4096 (my window size) and then input that into the FFT which will then perform the windowing function on top?

Skoder
  • 3,983
  • 11
  • 46
  • 73
  • 1
    Surely this depends on the specifics of the FFT library that you are using. – Mankarse Sep 21 '11 at 04:27
  • This question is a better fit for dsp.stackexchange.com . Should it be moved? – Mark Borgerding Sep 21 '11 at 11:46
  • @Mankarse - Yes, sorry, I should've been more specific. I had three different FFT libs and wasn't sure which one I was going to use. I've decided to use the one in Apple's Accelerate framework. – Skoder Sep 21 '11 at 14:56

3 Answers3

2

The choice of whether to calculate one FFT over the entire data set (in the OP's case, 44100 samples representing 1-second of data), or whether to do a series of FFT's over smaller subsets of the full data set, depends on the data, and on the intended purpose of the FFT.

If the data is relatively static spectrally over the full data set, then one FFT over the entire data set is probably all that's needed.

However, if the data is spectrally dynamic over the data set, then multiple sliding FFT's over small subsets of the data would create a more accurate time-frequency representation of the data.

The plot below shows the power spectrum of an acoustic guitar playing an A4 note. The audio signal was sampled at 44.1 KHz and the data set contains 131072 samples, almost 3 seconds of data. This data set was pre-multiplied with a Hann window function.

Guitar spectrum, Hann window, 131072 samples

The plot below shows the power spectrum of a subset of 16384 samples (0 to 16383) taken from the full data set of the acoustic guitar A4 note. This subset was also pre-multiplied with a Hann window function.

Guitar spectrum, Hann window, 16384 samples

Notice how the spectral energy distribution of the subset is significantly different from the spectral energy distribution of the full data set.

If we were to extract subsets from the full data set, using a sliding 16384 sample frame, and calculate the power spectrum of each frame, we would create an accurate time-frequency picture of the full data set.

References:

Real audio signal data, Hann window function, plots, FFT, and spectral analysis were done here:

Fast Fourier Transform, spectral analysis, Hann window function, audio data

Babson
  • 909
  • 1
  • 7
  • 7
2

You may need to copy your input data to a separate buffer and get it in the correct format, e.g. if your FFT is in-place, or if it requires interleaved complex data (real/imaginary). However if your FFT routine can take a purely real input and is not in-place (i.e. non-destructive) then you may just be able to pass a pointer to the original sample data, along with an appropriate size parameter.

Typically for 1s of audio, e.g. speech or music, you would pick an FFT size which corresponds to a reasonably stationary chunk of audio, e.g. 10 ms or 20 ms. So at 44.1 kHz your FFT size might be say 512 or 1024. You would then generate successive spectra by advancing through your buffer and doing a new FFT at each starting point. Note that it's common practice to overlap these successive buffers, typically by 50%. So if N = 1024 your first FFT would be for samples 0..1023, your second would be for samples 512..1535, then 1024..2047, etc.

Paul R
  • 208,748
  • 37
  • 389
  • 560
  • Thanks for the help Paul. I'm going to be using Apple's Accelerate framework, which I believe performs an in-place FFT so I think I need to manipulate the data in a separate buffer. So if I have a song that is 180 seconds long, I simply iterate over the buffer as described and perform the FFT up to N=7938000 (44100 * 180)? – Skoder Sep 21 '11 at 15:11
  • 2
    If the function name ends in "ip" then it's in-place, e.g. `vDSP_fft_zrip`. So yes, copy each chunk of input data to a suitable FFT buffer, apply a window function (e.g. Hann), apply FFT, compute power spectrum or whatever you want to do, store and/or display power spectrum for this chunk, move on to next chunk... – Paul R Sep 21 '11 at 15:43
  • 1
    See also: http://stackoverflow.com/questions/3398753/using-the-apple-fft-and-accelerate-framework – Paul R Sep 21 '11 at 15:47
1

The chunk size or window length you pick controls the frequency resolution and the time resolution of the FFT result. You have to determine which you want or what trade-off to make.

Longer windows give you better frequency resolution, but worse time resolution. Shorter windows, vice versa. Each FFT result bin will contain a frequency bandwidth of roughly 1 to 2 times the sample rate divided by the FFT length, depending on the window shape (rectangular, von Hann, etc.), not just one single frequency. If your entire data chunk is stationary (frequency content doesn't change), then you may not need any time resolution, and can go for 1 to 2 Hz frequency "resolution" in your 1 second of data. Averaging multiple short FFT windows might also help reduce the variance of your spectral estimations.

hotpaw2
  • 70,107
  • 14
  • 90
  • 153
  • Not sure if you're familiar with it, but I'm trying to create something similar to AudioSurf (this video shows a good example of tempo recognition about 20 seconds in http://www.youtube.com/watch?v=2EsVyEnhxWY). AudioSurf preprocesses the audio instead of doing it in real time and I wasn't quite sure how this was done. – XSL Sep 21 '11 at 21:46
  • @SSL - Why are you looking at an FFT of only 1 second of data for tempo recognition? Or if this is about something beyond the OPs original question, perhaps you should ask your own new question on SO. – hotpaw2 Sep 22 '11 at 18:32
  • It was beyond the original question, but I was curious about the same thing as OP. I don't need the info right now, so I'll create a new thread when I cross that bridge. – XSL Sep 22 '11 at 19:22