FFT + Short Term Window : Confusion

Question

I've computed the spectrogram of an audio signal sampled at 44100 Hz, using 0.025 s long Hamming windows with a 32768-point FFT(?), and here is my confusion:

44100 * 0.025 ≈ 1103 samples, which is << N = 32768,
yet my experience was that this large N significantly improved the resolution of the spectrogram.

So my question would be what's going on??

From this awesome explanation I would conclude that a 32768-point FFT usually implies an analysis interval of roughly 1 s, and indeed Voicebox's rfft function (which I used) mentions that it truncates/pads the input to N samples. So I assume it padded my short 1103-sample vector with zeros to a length of 32768 in order to compute the FFT.

Umm, is this what really happens? Can this improve the resolution even though only roughly the first 1/30th of the padded signal is non-zero? (Well, I think yes, but I want to be sure, as this came up at my thesis defense, and I only got this idea now while writing this post.)

Thanks for any feedback.

score 9 · Accepted Answer · answered Jun 28 '11 at 22:03

Zero-padding in the time-domain is equivalent to interpolation in the frequency domain (and vice versa). So you've improved the resolution in the sense that this allows you to draw a smoother curve in between the points. But you haven't increased the information content; any processing that you do on the interpolated FFT output will be possible on the non-interpolated FFT output.
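
To make the interpolation claim concrete, here is a minimal sketch in plain Python (not the asker's Voicebox code). The frame length of 16 and padding factor of 4 are small stand-ins chosen for illustration, and `dft` and `hamming` are hypothetical helper names; a naive DFT is used only to keep the example dependency-free. It checks that every 4th bin of the zero-padded spectrum coincides with a bin of the unpadded spectrum, so the padded transform only adds points in between.

```python
import cmath
import math

def dft(x, n_fft):
    """Naive DFT of x after zero-padding it to n_fft points."""
    padded = list(x) + [0.0] * (n_fft - len(x))
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
            for n, v in enumerate(padded))
        for k in range(n_fft)
    ]

def hamming(m):
    """Symmetric Hamming window of length m."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (m - 1)) for n in range(m)]

M = 16       # frame length (stand-in for the 1103-sample frame)
FACTOR = 4   # zero-padding factor (stand-in for roughly 32768 / 1103)

# A windowed sinusoid whose frequency falls between unpadded bins.
frame = [w * math.sin(2 * math.pi * 3.3 * n / M)
         for n, w in enumerate(hamming(M))]

short = dft(frame, M)            # M frequency samples
padded = dft(frame, M * FACTOR)  # M * FACTOR frequency samples

# Every FACTOR-th padded bin is one of the original bins (up to rounding),
# so the padded spectrum carries the same information on a finer grid.
max_diff = max(abs(padded[FACTOR * k] - short[k]) for k in range(M))
print(max_diff)
```

The printed difference should be at floating-point rounding level. With NumPy, `numpy.fft.rfft(frame, n=N)` zero-pads in the same way when `N` exceeds the frame length.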

score 2 · Answer 2 · answered Jun 29 '11 at 04:40

As Oli pointed out, zero-padding an FFT is a method of interpolation. More specifically, the interpolation kernel is the transform of the window you used. So, at some point, your improvement in "resolution" is more related to the shape and width of your chosen window than to the spectral content of your data.
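
The statement about the interpolation kernel can be checked numerically. The sketch below is an illustration under assumed values (a 16-sample Hamming window, a 64-point transform, a tone at padded bin 13) with hypothetical helper names `dft` and `hamming`; it is not taken from the original posts. It verifies that the zero-padded spectrum of a windowed complex tone is simply the window's own zero-padded transform shifted to the tone frequency, which is why the peak shape depends on the window and not on N.

```python
import cmath
import math

def dft(x, n_fft):
    """Naive DFT of x after zero-padding it to n_fft points."""
    padded = list(x) + [0.0] * (n_fft - len(x))
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
            for n, v in enumerate(padded))
        for k in range(n_fft)
    ]

def hamming(m):
    """Symmetric Hamming window of length m."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (m - 1)) for n in range(m)]

M = 16    # window length in samples (assumed for illustration)
N = 64    # zero-padded FFT size
K0 = 13   # tone frequency as a padded-bin index (3.25 unpadded bins)

w = hamming(M)
tone = [cmath.exp(2j * math.pi * K0 * n / N) for n in range(M)]

X = dft([wi * ti for wi, ti in zip(w, tone)], N)  # spectrum of windowed tone
W = dft(w, N)                                     # transform of the window alone

# The tone's spectrum is the window transform circularly shifted by K0 bins.
max_err = max(abs(X[k] - W[(k - K0) % N]) for k in range(N))
peak = max(range(N), key=lambda k: abs(X[k]))
print(max_err, peak)
```

Increasing `N` samples this same window-shaped peak more densely; making the peak narrower requires a longer (or different) window, not more zero-padding.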
