I've computed the spectrogram of an audio signal sampled at 44100 Hz, using 0.025 s long Hamming windows and a 32768-point FFT(?), and here is my confusion:
- 44100 * 0.025 ≈ 1103 samples, which is << N = 32768,
- yet in my experience this high N significantly improved the resolution of the spectrogram.
So my question is: what's going on?
From this awesome explanation I would conclude that a 32768-point FFT usually implies an analysis interval of about 1 s, and indeed Voicebox's rfft function (which I used) mentions that it truncates/pads the input to N. So I assume it padded my short 1103-sample vector with zeros up to a length of 32768 in order to compute the FFT.
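To make the padding assumption concrete, here is a minimal sketch in Python/NumPy (not Voicebox, so it only illustrates the idea and does not confirm what Voicebox's rfft does internally). It compares an N-point FFT of a short frame with the FFT of the same frame padded by hand with trailing zeros. The random test frame and all variable names are made up for illustration.

```python
import numpy as np

fs = 44100       # sampling rate in Hz
win_len = 1103   # about 0.025 s at 44100 Hz
n_fft = 32768    # FFT length from the question

# Hypothetical test frame: white noise shaped by a Hamming window.
rng = np.random.default_rng(0)
frame = rng.standard_normal(win_len) * np.hamming(win_len)

# Let the FFT routine extend the frame to n_fft points itself.
spec_implicit = np.fft.rfft(frame, n=n_fft)

# Pad by hand with trailing zeros, then transform.
padded = np.zeros(n_fft)
padded[:win_len] = frame
spec_explicit = np.fft.rfft(padded)

same = np.allclose(spec_implicit, spec_explicit)
nonzero_fraction = win_len / n_fft
print(same, round(nonzero_fraction, 4))
```

In NumPy the two spectra agree, so asking for more FFT points than there are samples is the same as appending zeros; only about 3.4% of the transformed vector is actual signal.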
Umm, is this what really happens? Can this improve the resolution even though only about the first 1/30th of the padded signal is non-zero? (Well, I think yes, but I want to be sure, as this came up at my thesis defense, and I only got this idea now, while writing this post.)
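A quick experiment that could help check this, again as a hedged NumPy sketch rather than the Voicebox code path: window a single sine tone (the 1010 Hz test frequency is an arbitrary assumption) and locate the spectral peak with and without zero-padding. As far as I understand it, padding only samples the spectrum of the same 1103 windowed samples on a denser frequency grid (fs/N spacing); the width of the window's main lobe, which limits how close two tones can be and still be told apart, is still set by the 0.025 s window.

```python
import numpy as np

fs = 44100        # sampling rate in Hz
win_len = 1103    # about 0.025 s at 44100 Hz
n_fft = 32768     # zero-padded FFT length
f_tone = 1010.0   # hypothetical test tone in Hz

t = np.arange(win_len) / fs
frame = np.sin(2 * np.pi * f_tone * t) * np.hamming(win_len)

def peak_frequency(x, n):
    """Frequency (Hz) of the largest-magnitude bin of an n-point real FFT."""
    mag = np.abs(np.fft.rfft(x, n=n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs[np.argmax(mag)]

# Spacing between neighbouring frequency bins in Hz.
spacing_plain = fs / win_len
spacing_padded = fs / n_fft

# How far the strongest bin lies from the true tone frequency.
err_plain = abs(peak_frequency(frame, win_len) - f_tone)
err_padded = abs(peak_frequency(frame, n_fft) - f_tone)

print(f"bin spacing: {spacing_plain:.2f} Hz vs {spacing_padded:.2f} Hz")
print(f"peak error:  {err_plain:.2f} Hz vs {err_padded:.2f} Hz")
```

The padded transform puts a bin much closer to the true tone frequency, which would explain a smoother, more detailed-looking spectrogram, without any new information about the signal being added.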
Thanks for any feedback.