STFT Clarification (FFT for real-time input)

Question

I get how the DFT via correlation works, and use that as a basis for understanding the results of the FFT. If I have a discrete signal that was sampled at 44.1kHz, then that means if I were to take 1s of data, I would have 44,100 samples. In order to run the FFT on that, I would have to have an array of 44,100 samples and a DFT with N=44,100 in order to get the resolution necessary to detect frequencies up to 22kHz, right? (Because the FFT can only correlate the input with sinusoidal components up to a frequency of N/2.)

That's obviously a lot of data points and calculation time, and I have read that this is where the Short-time FT (STFT) comes in. If I then take the first 1024 samples (~23ms) and run the FFT on that, then take an overlapping 1024 samples, I can get the continuous frequency domain of the signal every 23ms. Then how do I interpret the output? If the output of the FFT on static data is N/2 data points with fs/(N/2) bandwidth, what is the bandwidth of the STFT's frequency output?

Here's an example that I ran in Mathematica:

100Hz sine wave at 44.1kHz sample rate: [image]

Then I run the FFT on only the first 1024 points: [image]

The frequency of interest is then at data point 3, which should somehow correspond to 100Hz. I think 44100/1024 = 43 is something like a scaling factor, which means that a signal with 1Hz in this little window will then correspond to a signal of 43Hz in the full data array. However, this would give me an output of 43Hz*3 = 129Hz. Is my logic correct but not my implementation?

I think you are misunderstanding the DFT, FFT and STFT. You do not need N = 44100 to detect frequencies up to 22kHz; you get frequencies up to 22kHz with any N (i.e. N can be 200, 2000, etc.). The variable N only affects the frequency resolution you get, not the maximum or minimum values of frequency. — KillaKem, May 26 '15 at 19:55
STFT is used for signals whose frequency spectrum changes over time like music. — KillaKem, May 26 '15 at 19:56
And I should say that I intend to use this for continuous input, like music or a microphone. — MrUser, May 27 '15 at 08:49
Yes @KillaKem. My question says that I would have to have N=44100 to detect a frequency of 22k. I should have said, "to detect single frequencies up to 22kHz". Thank you for clearing that up. I then realized that by applying your comment, the 29Hz error is probably because my resolution is not fine enough. When I take STFT of 4096 points I get 107.6. One could see that this would trend toward 100Hz as N increased. (I would accept your comment as the answer if you posted it.) Thanks again. — MrUser, May 27 '15 at 09:14

score 5 · Accepted Answer · answered May 28 '15 at 11:40

As I have already stated in my earlier comments, the variable N affects the resolution of the output frequency spectrum and not the range of frequencies you can detect. A larger N gives you a finer resolution at the expense of more computation time. A smaller N is cheaper to compute but gives coarser bins, so a tone that falls between two bins is smeared across its neighbours (spectral leakage), which is the effect you have seen in your last figure.
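To make the resolution point concrete, here is a minimal sketch in plain Python (standard library only). The `fft` and `peak_bin` helpers are illustrative names, not anything from the question, and the sketch assumes the same 100 Hz sine sampled at 44.1 kHz with no window applied. Bin k, counted from zero, sits at k·fs/N.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def peak_bin(signal_hz, fs, n):
    """Zero-based index of the largest-magnitude bin in [0, n/2]."""
    x = [math.sin(2 * math.pi * signal_hz * i / fs) for i in range(n)]
    mags = [abs(v) for v in fft(x)[: n // 2 + 1]]
    return max(range(len(mags)), key=mags.__getitem__)

fs = 44100
for n in (1024, 4096):
    k = peak_bin(100.0, fs, n)
    spacing = fs / n  # frequency resolution in Hz per bin
    print(f"N={n}: spacing={spacing:.2f} Hz, peak bin={k}, "
          f"bin frequency={k * spacing:.2f} Hz")
```

With zero-based bins the peak lands at bin 2 (about 86.1 Hz) for N=1024 and at bin 9 (about 96.9 Hz) for N=4096. If the position is instead counted from one, as Mathematica lists are, the same peaks sit at positions 3 and 10, and multiplying those by the spacing gives about 129.2 Hz and 107.7 Hz, which may be where the figures in the question come from. Either way, the estimate can only be as close as the nearest bin.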

As for your other question: in theory the spectrum of a sampled signal is not limited in frequency (it repeats with period fs), but we restrict our result to the band [-fs/2, fs/2] because anything outside that band is only an aliased copy of what is inside it and is therefore of no use. Furthermore, if the input signal is real (which is true in most cases, including ours), then the frequencies in [-fs/2, 0] are just a reflection of the frequencies in [0, fs/2], so some FFT procedures output only the spectrum in [0, fs/2], which I think applies to your case. This means that the N/2 data points that you received as output represent the frequencies in the range [0, fs/2], spaced fs/N apart. That is the bandwidth you are working with in the case of the FFT and also in the case of the STFT (the STFT is just a series of FFTs, and each FFT in an STFT will give you a spectrum with data points in this band).
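The mirror-image property for real input can be checked with a short sketch. It uses a direct O(N²) DFT from the standard library only. The helper name `dft`, the tiny N = 16 and the test signal are arbitrary choices for illustration.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, fine for a short demonstration."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
            for k in range(n)]

fs = 44100
n = 16
# Any real-valued test signal will do; this one is an arbitrary mix of two sines.
x = [math.sin(2 * math.pi * 3 * i / n) + 0.5 * math.sin(2 * math.pi * 5 * i / n + 0.3)
     for i in range(n)]
X = dft(x)

# For real input, bin n-k is the complex conjugate of bin k,
# so the upper half carries no extra information.
mirrored = all(abs(X[n - k] - X[k].conjugate()) < 1e-9 for k in range(1, n))

# Frequencies of the bins that are usually kept: 0, fs/n, ..., fs/2.
one_sided_hz = [k * fs / n for k in range(n // 2 + 1)]

print(mirrored)                           # True
print(one_sided_hz[0], one_sided_hz[-1])  # 0.0 22050.0
```

Strictly speaking the one-sided spectrum has N/2 + 1 points when both 0 and fs/2 are included. Some routines drop the last one, which would give the N/2 points mentioned above.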

I would also like to point out that the STFT will most likely not reduce your computation time if your input is a varying signal such as music, because in that case you will need to perform it several times over the duration of the song for it to be of any use. It will, however, enable you to understand the frequency characteristics of your song much better than you would if you just performed one FFT.
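A back-of-the-envelope count supports this. The sketch below assumes an FFT costs on the order of N·log2(N) operations, ignores constant factors, and assumes a hop of 512 samples (50% overlap), which the question does not specify.

```python
import math

fs = 44100    # samples in one second of audio
frame = 1024  # STFT frame length from the question
hop = 512     # assumed 50% overlap (not specified in the question)

# Number of complete frames that fit in one second.
frames = 1 + (fs - frame) // hop

# Rough operation counts, using N * log2(N) per transform.
stft_cost = frames * frame * math.log2(frame)
single_cost = fs * math.log2(fs)

print(frames)
print(stft_cost / single_cost)
```

Under these assumptions the overlapping short transforms together cost somewhat more than one long transform over the same second, so the benefit of the STFT is time localisation, not speed.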

To visualise the results of an FFT you use frequency (and/or phase) spectrum plots, but to visualise the results of an STFT you will most probably need to create a spectrogram, which is basically a graph made by putting the individual FFT spectra side by side. The process of creating a spectrogram can be seen in the figure below (Source: Dan Ellis - Introduction to Speech Processing). The spectrogram will show you how your signal's frequency characteristics change over time, and how you interpret it will depend on what specific features you are looking to extract/detect from the audio. You might want to look at the spectrogram Wikipedia page for more information.

[Figure: the process of creating a spectrogram. Source: Dan Ellis, Introduction to Speech Processing]
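The "FFT spectra side by side" idea can be sketched in a few lines of plain Python. Everything here is an illustrative assumption and not part of the original post: the helper names, the Hann window, the low 8 kHz sample rate (chosen so the demo runs quickly), and a test signal that switches from 1000 Hz to 3000 Hz halfway through.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def stft_magnitudes(x, frame, hop):
    """Hann-windowed frames -> list of one-sided magnitude spectra."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    columns = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = [x[start + i] * window[i] for i in range(frame)]
        columns.append([abs(v) for v in fft(seg)[: frame // 2 + 1]])
    return columns

fs = 8000  # hypothetical sample rate, kept low so the demo runs quickly
# Test signal: 0.5 s at 1000 Hz followed by 0.5 s at 3000 Hz.
x = [math.sin(2 * math.pi * (1000 if i < fs // 2 else 3000) * i / fs)
     for i in range(fs)]

frame, hop = 256, 128
spec = stft_magnitudes(x, frame, hop)  # spec[t][k]: frame t, frequency bin k

# Report the strongest bin in the first and last frames.
for t in (0, len(spec) - 1):
    k = max(range(len(spec[t])), key=spec[t].__getitem__)
    print(f"frame {t}: starts at {t * hop / fs:.3f} s, peak near {k * fs / frame:.1f} Hz")
```

Each entry of `spec` is one column of the spectrogram. Plotting the columns next to each other, with time on one axis and bin frequency k·fs/frame on the other, shows the peak moving from 1000 Hz to 3000 Hz, which a single FFT over the whole second would not reveal.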

Thank you for the thorough summary and for the added bit about the spectrogram. — MrUser, May 28 '15 at 15:05
