How to generate the audio spectrum using fft in C++?

Question

I want to generate an audio spectrum (as seen in this video) of an mp3 audio file. Basically, this requires calculating the FFT of the audio signal. How do I program this in C/C++?

I've looked at a couple of open-source libraries such as FFTW, but I really don't know how to use them for my problem. Any help would be greatly appreciated. Thanks in advance!

Do you already know C or C++? If not, it's probably best to start off with something simpler... — Oliver Charlesworth, Jan 13 '11 at 00:19
Check out http://stackoverflow.com/questions/604453/analyze-audio-using-fast-fourier-transform — Mark Ransom, Jan 13 '11 at 04:53
Do you know anything about DSP? FFTW is a fantastic tool, but unless you know something about Fourier transforms, windowing, resolution bins, etc., it will be very difficult to produce anything. — cmannett85, Jan 13 '11 at 08:14
FFT is the easy part (and definitely not the last word) of power spectrum density estimation. There are many other considerations, especially windowing. Google `Slepian window` for robust techniques which minimize power leakage. — Alexandre C., Jan 13 '11 at 11:46
Please edit your question to show the code you have so far (http://whathaveyoutried.com). You should include at least an outline (but preferably a Minimal, Complete, and Verifiable example) of the code that you are having problems with; then we can try to help with the specific problem. You should also read the help page on how to ask. — Toby Speight, Jun 20 '17 at 17:23

score 57 · Accepted Answer · edited Mar 23 '22 at 18:21


There are quite a few similar/related questions on SO already which are well worth reading as the answers contain a lot of useful information and advice, but in essence you need to do this:

1. Convert the audio data (for an mp3 file, the decoded PCM samples) to the format required by the FFT (e.g. int -> float, with separate L/R channels);
2. Apply a suitable window function (e.g. the Hann, aka Hanning, window);
3. Apply the FFT (NB: if using a typical complex-to-complex FFT, set all imaginary parts in the input array to zero);
4. Calculate the magnitude of the first N/2 + 1 FFT output bins, i.e. bins 0 to N/2 (sqrt(re*re + im*im)); for real input the remaining bins are mirror images of these;
5. Optionally convert the magnitude to a dB (log) scale (20 * log10(magnitude) or, equivalently, 10 * log10(re*re + im*im));
6. Plot the N/2 + 1 (log) magnitude values.
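The steps above can be sketched in C++ for a single block of mono samples. This is a minimal illustration under stated assumptions, not code from the answer: it uses a hand-written recursive radix-2 FFT (so the block length N is assumed to be a power of two) in place of KissFFT or FFTW, a periodic Hann window, and the no-sqrt power form of the dB conversion. The names `fft` and `spectrum_db` are hypothetical. It returns the N/2 + 1 bins from 0 (DC) to N/2 (Nyquist), which are the only non-redundant ones for real input.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Hypothetical helper: recursive radix-2 Cooley-Tukey FFT, in place.
// Assumes a.size() is a power of two.
void fft(std::vector<cd>& a) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even);
    fft(odd);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        // Twiddle factor e^(-2*pi*i*k/n) applied to the odd half
        cd t = std::polar(1.0, -2.0 * pi * k / n) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Hypothetical helper: returns N/2 + 1 levels in dB for one block of N mono samples.
std::vector<double> spectrum_db(const std::vector<float>& block) {
    const std::size_t n = block.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> buf(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Periodic Hann window: 0.5 * (1 - cos(2*pi*i/N))
        double w = 0.5 * (1.0 - std::cos(2.0 * pi * i / n));
        buf[i] = cd(block[i] * w, 0.0);  // imaginary part set to zero
    }
    fft(buf);
    std::vector<double> db(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) {
        double power = buf[k].real() * buf[k].real() + buf[k].imag() * buf[k].imag();
        db[k] = 10.0 * std::log10(power + 1e-20);  // small floor avoids log10(0)
    }
    return db;
}
```

For a sample rate fs, bin k corresponds to the frequency k * fs / N; for example, with N = 1024 at 44.1 kHz the bins are about 43 Hz apart. A live spectrum display would repeat this for successive blocks of each channel.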

Note that while FFTW is a very good and very fast FFT library, it may be a little overwhelming for a beginner; it is also very expensive if you want to include it in a closed-source commercial product, since it is GPL-licensed and otherwise needs a paid commercial license. I recommend starting with KissFFT instead.

answered Jan 13 '11 at 09:06 by Paul R (edited by tleb)


+1 - The only thing I'd add is a first step to separate the left (or right, doesn't matter) channel out from the audio file. And another +1 if I could for using KissFFT before mucking with FFTW. – mtrw Jan 13 '11 at 11:40
@mtrw: thanks for the comments - added note re separating L/R channels to the first step – Paul R Jan 13 '11 at 11:43

I would only add that you can simplify if you're doing a log scale - instead of calculating magnitude (with sqrt) and then scaling `20*log10`, take the square of the magnitude (skipping the sqrt) and then scale `10*log10`. Mathematically equivalent, but saves an unnecessary `sqrt` call. – Mark Ransom Jan 14 '11 at 04:09
@Mark: yes, good point, you can go straight to dB without the sqrt if you don't need the linear magnitude. – Paul R Jan 14 '11 at 08:24
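Mark Ransom's shortcut can be checked with two small C++ functions (the names are hypothetical, for illustration only): both give the same dB value for an FFT bin with real part `re` and imaginary part `im`, but the second skips the `sqrt`. Both return -infinity for a zero-magnitude bin, so a real display would clamp or add a small floor first.

```cpp
#include <cmath>

// dB from the linear magnitude: needs a sqrt before the log.
double db_from_magnitude(double re, double im) {
    return 20.0 * std::log10(std::sqrt(re * re + im * im));
}

// dB from the squared magnitude (power): same result, no sqrt.
double db_from_power(double re, double im) {
    return 10.0 * std::log10(re * re + im * im);
}
```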
@PaulR What are separate L/R channels? I am a novice as far as FFTs go and have just started reading about this topic, so I'm just curious. – SayeedHussain Aug 01 '13 at 08:22
It's just the left/right stereo channels that you get in most sound files or audio inputs - you need to process left and right channels separately. – Paul R Aug 02 '13 at 16:03
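To make the L/R separation concrete, here is a minimal C++ sketch. It assumes the decoder delivers interleaved signed 16-bit stereo PCM (L, R, L, R, ...), which is a common layout but not guaranteed for every decoder; `split_stereo` is a hypothetical helper, not part of any library. Dividing by 32768 maps the int16 range onto [-1.0, 1.0), which also covers the int -> float conversion step.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: splits interleaved 16-bit stereo PCM (L, R, L, R, ...)
// into two float channels scaled to [-1.0, 1.0).
void split_stereo(const std::vector<std::int16_t>& interleaved,
                  std::vector<float>& left, std::vector<float>& right) {
    const std::size_t frames = interleaved.size() / 2;  // one frame = one L + one R sample
    left.resize(frames);
    right.resize(frames);
    for (std::size_t i = 0; i < frames; ++i) {
        left[i] = interleaved[2 * i] / 32768.0f;
        right[i] = interleaved[2 * i + 1] / 32768.0f;
    }
}
```

Each resulting channel is then windowed and transformed on its own.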
Why is the magnitude of the frequency components being converted to dB using the field-quantity convention instead of as a power quantity? I thought that acoustic intensity was measured as a power. That is, why 20 * log10(magnitude) instead of 10 * log10(magnitude)? – A. Levy Jul 10 '15 at 07:11
@A.Levy: well this is how I think of it: input signals are voltages, power is proportional to V^2, so to get power in dB it's 20*log10 rather than 10*log10. – Paul R Jul 10 '15 at 07:17
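Paul R's reasoning can be written out as a short derivation. This is a sketch: V stands for any amplitude-like quantity (a voltage, or an FFT bin magnitude |X_k|), and P ∝ V² is the assumed relationship between power and amplitude.

```latex
P \propto V^2
\quad\Longrightarrow\quad
10\log_{10}\frac{P}{P_\text{ref}}
= 10\log_{10}\frac{V^2}{V_\text{ref}^2}
= 20\log_{10}\frac{V}{V_\text{ref}},
\qquad
10\log_{10}|X_k|^2 = 20\log_{10}|X_k| .
```

So 20 * log10(magnitude) already is the power level in dB; 10 * log10 gives the same number only when it is applied to the squared magnitude re*re + im*im.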
