I want to generate an audio spectrum (as seen in this video) of an MP3 audio file. Basically, this problem requires calculating the FFT of the audio signal. How do I program this in C/C++?

I've looked at a couple of open source libraries such as FFTW, but I really don't know how to use them for my problem. Any help would be greatly appreciated. Thanks in advance!

MRashid
  • 1
    Do you already know C or C++? If not, it's probably best to start off with something simpler... – Oliver Charlesworth Jan 13 '11 at 00:19
  • Check out http://stackoverflow.com/questions/604453/analyze-audio-using-fast-fourier-transform – Mark Ransom Jan 13 '11 at 04:53
  • 2
    Do you know anything about DSP? FFTW is a fantastic tool, but unless you know something about Fourier transforms/windowing/resolution bins/etc. it will be very difficult to produce anything. – cmannett85 Jan 13 '11 at 08:14
  • 2
    FFT is the easy part (and definitely not the last word) of power spectrum density estimation. There are many other considerations, especially windowing. Google `Slepian window` for robust techniques which minimize power leakage. – Alexandre C. Jan 13 '11 at 11:46
  • Please [edit] your question to show [the code you have so far](http://whathaveyoutried.com). You should include at least an outline (but preferably a [mcve]) of the code that you are having problems with, then we can try to help with the specific problem. You should also read [ask]. – Toby Speight Jun 20 '17 at 17:23

1 Answer

There are quite a few similar/related questions on SO already which are well worth reading as the answers contain a lot of useful information and advice, but in essence you need to do this:

  • Convert the audio data to the format required by the FFT (e.g. int -> float, with separate L/R channels);
  • Apply a suitable window function (e.g. a Hann, also known as Hanning, window);
  • Apply the FFT (NB: if using a typical complex-to-complex FFT, set all imaginary parts in the input array to zero);
  • Calculate the magnitude of the first N/2 FFT output bins (sqrt(re*re + im*im));
  • Optionally convert the magnitude to a dB (log) scale (20 * log10(magnitude) or 10 * log10(re*re + im*im));
  • Plot the N/2 (log) magnitude values.
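
The steps above can be sketched roughly as follows. This is only an illustration under several assumptions: the MP3 has already been decoded to interleaved 16-bit PCM by a decoder of your choice, the frame length is a power of two, and the small recursive radix-2 `fft` here is a stand-in for a real library such as KissFFT or FFTW. The function names (`extract_channel`, `apply_hann`, `fft`, `spectrum_db`) are made up for this example and do not come from any library.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Step 1: pull one channel out of interleaved PCM (L R L R ...) and
// scale the 16-bit integers to floats in [-1, 1).
std::vector<float> extract_channel(const std::vector<std::int16_t>& interleaved,
                                   std::size_t num_channels, std::size_t channel) {
    std::vector<float> out;
    out.reserve(interleaved.size() / num_channels);
    for (std::size_t i = channel; i < interleaved.size(); i += num_channels)
        out.push_back(interleaved[i] / 32768.0f);
    return out;
}

// Step 2: multiply the frame by a (periodic) Hann window in place.
void apply_hann(std::vector<float>& frame) {
    const double pi = std::acos(-1.0);
    const std::size_t n = frame.size();
    for (std::size_t i = 0; i < n; ++i)
        frame[i] *= static_cast<float>(0.5 * (1.0 - std::cos(2.0 * pi * i / n)));
}

// Step 3: minimal recursive radix-2 FFT, in place.
// Assumption: a.size() is a power of two. Use a real library in practice.
void fft(std::vector<std::complex<double>>& a) {
    const std::size_t n = a.size();
    if (n < 2) return;
    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even);
    fft(odd);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const std::complex<double> t = std::polar(1.0, -2.0 * pi * k / n) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Steps 2-5 for one frame: window, FFT, then dB values for the first N/2 bins.
std::vector<double> spectrum_db(std::vector<float> frame) {
    apply_hann(frame);
    // Real samples become complex values with imaginary part zero.
    std::vector<std::complex<double>> bins(frame.begin(), frame.end());
    fft(bins);
    std::vector<double> db(frame.size() / 2);
    for (std::size_t k = 0; k < db.size(); ++k) {
        const double power = std::norm(bins[k]);    // re*re + im*im
        db[k] = 10.0 * std::log10(power + 1e-20);   // tiny offset avoids log10(0)
    }
    return db;
}
```

Bin `k` of the result corresponds to the frequency `k * sample_rate / N`, where `N` is the frame length. For a moving display like the one in the video, you would call `spectrum_db` on successive frames (for example 1024 samples each) and draw one set of bars per frame.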

Note that while FFTW is a very good and very fast FFT library, it may be a little overwhelming for a beginner. It is also GPL-licensed, so including it in a closed-source commercial product requires a separate commercial license, which can be very expensive. I recommend starting with KissFFT instead.

Paul R
  • 3
    +1 - The only thing I'd add is a first step to separate the left (or right, doesn't matter) channel out from the audio file. And another +1 if I could for using KissFFT before mucking with FFTW. – mtrw Jan 13 '11 at 11:40
  • @mtrw: thanks for the comments - added note re separating L/R channels to the first step – Paul R Jan 13 '11 at 11:43
  • 15
    I would only add that you can simplify if you're doing a log scale - instead of calculating magnitude (with sqrt) and then scaling `20*log10`, take the square of the magnitude (skipping the sqrt) and then scale `10*log10`. Mathematically equivalent, but saves an unnecessary `sqrt` call. – Mark Ransom Jan 14 '11 at 04:09
  • @Mark: yes, good point, you can go straight to dB without the sqrt if you don't need the linear magnitude. – Paul R Jan 14 '11 at 08:24
  • @PaulR What are separate L/R channels? I am a novice as far as FFTs go and have just started reading about this topic, so I am just curious. – SayeedHussain Aug 01 '13 at 08:22
  • It's just the left/right stereo channels that you get in most sound files or audio inputs - you need to process left and right channels separately. – Paul R Aug 02 '13 at 16:03
  • Why is the magnitude of the frequency components being converted to dB using the Field quantity convention instead of as a Power quantity? I thought that acoustic intensity was measured as a power? That is, why 20 * log10(magnitude) instead of 10 * log10(magnitude)? – A. Levy Jul 10 '15 at 07:11
  • @A.Levy: well this is how I think of it: input signals are voltages, power is proportional to V^2, so to get power in dB it's 20*log10 rather than 10*log10. – Paul R Jul 10 '15 at 07:17