
I would like to see how strongly certain frequencies, specifically low bass at 20 - 60 Hz, are present in a piece of audio. I have the audio as a byte array, which I convert to an array of shorts and then to complex numbers via (short[i]/(double)short.MaxValue, 0). Then I pass this to the FFT from AForge.

The audio is mono with a sample rate of 44100 Hz. I understand I can only put chunks through the FFT whose length is a power of 2, so 4096 for example. I don't understand which frequencies will be in the output bins.

If I am taking 4096 samples from audio with a 44100 Hz sample rate, does this mean I am taking only milliseconds' worth of audio? Or am I only getting some of the frequencies that are present?

I add the output of the FFT to an array. My understanding is that, as I am taking 4096 samples, bin 0 would contain 0*44100/4096 = 0 Hz, bin 1 would hold 1*44100/4096 = 10.7666015625 Hz, and so on. Is this correct, or am I doing something fundamentally wrong here?

My goal is to average the magnitudes between, say, 20 and 60 Hz, so that a song with very low, heavy bass gives a higher number than, say, a soft piano piece with very little bass.

Here is my code.

OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);

byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);

samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;

Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");


short[] shorts = data.Select(b => (short)b).ToArray();

int size = 4096;
int window = 44100 * 10;
int y = 0;
Complex[] complexData = new Complex[size];
for (int i = window; i < window + size; i++) 
{
    Complex tmp = new Complex(shorts[i]/(double)short.MaxValue, 0);

    complexData[y] = tmp;
    y++;

}




FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);


double[] arr = new double[complexData.Length];
//print out sample of conversion
for (int i = 0; i < complexData.Length; i++)
{
    arr[i] = complexData[i].Magnitude;

}

Console.Write("complete, ");


return arr; 

Edit: changed to FFT from DFT.

Pete B
  • Well, you seem to be doing a DFT (which is even more precise than FFT), but I don't know how the returned data is structured. It should be in the documentation of the library you are using. Fundamentally you'd be right if the data is structured linearly, but it could also be structured logarithmically. – MrPaulch Nov 20 '14 at 18:30
  • Thanks for pointing that out. I did mean to run the FFT; I just copied the code over from when I was playing with the DFT. – Pete B Nov 20 '14 at 18:47
  • You're basically on the right track - your bins are around 10 Hz wide as you calculated - see [this answer](http://stackoverflow.com/questions/4364823/how-to-get-frequency-from-fft-result/4371627#4371627) for a fuller explanation. – Paul R Nov 20 '14 at 20:07
  • There is no difference between FFT and DFT as far as accuracy is concerned - the FFT is just a much more efficient implementation of the DFT, but mathematically they are equivalent. – Paul R Nov 20 '14 at 20:35
  • Sorry, most FFT algorithms are exactly as precise as the DFT. To defend where I'm coming from: there are some FFT algorithms that trade some accuracy for even more speed. – MrPaulch Nov 20 '14 at 21:51
  • @MrPaulch: oh, OK - I guess you're talking about fixed-point FFTs, where you tend to lose some precision on each butterfly due to limited dynamic range? – Paul R Nov 20 '14 at 22:13
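
The bin arithmetic discussed in the question and comments can be sketched as follows. This is only an illustration: the class and method names (`FftBins`, `BinToFrequency`, `FrequencyToBin`, `ChunkMilliseconds`) are hypothetical helpers and are not part of AForge or NAudio.

```csharp
using System;

public static class FftBins
{
    // Centre frequency (Hz) of bin k for an N-point FFT: k * sampleRate / N.
    public static double BinToFrequency(int bin, int sampleRate, int fftSize)
        => (double)bin * sampleRate / fftSize;

    // Index of the bin whose centre is nearest to the given frequency (Hz).
    public static int FrequencyToBin(double frequency, int sampleRate, int fftSize)
        => (int)Math.Round(frequency * fftSize / sampleRate);

    // Duration of audio covered by one FFT chunk, in milliseconds.
    public static double ChunkMilliseconds(int sampleRate, int fftSize)
        => 1000.0 * fftSize / sampleRate;
}
```

With a 44100 Hz sample rate and 4096 samples, `BinToFrequency(1, 44100, 4096)` gives the 10.7666015625 Hz spacing from the question, and one chunk covers roughly 93 ms of audio. For real-valued input only the bins up to index N/2 carry independent information; the upper half mirrors the lower half.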

1 Answer

Here's a modified version of your code. Note the comments starting with "***".

OpenFileDialog file = new OpenFileDialog();
file.ShowDialog();
WaveFileReader reader = new WaveFileReader(file.FileName);

byte[] data = new byte[reader.Length];
reader.Read(data, 0, data.Length);

samepleRate = reader.WaveFormat.SampleRate;
bitDepth = reader.WaveFormat.BitsPerSample;
channels = reader.WaveFormat.Channels;

Console.WriteLine("audio has " + channels + " channels, a sample rate of " + samepleRate + " and bitdepth of " + bitDepth + ".");

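// *** NAudio "thinks" in floats
// *** Note: reinterpreting the raw bytes like this is only valid if the file stores 32-bit IEEE float samples;
// *** 16-bit PCM data has to be converted from shorts instead.
float[] floats = new float[data.Length / sizeof(float)];
Buffer.BlockCopy(data, 0, floats, 0, floats.Length * sizeof(float));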

int size = 4096;
// *** You don't have to fill the FFT buffer to get valid results.  Noisier & smaller "magnitudes", but finer bin spacing (true resolution is still set by the input length).
int inputSamples = samepleRate / 100; // 10ms... adjust as needed
int offset = samepleRate * 10 * channels;
int y = 0;
Complex[] complexData = new Complex[size];
// *** get a "scaling" curve to make both ends of sample region 0 but still allow full amplitude in the middle of the region.
float[] window = CalcWindowFunction(inputSamples);
for (int i = 0; i < inputSamples; i++)
{
    // *** "floats" is stored as LRLRLR interleaved data for stereo audio
    complexData[y] = new Complex(floats[i * channels + offset] * window[i], 0);
    y++;
}
// make sure the back portion of the buffer is set to all 0's
while (y < size)
{
    complexData[y] = new Complex(0, 0);
    y++;
}


// *** Consider using a DCT here instead...  It returns less "noisy" results
FourierTransform.FFT(complexData, FourierTransform.Direction.Forward);


double[] arr = new double[complexData.Length];
//print out sample of conversion
for (int i = 0; i < complexData.Length; i++)
{
    // *** I assume we don't care about phase???
    arr[i] = complexData[i].Magnitude;
}

Console.Write("complete, ");


return arr;
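
The `Buffer.BlockCopy` into a `float[]` above treats the raw bytes as 32-bit float samples. If `reader.WaveFormat.BitsPerSample` reports 16, the bytes are more likely little-endian 16-bit PCM, and a conversion along these lines may be what is needed. This is a minimal sketch under that assumption; `PcmConvert` and `Pcm16ToFloats` are hypothetical names, not NAudio APIs.

```csharp
using System;

public static class PcmConvert
{
    // Converts little-endian 16-bit PCM bytes to floats in the range [-1, 1).
    // Assumes two bytes per sample; a trailing odd byte is ignored.
    public static float[] Pcm16ToFloats(byte[] data)
    {
        int count = data.Length / 2;
        float[] floats = new float[count];
        for (int i = 0; i < count; i++)
        {
            // Low byte first, then high byte; the cast restores the sign.
            short sample = unchecked((short)(data[2 * i] | (data[2 * i + 1] << 8)));
            floats[i] = sample / 32768f;
        }
        return floats;
    }
}
```

The same point applies to the question's `data.Select(b => (short)b)`, which widens each byte to a short rather than combining byte pairs into samples.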

Once you get the results, and assuming a 44100 Hz sample rate and size = 4096, each bin is about 10.77 Hz wide, so elements 2 - 5 (roughly 21.5 Hz to 53.8 Hz) should be the values you are looking for. There's a way to convert them to dB, but I don't remember it offhand.
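
As a rough illustration of the averaging the question asks for, the band can be selected by bin index and the result expressed in decibels with the usual 20 * log10(magnitude) formula. The names `BassLevel`, `BandAverage` and `ToDecibels` are hypothetical, and the dB value here is relative to an unnormalized magnitude of 1, so it is only useful for comparing clips processed the same way.

```csharp
using System;

public static class BassLevel
{
    // Mean magnitude of the bins whose centre frequencies lie in [lowHz, highHz].
    // magnitudes is assumed to hold one value per FFT bin (length = FFT size).
    public static double BandAverage(double[] magnitudes, int sampleRate, double lowHz, double highHz)
    {
        int fftSize = magnitudes.Length;
        double binWidth = (double)sampleRate / fftSize;
        int first = (int)Math.Ceiling(lowHz / binWidth);
        // Never read past N/2: the upper half mirrors the lower half for real input.
        int last = Math.Min((int)Math.Floor(highHz / binWidth), fftSize / 2);
        if (last < first) return 0.0;

        double sum = 0.0;
        for (int k = first; k <= last; k++) sum += magnitudes[k];
        return sum / (last - first + 1);
    }

    // Amplitude ratio to decibels; the floor avoids log10(0).
    public static double ToDecibels(double magnitude)
        => 20.0 * Math.Log10(Math.Max(magnitude, 1e-12));
}
```

For example, `BassLevel.BandAverage(arr, 44100, 20, 60)` would average bins 2 through 5 of a 4096-point result.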

Good luck!

ioctlLR
  • Thank you very much. I can't make you understand how much I mean that. This has been troubling me for some time. – Pete B Nov 21 '14 at 23:34
  • What are you using for your CalcWindowFunction()? Is there one available within a package, or will I have to look into implementing this myself? Thanks – Pete B Nov 21 '14 at 23:54
  • You can just return a float array (with "size" elements) with all 1's. There are better windows, but you'll need to figure out the right one from http://en.wikipedia.org/wiki/Window_function. Generally, just implement either the Hamming or Blackman-Harris windows and you should be in great shape. There's plenty of example code out there. – ioctlLR Nov 23 '14 at 02:55
  • When I change the part of the audio I want to transform, say from 10 seconds in (as set in your example) to 20, or lower to 2, I get an array full of NaN (not a number). 0 works fine. Do you know why this is? – Pete B Nov 24 '14 at 00:42
  • Nope. "offset" is just what it sounds like: an offset into the array. Changing it shouldn't be a big deal as long as you stay inside the bounds of the array. – ioctlLR Nov 24 '14 at 01:34
  • The output contains values like 2.27267719403015E+35, which are incredibly big numbers. Are these going to be correct? – Pete B Nov 24 '14 at 16:56
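
For the window function asked about in the comments, a Hamming window is one of the options mentioned and is short to write by hand. The sketch below reuses the name `CalcWindowFunction` from the answer, but the body is an assumption about what such a helper could look like, not the answerer's actual implementation.

```csharp
using System;

public static class Windows
{
    // Hamming window: w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1)).
    public static float[] CalcWindowFunction(int length)
    {
        float[] window = new float[length];
        if (length == 1)
        {
            window[0] = 1f; // avoid dividing by zero for a single sample
            return window;
        }
        for (int n = 0; n < length; n++)
        {
            window[n] = (float)(0.54 - 0.46 * Math.Cos(2.0 * Math.PI * n / (length - 1)));
        }
        return window;
    }
}
```

Note that a Hamming window tapers to about 0.08 at the ends rather than exactly 0; replacing both coefficients with 0.5 gives a Hann window, which does reach 0 at both ends.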