
I'm using the FMOD library to extract PCM data from an MP3. I get the whole 2-channel, 16-bit thing, and I also get that a sample rate of 44,100 Hz means 44,100 samples of "sound" per second. What I don't get is what exactly the 16-bit value represents. I know how to plot coordinates on an x/y axis, but what am I plotting? If the x axis represents time, what does the y axis represent? Sound level? Is that the same as amplitude? And how do I determine the different sounds that make up this value? I mean, how do I get a spectrum from a 16-bit number?

This may be a separate question, but it's actually what I really need answered: how do I get the amplitude at every 25 milliseconds? Do I take the 44,100 samples and divide them into 40 frames (since 40 * 0.025 s = 1 s)? That gives 1,102.5 samples per frame; so would I feed roughly 1,102 values into a black box that gives me the amplitude for that moment in time?
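In case it is useful for comparison, here is a minimal sketch of what I imagine that black box to be, assuming the stream has already been split into frames of mono 16-bit samples (FrameRms is a hypothetical helper name, not an FMOD call):

// Hypothetical helper: RMS amplitude of one frame of mono 16-bit samples,
// returned in the normalized range 0.0 .. 1.0.
static double FrameRms(short[] frame)
{
    double sumOfSquares = 0.0;
    foreach (var s in frame)
    {
        var x = s / 32768.0;   // map -32768..32767 to roughly -1.0..1.0
        sumOfSquares += x * x;
    }
    return Math.Sqrt(sumOfSquares / frame.Length);
}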

Edited the original post to add code I plan to test soon (note: I changed the frame length from 25 ms to 40 ms, i.e. 25 frames per second):

// 44100 samples/sec / 25 frames/sec = 1764 samples per frame
// -> 1764 samples * 2 channels * 2 bytes [16-bit sample] = 7056 bytes
private const int CHUNKSIZE = 7056;

uint bytesread = 0;
var buffer = new byte[CHUNKSIZE];
var squares = new double[CHUNKSIZE / 4];   // one entry per stereo sample pair
const double scale = 1.0d / 32768.0d;      // normalize 16-bit samples to -1.0..1.0

do
{
    result = sound.readData(data, CHUNKSIZE, ref read);

    // only 'read' bytes are valid; the final chunk may be short
    Marshal.Copy(data, buffer, 0, (int)read);

    // PCM samples are 16-bit little endian; BitConverter.ToInt16 reads
    // little endian on x86, so no byte swap is needed
    for (var i = 0; i + 3 < (int)read; i += 4)
    {
        // average |L| and |R| for this sample pair, then square it
        var avg = scale * (Math.Abs((double)BitConverter.ToInt16(buffer, i)) + Math.Abs((double)BitConverter.ToInt16(buffer, i + 2))) / 2.0d;
        squares[i >> 2] = avg * avg;
    }

    var rmsAmplitude = ((int)Math.Floor(Math.Sqrt(squares.Average()) * 32768.0d)).ToString("X2");

    fs.Write(buffer, 0, (int)read);
    bytesread += read;

    statusBar.Text = "writing " + bytesread + " bytes of " + length + " to output.raw";
} while (result == FMOD.RESULT.OK && read == CHUNKSIZE);

After loading the MP3, it seems my rmsAmplitude values are in the range 3C00 to 4900. Have I done something wrong? I was expecting a wider spread.

Paul Rivera

3 Answers


Yes, a sample represents amplitude (at that point in time).

To get a spectrum, you typically convert the data from the time domain to the frequency domain with a Fourier transform.
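For illustration, a minimal, unoptimized sketch of that conversion (a naive DFT rather than a real FFT; in practice you would use an FFT library, and DftMagnitudes is my own name):

// Naive discrete Fourier transform: returns the magnitude of each
// frequency bin k for a block of n time-domain samples. O(n^2).
static double[] DftMagnitudes(double[] samples)
{
    int n = samples.Length;
    var magnitudes = new double[n / 2];   // bins above Nyquist mirror these
    for (int k = 0; k < n / 2; k++)
    {
        double re = 0.0, im = 0.0;
        for (int t = 0; t < n; t++)
        {
            double angle = 2.0 * Math.PI * k * t / n;
            re += samples[t] * Math.Cos(angle);
            im -= samples[t] * Math.Sin(angle);
        }
        // bin k corresponds to frequency k * sampleRate / n
        magnitudes[k] = Math.Sqrt(re * re + im * im);
    }
    return magnitudes;
}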

For your last question: multiple approaches are used; you may want the RMS (root mean square) of each frame.

justin
  • Oh, I guess that brings up another question: before calculating the RMS, I assume that if I have 2 channels, the 16-bit values for each channel are interleaved (consecutive) in the buffer, and that I should take the average of the 2, correct? – Paul Rivera Apr 30 '12 at 17:37
  • @PaulRivera yup. try `(|Ln| + |Rn|) / 2` for RMS-mono – justin Apr 30 '12 at 17:51
  • So, just to be sure, -32768 <= n <= 32767. – Paul Rivera Apr 30 '12 at 19:21
  • on second thought, you could just combine samples from all channels into the rms in this case. – justin May 01 '12 at 04:44
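For reference, the two suggestions in the comments above look like this (a sketch; the helper names and the already-deinterleaved left/right arrays are my own assumptions):

// Option 1: average |L| and |R| per sample pair, then take the RMS of the averages.
static double RmsAveraged(short[] left, short[] right)
{
    double sum = 0.0;
    for (int i = 0; i < left.Length; i++)
    {
        double avg = (Math.Abs((double)left[i]) + Math.Abs((double)right[i])) / 2.0 / 32768.0;
        sum += avg * avg;
    }
    return Math.Sqrt(sum / left.Length);
}

// Option 2: fold every sample from both channels straight into one RMS sum.
static double RmsCombined(short[] left, short[] right)
{
    double sum = 0.0;
    for (int i = 0; i < left.Length; i++)
    {
        double l = left[i] / 32768.0, r = right[i] / 32768.0;
        sum += l * l + r * r;
    }
    return Math.Sqrt(sum / (2.0 * left.Length));
}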

Generally, the x axis is the time value and the y axis is the amplitude. To get the frequency content, you need to take the Fourier transform of the data (most likely using the Fast Fourier Transform (FFT) algorithm).

To use one of the simplest "sounds", let's assume you have a pure tone with a single frequency f. In the amplitude/time domain this is represented as y = sin(2 * pi * f * x), where x is time in seconds. If you convert that into the frequency domain, you end up with a single spike at frequency f.
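To make that concrete, here is a sketch that generates one second of such a tone as 16-bit samples (440 Hz and the variable names are arbitrary choices for illustration):

const int sampleRate = 44100;
const double f = 440.0;                  // example frequency in Hz
var samples = new short[sampleRate];     // one second of mono audio
for (int i = 0; i < samples.Length; i++)
{
    double t = (double)i / sampleRate;   // time of sample i, in seconds
    samples[i] = (short)(short.MaxValue * Math.Sin(2.0 * Math.PI * f * t));
}
// A Fourier transform of these samples would show a single peak
// at the bin closest to 440 Hz.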

Foon

Each sample represents the voltage of the analog signal at a given time.
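In code, a common convention is to map that signed 16-bit value onto a nominal -1.0..+1.0 range (a sketch; whether you divide by 32768 or 32767 is a matter of convention):

short sample = -16384;                  // one raw 16-bit PCM sample
double normalized = sample / 32768.0;   // nominal range -1.0 .. just under +1.0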