Volume from byte array

Question

I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16-bit recording (single channel) with a sample rate of 44100 Hz. How do I perform a quick analysis to get the volume at any given moment? I need to apply a threshold, so I want a function that returns true if the signal is above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check each value, with 255 being the loudest, but this doesn't seem to work: even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great. Thanks.

I tried converting it to an array of shorts. I started getting negative values and values greater than 255. Is this normal? If so, what do negative values represent in a single channel and what would the maximum volume value be? thanks — Brap, Dec 06 '10 at 03:38
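To illustrate why the raw bytes contain 255 even in near silence, here is a minimal Python sketch. It assumes the buffer holds signed 16-bit little-endian PCM (the usual WAV layout, but an assumption about this particular recording), and `bytes_to_samples` is a hypothetical helper name.

```python
import struct

def bytes_to_samples(pcm_bytes):
    """Decode signed 16-bit little-endian mono PCM into a list of ints.

    Assumption: little-endian byte order, two bytes per sample.
    """
    count = len(pcm_bytes) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % count, pcm_bytes[:count * 2]))

# A very quiet signal hovering around zero: 1, -1, 2, -2
quiet = struct.pack("<4h", 1, -1, 2, -2)
print(list(quiet))              # [1, 0, 255, 255, 2, 0, 254, 255]
print(bytes_to_samples(quiet))  # [1, -1, 2, -2]
```

Small negative samples are stored in two's complement, so -1 appears as the byte pair 255, 255. A byte value of 255 therefore says nothing about loudness on its own; the two bytes have to be combined into one signed sample first.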

score 5 · Accepted Answer · answered Dec 06 '10 at 08:22

As you have 16-bit data, you should expect the signal to vary between -32768 and +32767. To calculate the volume, you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Check this number against your threshold.
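The RMS steps above can be sketched in Python as follows. This is a rough illustration, not a definitive implementation: it assumes signed 16-bit little-endian mono PCM, and the names `rms`, `window_levels`, and `is_loud` are hypothetical. At 44100 Hz, a 1000-sample window covers about 23 ms.

```python
import math
import struct

def rms(samples):
    """Root mean square of a sequence of numbers (0.0 if it is empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def window_levels(pcm_bytes, window=1000):
    """Yield the RMS of each consecutive block of `window` samples.

    Assumption: signed 16-bit little-endian mono PCM. The last block
    may be shorter than `window`.
    """
    count = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % count, pcm_bytes[:count * 2])
    for start in range(0, count, window):
        yield rms(samples[start:start + window])

def is_loud(pcm_bytes, threshold, window=1000):
    """True if any window's RMS exceeds `threshold` (0..32768 scale)."""
    return any(level > threshold for level in window_levels(pcm_bytes, window))

# Synthetic test data: 2000 quiet samples, then 1000 loud ones.
quiet = struct.pack("<2000h", *([10, -10] * 1000))
loud = quiet + struct.pack("<1000h", *([20000, -20000] * 500))
print(is_loud(quiet, 500))  # False
print(is_loud(loud, 500))   # True
```

The threshold of 500 is arbitrary; a usable value depends on the microphone and the background noise and would need to be found by experiment.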

score 4 · Answer 2 · answered Dec 06 '10 at 03:37

Typically one measures the energy of waves using root mean square.

If you want to be more perceptually accurate, you can take the time-domain signal through a discrete Fourier transform to a frequency-domain signal and integrate over the magnitudes with some weighting function (since perceived loudness at the same energy depends on frequency; the ear is less sensitive to very low and very high frequencies than to the midrange).
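A minimal sketch of that idea in Python, under stated assumptions: `weighted_rms` is a hypothetical function, the `weight` callback is supplied by the caller, and the example `speech_band` weight is invented for illustration and is not a standard loudness curve. The DFT here is the naive O(n²) form for clarity; real code would use an FFT library.

```python
import cmath
import math

def weighted_rms(samples, sample_rate, weight):
    """Frequency-weighted RMS of one frame of samples.

    Computes a naive DFT, scales each bin's power by weight(frequency),
    and normalises so that weight(f) == 1 for all f gives the plain RMS
    (by Parseval's theorem).
    """
    n = len(samples)
    if n == 0:
        return 0.0
    total = 0.0
    for k in range(n):
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        # Bins above n/2 mirror the lower ones, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        total += weight(freq) * abs(x_k) ** 2
    return math.sqrt(total) / n

def speech_band(freq):
    """Illustrative weight only: favour roughly 300-3400 Hz."""
    return 1.0 if 300 <= freq <= 3400 else 0.1

# One 64-sample frame of a 1378 Hz tone (bin 2 of 64 at 44100 Hz).
frame = [1000 * math.sin(2 * math.pi * 2 * i / 64) for i in range(64)]
print(weighted_rms(frame, 44100, speech_band))
```

For a simple talk/no-talk trigger, this is probably more machinery than needed; the plain RMS is much cheaper.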

But I don't know audio stuff either so I'm just making stuff up. ☺

ephemient

thanks. RMS is interesting. I don't need to be accurate, just a rough approximation. I essentially only need to call an event if the user is talking above a certain threshold. Therefore, the fastest method is all I need. – Brap Dec 06 '10 at 03:46

score 0 · Answer 3 · answered Dec 06 '10 at 03:38

I might try applying a standard-deviation sliding-window. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."
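A sliding-window standard deviation could look roughly like the Python sketch below. It assumes the bytes have already been decoded into integer samples, and `sliding_std` is a hypothetical name. Running sums are kept so each new sample costs constant time.

```python
import math
from collections import deque

def sliding_std(samples, window=1000):
    """Yield the population standard deviation of the most recent
    `window` samples, once the window has filled.

    Assumption: `samples` are already decoded integers, so the running
    sums stay exact.
    """
    buf = deque()
    total = 0      # running sum of samples in the window
    total_sq = 0   # running sum of squared samples in the window
    for s in samples:
        buf.append(s)
        total += s
        total_sq += s * s
        if len(buf) > window:
            old = buf.popleft()
            total -= old
            total_sq -= old * old
        if len(buf) == window:
            mean = total / window
            # Clamp tiny negative values caused by float rounding.
            variance = max(total_sq / window - mean * mean, 0.0)
            yield math.sqrt(variance)

print(list(sliding_std([1, -1, 1, -1], window=2)))  # [1.0, 1.0, 1.0]
```

When the signal's mean is zero, the standard deviation equals the RMS; the difference is that the standard deviation ignores any constant (DC) offset in the recording.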

Brent Arias

I'm using the Microphone class from MSDN - http://msdn.microsoft.com/en-us/library/microsoft.xna.framework.audio.microphone_members.aspx It says it requires PCM wave data. – Brap Dec 06 '10 at 03:47
