4

I'm new to audio analysis but need to perform a (seemingly) simple task. I have a byte array containing a 16-bit, single-channel recording with a sample rate of 44100 Hz. How do I perform a quick analysis to get the volume at any given moment? I need a threshold check: a function that returns true if the signal is above a certain amplitude (volume) and false if not.

I thought I could iterate through the byte array and check each value, with 255 being the loudest, but this doesn't seem to work: even when I don't record anything, background noise gets in and some of the array is filled with 255. Any suggestions would be great. Thanks.

Brap
  • If it's 16-bit data, you need to check byte pairs. – icyrock.com Dec 06 '10 at 03:24
  • I tried converting it to an array of shorts. I started getting negative values and values greater than 255. Is this normal? If so, what do negative values represent in a single channel and what would the maximum volume value be? thanks – Brap Dec 06 '10 at 03:38
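To illustrate the byte-pair point from the comments, here is a minimal Python sketch of decoding the array into signed 16-bit samples. It assumes little-endian PCM (the usual layout for WAV-style data, but worth verifying for your source); `decode_pcm16` is a made-up helper name. Negative values are simply the negative half of the waveform, and a very quiet sample such as -1 is stored as the bytes 0xFF 0xFF, which would explain seeing 255 in a near-silent recording.

```python
import struct

def decode_pcm16(raw):
    """Interpret raw bytes as little-endian signed 16-bit samples (assumed layout)."""
    count = len(raw) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % count, raw[:count * 2]))

# Near-silence: tiny values around zero, two of them negative.
quiet = bytes([0xFF, 0xFF, 0x01, 0x00, 0xFE, 0xFF, 0x00, 0x00])
print(decode_pcm16(quiet))  # [-1, 1, -2, 0]
```

So the raw byte value says little on its own; the magnitude of the decoded sample is what relates to loudness.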

3 Answers

5

As you have 16-bit data, you should expect the signal to vary between -32768 and +32767. To calculate the volume, you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Check this number against your threshold.
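A rough sketch of that RMS check in Python, in case it helps. It assumes the bytes are little-endian signed 16-bit PCM; the names `block_rms` and `is_loud` are made up for illustration, and the threshold is something you would have to tune against your own recordings.

```python
import math
import struct

def block_rms(raw, block_size=1000):
    """Yield the RMS of each block of `block_size` 16-bit samples."""
    count = len(raw) // 2
    # Assumption: little-endian signed 16-bit PCM.
    samples = struct.unpack("<%dh" % count, raw[:count * 2])
    for start in range(0, count, block_size):
        block = samples[start:start + block_size]  # last block may be shorter
        yield math.sqrt(sum(s * s for s in block) / len(block))

def is_loud(raw, threshold, block_size=1000):
    """Return True if any block's RMS exceeds `threshold` (0..32768 scale)."""
    return any(rms > threshold for rms in block_rms(raw, block_size))
```

At 44100 Hz, 1000 samples is about 23 ms, so each RMS value describes a short slice of the recording. For a live stream you would call `block_rms` on each incoming buffer instead of the whole array.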

Han
4

Typically one measures the energy of waves using root mean square.

If you want to be more perceptually accurate, you can take the time-domain signal through a discrete Fourier transform to a frequency-domain signal and integrate over the magnitudes with some weighting function (since perceived loudness at the same energy depends on frequency: the ear is most sensitive at roughly 2 to 5 kHz and noticeably less sensitive at low frequencies).
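As a sketch of what that could look like, the Python below runs a naive DFT over one window and weights each frequency bin before summing. The weighting function is not specified above; A-weighting is used here only as one common choice, and `a_weight` and `weighted_level` are made-up names. The O(n^2) DFT is for illustration; real code would use an FFT library.

```python
import cmath
import math

def a_weight(freq_hz):
    """Linear A-weighting gain (one possible weighting), about 1.0 at 1 kHz."""
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 1.2589 * num / den  # 1.2589 is the +2 dB normalisation factor

def weighted_level(samples, sample_rate=44100):
    """Weighted RMS-like level of one window, via a naive O(n^2) DFT."""
    n = len(samples)
    total = 0.0
    for k in range(1, (n + 1) // 2):  # positive-frequency bins, skipping DC
        bin_value = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                        for i, s in enumerate(samples))
        magnitude = 2.0 * abs(bin_value) / n  # amplitude of this frequency
        total += (a_weight(k * sample_rate / n) * magnitude) ** 2 / 2.0
    return math.sqrt(total)
```

With this weighting, a 1 kHz tone comes out at roughly its plain RMS value, while a 100 Hz tone of the same amplitude comes out much lower. For a simple talking-or-not threshold this is probably overkill compared with plain RMS.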

But I don't know audio stuff either so I'm just making stuff up. ☺

ephemient
  • Thanks. RMS is interesting. I don't need accuracy, just a rough approximation. I essentially only need to raise an event if the user is talking above a certain threshold, so the fastest method is all I need. – Brap Dec 06 '10 at 03:46
0

I might try applying a sliding-window standard deviation. OTOH, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."
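A minimal Python sketch of the sliding-window idea, assuming the samples have already been decoded to signed integers. The name `sliding_std` and the 1000-sample window are arbitrary choices for illustration. It keeps running sums so each new sample costs constant time.

```python
import math
from collections import deque

def sliding_std(samples, window=1000):
    """Yield the standard deviation of the most recent `window` samples."""
    recent = deque()
    total = 0.0     # running sum of the samples in the window
    total_sq = 0.0  # running sum of the squared samples in the window
    for s in samples:
        recent.append(s)
        total += s
        total_sq += s * s
        if len(recent) > window:
            old = recent.popleft()  # drop the sample that left the window
            total -= old
            total_sq -= old * old
        if len(recent) == window:
            mean = total / window
            # Clamp at zero to guard against tiny negative rounding errors.
            variance = max(total_sq / window - mean * mean, 0.0)
            yield math.sqrt(variance)
```

Because the mean is subtracted, a constant (DC) offset in the recording does not inflate the result; for a signal centred on zero the value matches the RMS over the same window.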

Brent Arias
  • I'm using the Microphone class from MSDN ( http://msdn.microsoft.com/en-us/library/microsoft.xna.framework.audio.microphone_members.aspx ). It says it requires PCM wave data. – Brap Dec 06 '10 at 03:47