
I'm planning to make a universal application that analyzes audio samples. By 'universal' I mean that any technology (JavaScript, C, Java, etc.) can use it. Basically, I made an iOS application, using Apple's AVFoundation, that receives the microphone samples in real time with a buffer length of 512 (bufferSize = 512). In Python I did the same thing using PyAudio, but unfortunately I received very different values...

Look at the samples:

Samples of bufferSize = 512 on iOS:

[0.0166742969, 0.0181432627, 0.0184620395, 0.0182254426, 0.0181945376, 0.0185530782, 0.0192517322, 0.0199078992, 0.0204724055, 0.0212812237, 0.022370765, 0.0230008475, 0.0225516111, 0.0213304944, 0.0200473778, 0.019841563, 0.0206818394, 0.0211550407, 0.0207783803, 0.020227218 ....

Samples of bufferSize = 512 on Python:

[ -52.  -32.  -11.   10.   24.   31.   37.   38.   33.   25.   10.   -4.
  -18.  -26.  -29.  -39. ....

For more:

https://pastebin.com/jrM2VWXR

The Python code:

https://gist.github.com/denisb411/7c6f601175e8bb9f735d8aa43a0db340

On both cases I used the same computer.

How do I 'convert' (not sure if that's the proper word) them to the same scale?

If my question isn't clear, please let me know.

denisb411

1 Answer


Audio samples are typically quantized on 16 or 24 bits, but there are different conventions for the range of values these samples can take:

  • if you quantize on 8 bits, samples are usually stored as unsigned bytes, ranging from 0 to 255
  • if you quantize on 16 bits, samples are usually stored as 2's-complement signed integers, ranging from -32768 to 32767
  • if you quantize on 24 bits, samples are usually stored as 2's-complement signed integers as well
  • etc.
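As an illustration of these conventions, 16-bit signed samples can be mapped into the float range [-1, 1] by dividing by the int16 magnitude. A minimal sketch using NumPy, with the first few values from the question's Python buffer:

```python
import numpy as np

# First values of the question's int16 buffer (range -32768..32767)
int16_samples = np.array([-52, -32, -11, 10, 24, 31], dtype=np.int16)

# Divide by 32768 to map into [-1, 1], the scale iOS/AVFoundation uses
float_samples = int16_samples.astype(np.float32) / 32768.0
```

This is why the iOS values are all small fractions: they are the same waveform, just expressed on the float scale.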

Basically, when you decide to store samples, you have two parameters:

  • signed or unsigned
  • int or float

Each has its advantages and drawbacks. For instance, storing floats in the range [-1, 1] has the advantage that the product of two samples always stays within [-1, 1]…

So, to answer your question, you just need to change the format with which you open your PyAudio stream. Currently, you use format=pyaudio.paInt16. Try changing it to pyaudio.paFloat32, and you should get the same data as with your iOS implementation.
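A sketch of the change (the zero-filled bytes below are a stand-in for a real `stream.read(CHUNK)` call, so this runs without a microphone; the stream-opening parameters in the comment are from the linked gist's pattern, not verified against it):

```python
import struct

CHUNK = 512  # bufferSize from the question

# In the PyAudio stream, change
#     format=pyaudio.paInt16
# to
#     format=pyaudio.paFloat32
# e.g. stream = p.open(format=pyaudio.paFloat32, channels=1,
#                      rate=RATE, input=True, frames_per_buffer=CHUNK)

# float32 frames are 4 bytes each, so unpack with 'f' instead of 'h':
raw = b'\x00' * (4 * CHUNK)              # stand-in for stream.read(CHUNK)
samples = struct.unpack(str(CHUNK) + 'f', raw)
# samples are now floats in roughly [-1, 1], matching the iOS values
```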

filaton
  • Thanks a lot. This solved my question. Just to complement, I had to use 'f' instead of 'h' in wave.struct.unpack. – denisb411 Jun 01 '17 at 16:35
  • If possible, can you clear up one more doubt? To convert these float32 values to dB, do I just take 10*log10(sample) of each value? Is this valid? I want to FFT these samples, and a logarithmic scale would help me a lot. – denisb411 Jun 01 '17 at 17:25
  • That will work if your samples are on an unsigned float scale. Otherwise, you'll have to scale your samples. The maximum sample value should give you 0 dB and the minimum sample value should give -inf dB. Just find *a* and *b* so that "a * min_sample_value + b = 0" and "a * max_sample_value + b = 1". – filaton Jun 05 '17 at 13:30
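The scaling described in the last comment can be sketched like this (the sample values are made up for illustration; `np.errstate` just suppresses the warning from taking log10 of 0):

```python
import numpy as np

# Hypothetical float32 buffer (values made up for illustration)
samples = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)

# Solve a * min + b = 0 and a * max + b = 1, as the comment suggests
smin, smax = samples.min(), samples.max()
a = 1.0 / (smax - smin)
b = -a * smin

scaled = a * samples + b        # now in [0, 1]: max -> 1, min -> 0

# 10*log10 then gives 0 dB at the maximum and -inf at the minimum
with np.errstate(divide='ignore'):
    db = 10.0 * np.log10(scaled)
```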