
I am analysing WAV files with the librosa library in Python. I used librosa.load() to load the audio file. Apparently this function loads the WAV file into a NumPy array with normalised amplitude values in the range -1 to 1, but I need the actual amplitude values for processing. How can I get them?

Thanks in advance!

Hendrik
Archit Sahu

2 Answers


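You observed correctly that librosa.load() returns floating-point samples in the range [-1, 1]. By default it also converts the audio to mono and resamples it to 22050 Hz (pass sr=None and mono=False to keep the original sampling rate and channels). That said, it's digital audio, so you could multiply by whatever you want to get a different scale. If you insist that your samples be on the 16-bit integer scale of -2^15 to 2^15 - 1, simply multiply by 2^15. It pretty much means the same thing.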

You won't gain anything, except that you drag a peculiarity of the audio encoding format into your data.
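The rescaling described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `float_to_int16_scale` and the sample values are made up, and the clipping step is added because +1.0 times 2^15 would not fit into a signed 16-bit integer.

```python
import numpy as np

def float_to_int16_scale(y):
    """Map float samples in [-1.0, 1.0] onto the signed 16-bit integer scale.

    Hypothetical helper for illustration; not part of librosa.
    """
    scaled = np.round(np.asarray(y, dtype=np.float64) * 2**15)
    # +1.0 would map to 32768, which does not fit in int16, so clip first.
    return np.clip(scaled, -2**15, 2**15 - 1).astype(np.int16)

# Made-up samples standing in for the array returned by librosa.load().
y = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y_int = float_to_int16_scale(y)
```

The two arrays carry the same information; they differ only by the constant factor 2^15 (plus rounding and clipping at the upper end).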

That said, if that's what you want, you could use PySoundFile like this:

import soundfile as sf

y, sr = sf.read('existing_file.wav', dtype='int16')

The parameter dtype='int16' tells the library to return the samples as signed 16-bit integers.
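For plain 16-bit PCM files, the same integers can also be read without PySoundFile, using Python's standard `wave` module. The sketch below swaps in that module purely for illustration; it assumes a mono 16-bit PCM WAV, and the in-memory buffer, sampling rate, and sample values are invented so that the example needs no file on disk.

```python
import io
import wave

import numpy as np

# Build a tiny mono 16-bit WAV in memory (made-up data for illustration).
sr = 8000
original = np.array([0, 1000, -1000, 32767, -32768], dtype=np.int16)
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 2 bytes = 16 bits per sample
    w.setframerate(sr)
    w.writeframes(original.astype("<i2").tobytes())

# Read it back: the bytes are little-endian signed 16-bit integers.
buf.seek(0)
with wave.open(buf, "rb") as r:
    stored_sr = r.getframerate()
    raw = r.readframes(r.getnframes())

samples = np.frombuffer(raw, dtype="<i2")   # integer values exactly as stored
as_float = samples / 32768.0                # rescaled to the range [-1, 1)
```

Dividing by 2^15 shows how the integer samples relate to a normalised float scale: the values differ only by a constant factor.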

Hendrik
  • Thanks Hendrik. What if I am using scipy.io.wavfile.read()? What does that return, and can I use those values in place of real magnitude values for further calculations? – Archit Sahu Jul 11 '21 at 19:56
  • According to the [docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html), `scipy.io.wavfile.read()` returns an array whose data type depends on the WAV file, most commonly `int16`. Not sure what you mean by *real* magnitude values. Physical? I have no idea *what* further calculations you want to do, but sure, you can use anything to calculate everything. Unless you specify exactly what you want to do later on, you cannot be helped. Be more specific! I suggest creating a new question, though. – Hendrik Jul 12 '21 at 18:43

You can't. As Hendrik mentioned, the signal is digital, and the amplitude in the WAV file won't tell you anything about the actual sound wave amplitude or sound power. That information was lost the moment the signal was digitised to WAV.

That being said, you can compute e.g. loudness, a relative measure of perceived sound power. If you are dealing with the human auditory system, one of the recommended approaches is to:

  1. Convert to the Bark scale (it better reflects how we hear).
  2. Compute energy in each bin.
  3. (Optional) Normalise by the overall sum.
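The steps above can be sketched with NumPy alone. This is a rough illustration, not a reference loudness model: it assumes the Zwicker and Terhardt approximation for the Hz-to-Bark mapping, treats each integer Bark value as one band, and uses a synthetic 1 kHz tone as input. The function names `hz_to_bark` and `bark_band_energies` are hypothetical.

```python
import numpy as np

def hz_to_bark(f):
    """Zwicker and Terhardt's approximation of the Bark scale (an assumption here)."""
    f = np.asarray(f, dtype=np.float64)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_band_energies(y, sr, normalise=True):
    """Energy per integer Bark band for one block of samples."""
    power = np.abs(np.fft.rfft(y)) ** 2               # power spectrum
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)       # frequency of each FFT bin in Hz
    band = np.floor(hz_to_bark(freqs)).astype(int)    # step 1: Bark band per FFT bin
    energies = np.bincount(band, weights=power)       # step 2: sum energy in each band
    if normalise:
        total = energies.sum()
        if total > 0:
            energies = energies / total               # step 3: normalise by overall sum
    return energies

# Synthetic test signal: one second of a 1 kHz sine at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)
e = bark_band_energies(tone, sr)
```

Because of the optional normalisation, the result does not depend on whether the samples are on the [-1, 1] scale or the int16 scale.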

If you don't want to compute it yourself, check out e.g. YAAFE.

Lukasz Tracewski
  • Thanks Lukasz. In the project I am working on, I need to compute the energy function e(n) as the sum of the magnitudes |s(n)| over each 10 ms interval. If I use the values I get from librosa.load() directly as s(n), will that work for the energy function, or do I have to use a different approach? – Archit Sahu Jul 12 '21 at 07:56
  • Yes, `librosa.load()` gives you the samples. For a signal with e.g. a 16 kHz sampling rate, a 10 ms interval is 0.01 s * 16000 Hz = 160 samples. – Lukasz Tracewski Jul 12 '21 at 08:11
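The framing discussed in these comments can be sketched as follows. This is a minimal illustration that assumes non-overlapping 10 ms frames (the comments do not say whether frames overlap) and drops any incomplete frame at the end. The function name `frame_magnitude_sum` and the test signal are made up.

```python
import numpy as np

def frame_magnitude_sum(s, sr, frame_ms=10.0):
    """e(n): sum of |s| over consecutive, non-overlapping frames (assumed framing)."""
    frame_len = int(round(sr * frame_ms / 1000.0))   # 160 samples at 16 kHz
    n_frames = len(s) // frame_len                   # incomplete tail is dropped
    frames = np.asarray(s[: n_frames * frame_len]).reshape(n_frames, frame_len)
    return np.abs(frames).sum(axis=1)

# Made-up signal: two full 10 ms frames plus 50 leftover samples.
sr = 16000
s = np.concatenate([np.full(160, 0.5), np.full(160, -0.25), np.zeros(50)])
e = frame_magnitude_sum(s, sr)
```

Using normalised samples instead of int16 values only multiplies every e(n) by the same constant, so the shape of the energy contour is unchanged.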