
I have a mono WAV file of a 'glass breaking' sound. When I graphically display its levels in Python using the librosa library, it shows a very large range of amplitudes, between +/- 20000 instead of +/- 1. When I open the same WAV file in Audacity, the levels are between +/- 1.

My question is: what causes this difference in the displayed amplitude levels, and how can I correct it in Python? Min-max scaling would distort the sound, and I want to avoid it if possible.

The code is:

from scipy.io import wavfile
fs1, glass_break_data = wavfile.read('test_break_glass_normalized.wav')  # samples exactly as stored in the file

%matplotlib inline
import matplotlib.pyplot as plt
import librosa.display

sr = 44100
x = glass_break_data.astype('float')  # cast to float, values themselves are unchanged

plt.figure(figsize=(14, 5))
librosa.display.waveplot(x, sr=sr)

These are the images from the notebook and Audacity:

[Screenshot: librosa waveform plot from the notebook, amplitude axis roughly +/- 20000]

[Screenshot: the same file in Audacity, amplitude axis between +/- 1]

crbl

2 Answers


WAV files usually store samples as integers, not floats. So what you see in your plot is accurate for a file with 16 bits per sample: signed 16-bit samples range from -32768 to +32767, which matches the roughly +/- 20000 you observe.
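You can verify this directly in Python; a minimal sketch, assuming the file from the question is an integer PCM WAV (for a float WAV you would need np.finfo instead of np.iinfo):

import numpy as np
from scipy.io import wavfile

fs, data = wavfile.read('test_break_glass_normalized.wav')
print(data.dtype)                                           # e.g. int16 for a 16-bit PCM file
print(np.iinfo(data.dtype).min, np.iinfo(data.dtype).max)   # -32768 32767 for int16
print(data.min(), data.max())                               # the observed samples stay inside that range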

Programs like VLC show the format, including the bit depth per sample, in their info dialog, so you can easily check. Another way to check the format is with soxi or ffmpeg.
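From Python, the soundfile package (which recent librosa versions use under the hood) reports the same information; a small sketch using the file name from the question:

import soundfile as sf

info = sf.info('test_break_glass_normalized.wav')
print(info.samplerate, info.channels, info.subtype)   # subtype is e.g. 'PCM_16' for 16-bit integer samples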

Audacity normalizes everything to floats in the range of -1 to 1 for display; it does not show you the original format.

The same is true for librosa.load(): it also normalizes to [-1, 1]. wavfile.read(), on the other hand, does not normalize. For more info on ways to read WAV audio, see for example this answer.
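If you want to stay with wavfile.read and still plot on Audacity's scale, one common approach (a sketch, not something this answer prescribes) is to divide the integer samples by the full-scale value of their dtype; unlike min-max scaling this is a fixed linear rescaling and does not distort the waveform:

import numpy as np
from scipy.io import wavfile
import matplotlib.pyplot as plt
import librosa.display

fs1, glass_break_data = wavfile.read('test_break_glass_normalized.wav')

# Assumes integer PCM data (e.g. int16); dividing by |dtype min| = 32768 maps
# the samples into [-1.0, 1.0) without changing their relative proportions.
full_scale = abs(np.iinfo(glass_break_data.dtype).min)
x = glass_break_data.astype(np.float32) / full_scale

plt.figure(figsize=(14, 5))
librosa.display.waveplot(x, sr=fs1)   # waveplot as in the question; newer librosa versions use waveshow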

Hendrik

If you use librosa.load instead of wavfile.read, it will normalize the range to [-1, 1]:

glass_break_data, fs1 = librosa.load('test_break_glass_normalized.wav')
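Note that librosa.load also resamples to 22050 Hz by default; if you want to keep the file's native 44100 Hz rate, pass sr=None:

glass_break_data, fs1 = librosa.load('test_break_glass_normalized.wav', sr=None)  # fs1 is the native sample rate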
Jon Nordby