12

I have a question about the difference between the load function of librosa and the read function of scipy.io.wavfile.

from scipy.io import wavfile
import librosa

fs, data = wavfile.read(name)
data, fs = librosa.load(name)

Both calls read the same audio file. If you run the code above, the two functions return different data values. I want to know why the values differ.

MaxPowers
이응재

4 Answers

11

From the docstring of librosa.core.load:

Load an audio file as a floating point time series.

Audio will be automatically resampled to the given rate (default sr=22050).

To preserve the native sampling rate of the file, use sr=None.

scipy.io.wavfile.read does not automatically resample the data, and the samples are not converted to floating point if they are integers in the file.
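To see both effects without needing a particular file, here is a minimal sketch that writes a synthetic 16-bit tone and reads it back with scipy. The 44100 Hz rate, the 440 Hz tone, and the file name `tone.wav` are assumptions made for illustration. The last step only imitates what librosa.load does by default (convert to float, resample to 22050 Hz); `scipy.signal.resample_poly` is a stand-in here, not librosa's actual resampler, so the sample values would not match librosa's output exactly.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

native_sr = 44100  # assumed native sampling rate of the synthetic file
t = np.arange(native_sr) / native_sr  # one second of sample times
tone = np.round(0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tone.wav")  # hypothetical file name
    wavfile.write(path, native_sr, tone)
    fs, data = wavfile.read(path)

# scipy keeps the native rate and the integer dtype stored in the file
print(fs, data.dtype, data.shape)  # 44100 int16 (44100,)

# Rough imitation of librosa's defaults: float samples, then 44100 -> 22050 Hz
as_float = data.astype(np.float32) / 32768.0
resampled = resample_poly(as_float, up=1, down=2)
print(resampled.shape)  # (22050,)
```

With `sr=None`, librosa skips the resampling step, so only the float conversion remains as a difference.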

Warren Weckesser
5

It's also worth mentioning that librosa.load() scales the data to floating point (so that all the data points are between -1 and 1), whereas wavfile.read() returns the raw sample values stored in the file.
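As a rough illustration of that scaling, the sketch below divides integer PCM samples by the full-scale value of their dtype. `int_to_float` is a hypothetical helper written for this example, not part of scipy or librosa, and it assumes the array uses the full range of its integer dtype (as 16-bit and 32-bit PCM files do).

```python
import numpy as np


def int_to_float(samples):
    """Scale integer PCM samples to float32 in [-1, 1) (hypothetical helper)."""
    samples = np.asarray(samples)
    if samples.dtype.kind == "f":
        # Already floating point: only unify the dtype
        return samples.astype(np.float32)
    if samples.dtype == np.uint8:
        # 8-bit WAV data is unsigned and centered on 128
        return (samples.astype(np.float32) - 128.0) / 128.0
    # Signed PCM: divide by the magnitude of the most negative value
    full_scale = float(-np.iinfo(samples.dtype).min)  # 32768.0 for int16
    return (samples / full_scale).astype(np.float32)


raw = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
scaled = int_to_float(raw)
print(scaled.dtype, scaled.min(), scaled.max())
```

Note that this is a fixed scaling by bit depth, not peak normalization: a quiet recording stays quiet after conversion.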

A.Davies
5

The data is different because scipy does not normalize the input signal: it returns the integer samples as stored, while librosa scales them to floats.

Here is a snippet showing how to change scipy output to match librosa's:

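import numpy as np
import librosa
import scipy.io.wavfile

nbits = 16  # bit depth of the file (this snippet assumes 16-bit PCM)

l_wave, rate = librosa.core.load(path, sr=None)  # sr=None keeps the native sampling rate
rate, s_wave = scipy.io.wavfile.read(path)       # integer samples, unscaled

# Convert to float before scaling: an in-place `/=` on the int16 array raises UFuncTypeError
s_wave = s_wave.astype(np.float32) / 2 ** (nbits - 1)

np.array_equal(s_wave, l_wave)
# True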
vdi
  • Got an error running this unchanged (`s_wave_old /= 2 ** (nbits - 1)` raised `numpy.core._exceptions.UFuncTypeError: Cannot cast ufunc 'divide' output from dtype('float64') to dtype('int16') with casting rule 'same_kind'`), so had to convert the type: `s_wave = np.array(s_wave_orig, dtype=np.float32)` – Frankie Drake Sep 02 '22 at 12:42
2

librosa.core.load supports 24-bit audio files and 96 kHz sample rates. Because of this, together with the conversion to float and the default resampling, it can be considerably slower than scipy.io.wavfile.read in many cases.