Difference between load of librosa and read of scipy.io.wavfile

Question

I have a question about the difference between the load function of librosa and the read function of scipy.io.wavfile.

from scipy.io import wavfile
import librosa

# name is the path to the WAV file
fs, data = wavfile.read(name)   # returns (sample rate, data)
data, fs = librosa.load(name)   # returns (data, sample rate)

Both calls read the same voice recording, yet the two functions return different values in data. Why do the values differ?

In what do they differ? – MaxPowers Apr 27 '18 at 12:46

Warren Weckesser · Accepted Answer · score 11 · 2018-04-27T13:30:13.430

From the docstring of librosa.core.load:

Load an audio file as a floating point time series.

Audio will be automatically resampled to the given rate (default sr=22050).

To preserve the native sampling rate of the file, use sr=None.

scipy.io.wavfile.read does not resample: it returns the data at the file's native sample rate, and integer samples keep their integer dtype (for example int16 for 16-bit PCM) instead of being converted to floating point.
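
The difference described above can be checked on a small synthetic file. The following is a minimal sketch, not taken from the answer: the 440 Hz test tone, the temporary file name tone.wav, and the variable names are all assumptions made for illustration. The librosa part is skipped if librosa is not installed.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Synthetic test file (an assumption for illustration):
# 1 second of a 440 Hz tone, 16-bit mono, sampled at 44.1 kHz.
native_sr = 44100
t = np.arange(native_sr) / native_sr
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

wav_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
wavfile.write(wav_path, native_sr, tone)

# scipy: native sample rate, integer samples exactly as stored in the file.
fs, data = wavfile.read(wav_path)
print("scipy  :", fs, data.dtype, data.shape)

# librosa is optional here so the sketch still runs without it.
try:
    import librosa
except ImportError:
    librosa = None

if librosa is not None:
    # Default call: resampled to sr=22050 and converted to floating point.
    y_default, sr_default = librosa.load(wav_path)
    # sr=None keeps the native rate; the samples are still floating point.
    y_native, sr_native = librosa.load(wav_path, sr=None)
    print("librosa:", sr_default, y_default.dtype, y_default.shape)
    print("librosa:", sr_native, y_native.dtype, y_native.shape)
```

With the default call the array returned by librosa should be about half as long as the scipy array, because the 44.1 kHz signal has been resampled to 22.05 kHz.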

edited Apr 27 '18 at 13:30

answered Apr 27 '18 at 13:24

Warren Weckesser

Implicit resampling is, IMHO, just one of several drawbacks of `librosa`. – MaxPowers Apr 28 '18 at 06:41

Do they differ in how fast they can load a file? – Filipe Pinto Jan 15 '19 at 19:44

I also found that librosa is much slower than scipy when reading audio files. Maybe that is due to the resampling? – Raven Cheuk Jun 29 '19 at 09:35

score 5 · Answer 2 · edited Oct 10 '20 at 15:11

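It's also worth mentioning that librosa.load() rescales the data to floating point so that all the data points lie between -1 and 1, whereas wavfile.read() returns the sample values as stored in the file. For integer files this is a division by the full-scale value of the format, not a normalization to the signal's own peak.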

edited Oct 10 '20 at 15:11

electriccello

answered Nov 21 '19 at 14:17

A.Davies

The data points are indeed between -1 and 1, but why is the scaling applied, and does it change the FFT results? – Chris Wong Jul 08 '20 at 07:42
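
On the FFT question: the FFT is linear, so dividing a signal by a constant divides every FFT coefficient by the same constant and leaves the peak locations unchanged. The following is a minimal NumPy sketch of that property. The 1 kHz test tone, the 16 kHz sample rate, and the variable names are assumptions chosen for illustration, and the constant 2**15 stands in for the int16 full scale.

```python
import numpy as np

# Hypothetical int16 signal: a 1 kHz tone sampled at 16 kHz for 1 second.
sr = 16000
t = np.arange(sr) / sr
x_int = np.round(0.25 * 32767 * np.sin(2 * np.pi * 1000 * t)).astype(np.int16)

# The same signal divided by the int16 full scale, as a float array.
scale = 2 ** 15
x_float = x_int.astype(np.float64) / scale

spec_int = np.fft.rfft(x_int)
spec_float = np.fft.rfft(x_float)

# Linearity: FFT(x / c) == FFT(x) / c, so only the magnitudes are rescaled.
spectra_match = np.allclose(spec_float * scale, spec_int)

# The position of the spectral peak is the same for both versions.
peak_int = int(np.argmax(np.abs(spec_int)))
peak_float = int(np.argmax(np.abs(spec_float)))
print(spectra_match, peak_int, peak_float)
```

Results only differ in substance when the two loaders return different signals, for example when librosa has resampled the file because sr was left at its default.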

score 5 · Answer 3 · answered Mar 17 '21 at 15:13

The data is different because scipy does not normalize the input signal.

Here is a snippet showing how to change scipy output to match librosa's:

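import numpy as np
import librosa
import scipy.io.wavfile

nbits = 16  # bit depth of the file; this assumes 16-bit PCM

l_wave, rate = librosa.load(path, sr=None)  # sr=None keeps the native sample rate
rate, s_wave = scipy.io.wavfile.read(path)

# Convert to float before dividing: an in-place division on the int16 array raises a casting error
s_wave = s_wave.astype(np.float32) / 2 ** (nbits - 1)

np.allclose(s_wave, l_wave)
# True (for a mono 16-bit PCM file)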
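
The snippet above hard-codes a 16-bit depth. A slightly more general sketch derives the divisor from the array's dtype instead. The helper name pcm_to_float is hypothetical, not part of scipy or librosa, and the function deliberately rejects anything that is not a signed integer array (for example the unsigned 8-bit data that scipy returns for 8-bit WAV files) rather than guessing at the right offset.

```python
import numpy as np


def pcm_to_float(samples: np.ndarray) -> np.ndarray:
    """Scale signed-integer PCM samples to float32 in [-1.0, 1.0)."""
    if not np.issubdtype(samples.dtype, np.signedinteger):
        raise TypeError(f"expected a signed integer array, got {samples.dtype}")
    # Full scale is the magnitude of the most negative value, e.g. 32768 for int16.
    full_scale = -float(np.iinfo(samples.dtype).min)
    return (samples / full_scale).astype(np.float32)


# Usage on a few hand-picked int16 values.
pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
scaled = pcm_to_float(pcm)
print(scaled.dtype, scaled.min(), scaled.max())
```

Dividing by the most negative value keeps the result inside the half-open range, so the largest positive int16 sample maps to just under 1.0.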

answered Mar 17 '21 at 15:13

vdi

With the in-place division `s_wave /= 2 ** (nbits - 1)` applied to the int16 array returned by scipy, I got `numpy.core._exceptions.UFuncTypeError: Cannot cast ufunc 'divide' output from dtype('float64') to dtype('int16') with casting rule 'same_kind'`, so I had to convert the type first: `s_wave = np.array(s_wave_orig, dtype=np.float32)` – Frankie Drake Sep 02 '22 at 12:42

score 2 · Answer 4 · answered Apr 14 '19 at 15:12

librosa.core.load supports 24-bit audio files and 96 kHz sample rates. Because of this, together with the conversion to float and the default resampling, it can be considerably slower than scipy.io.wavfile.read in many cases.

answered Apr 14 '19 at 15:12

electriccello

For more explanation you can refer to this link: https://github.com/SeanNaren/deepspeech.pytorch/issues/40 – Hamed Baziyad Dec 02 '19 at 09:02
