Getting soundfile.LibsndfileError: Error opening 'speech.wav': Format not recognized when giving 2D numpy array to soundfile

Question

I tried generating audio from tensors produced by NVIDIA's NeMo TTS models and ran into the error in the title.

Here is the code for it:

import soundfile as sf

from nemo.collections.tts.models import FastPitchModel
from nemo.collections.tts.models import HifiGanModel

spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained(model_name="tts_hifigan")

text = "Just keep being true to yourself, if you're passionate about something go for it. Don't sacrifice anything, just have fun."
parsed = spec_generator.parse(text)
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
audio = audio.to('cpu').detach().numpy()

sf.write("speech.wav", audio, 22050)

I expected to get an audio file, speech.wav.

score 3 · Accepted Answer · answered Jan 06 '23 at 11:54

Looking at your example, I see that your audio has shape (1, 173056).

Based on https://github.com/bastibe/python-soundfile/issues/309, I converted the audio to a 1D array of size 173056, and it worked fine.

Code used:

>>> import numpy as np
>>> sf.write("speech.wav", np.ravel(audio), 22050)  # 22050 is the sample rate from the question
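To make the shape issue concrete, here is a minimal sketch using only numpy. The helper name `drop_batch_axis` is made up for illustration, and the zero-filled array is just a stand-in for the vocoder output in the question. As far as I understand, soundfile reads a 2D array as (frames, channels), so (1, 173056) looks like a single frame with 173056 channels, which WAV cannot store.

```python
import numpy as np

def drop_batch_axis(audio: np.ndarray) -> np.ndarray:
    """Turn a (1, n_samples) batch of mono audio into shape (n_samples,).

    Hypothetical helper: arrays with any other shape are returned unchanged.
    """
    audio = np.asarray(audio)
    if audio.ndim == 2 and audio.shape[0] == 1:
        return audio[0]
    return audio

# Stand-in for the vocoder output: one batch item, 173056 samples.
audio = np.zeros((1, 173056), dtype=np.float32)
mono = drop_batch_axis(audio)
print(audio.shape, "->", mono.shape)  # (1, 173056) -> (173056,)
```

The resulting `mono` array can then be passed to `sf.write("speech.wav", mono, 22050)` the same way as the `np.ravel(audio)` result.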

score 0 · Answer 2 · answered Mar 26 '23 at 22:08

In case you really need the audio in stereo (like I did), transpose the array instead of flattening it. Per the soundfile documentation, the expected shape is (samples x channels).
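A small sketch of the transpose, assuming your array is currently laid out as (channels, samples). The helper name `channels_last` and the zero-filled stereo array are made up for illustration.

```python
import numpy as np

def channels_last(audio: np.ndarray) -> np.ndarray:
    """Return audio as (samples, channels), assuming the input is (channels, samples)."""
    audio = np.asarray(audio)
    if audio.ndim != 2:
        raise ValueError("expected a 2-D (channels, samples) array")
    return audio.T

# Hypothetical stereo clip: 2 channels, 1 second at 22050 Hz.
stereo_cf = np.zeros((2, 22050), dtype=np.float32)
stereo_sc = channels_last(stereo_cf)
print(stereo_sc.shape)  # (22050, 2)
```

With that layout, `sf.write("speech.wav", stereo_sc, 22050)` should see two channels rather than 22050 of them.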

wleong

Ayodeji Babalola · Answer 3 · 2023-08-07T14:41:41.783

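import librosa as lib  # assuming `lib` refers to librosa
import soundfile as sf
x, _ = lib.load(path, sr=None, mono=True)  # mono=True returns a 1-D (single-channel) array
sf.write('new-file.wav', x, 4000) # for a file we want to write with 4k sample rate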

Check that mono=True so that a stereo file is loaded as a single channel.

The above code solves the problem. You need to check that the number of channels in the loaded array corresponds to what you are trying to write.
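If you already have a multi-channel array in memory, a rough numpy-only sketch of the same downmix is to average across channels. This assumes a (channels, samples) layout, which I believe is what librosa returns with mono=False. The helper name `downmix_to_mono` and the sample values are made up for illustration.

```python
import numpy as np

def downmix_to_mono(audio_cf: np.ndarray) -> np.ndarray:
    """Average a (channels, samples) array down to a 1-D mono signal.

    Hypothetical helper: 1-D input is assumed to be mono already.
    """
    audio_cf = np.asarray(audio_cf, dtype=np.float32)
    if audio_cf.ndim == 1:
        return audio_cf
    return audio_cf.mean(axis=0)

# Two made-up channels of four samples each.
left = np.full(4, 0.5, dtype=np.float32)
right = np.full(4, -0.5, dtype=np.float32)
mono = downmix_to_mono(np.stack([left, right]))
print(mono.shape)  # (4,)
```

The 1-D result has the shape soundfile expects for a mono file.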

edited Aug 07 '23 at 14:41

answered Aug 07 '23 at 14:40
