
I'm doing:

import librosa
import scipy.signal
import soundfile as sf

# samples, nperseg, overlap and sample_rate are defined earlier in my script
D = librosa.stft(samples, n_fft=nperseg,
                 hop_length=overlap, win_length=nperseg,
                 window=scipy.signal.windows.hamming)

spect, _ = librosa.magphase(D)

audio_signal = librosa.griffinlim(spect, n_iter=1024,
                                  win_length=nperseg, hop_length=overlap,
                                  window=scipy.signal.windows.hamming)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)

And it is introducing noticeable distortion in the reconstructed audio signal. What can I do to improve that?

  • Do you have the original phase information? In that case it can be better to use it (even if you've modified the magnitudes) than reconstructing with Griffin-Lim – Jon Nordby Apr 01 '20 at 20:08
  • I only have the magnitudes – Shamoon Apr 01 '20 at 20:12
  • I am not very familiar with the specifics of this problem but here are some suggestions: try out [smoothing](https://stackoverflow.com/questions/20618804/how-to-smooth-a-curve-in-the-right-way), use a smaller `window_length` and maybe a bigger `hop_length`, I would say 50% is good ... Consider advanced speech enhancement techniques. You can also post about this on the GitHub page of Librosa; the developers might help as they are probably more familiar with this type of problem. – SuperKogito Apr 06 '20 at 14:01
  • 2
    https://timsainburg.com/noise-reduction-python.html – Joshua Varghese Apr 07 '20 at 18:50
  • @Shamoon does this provide any clues to your problem? https://ieeexplore.ieee.org/document/8521304 – sashimi Nov 19 '20 at 15:31
  • It's important to notice that there's a very high chance that the reconstruction quality will be suboptimal if you don't have the correct phase information. The Griffin-Lim algorithm can only provide an _estimate_ for the phase. Therefore, it is expected that the resulting audio signal contains artifacts compared to the original signal or the signal that you'd obtain _with_ the correct phase information. – applesoup Jul 31 '21 at 18:34
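Picking up the first comment's suggestion: if the complex STFT `D` (and therefore the phase) is still available, you can invert it directly and skip the Griffin-Lim phase estimate altogether. A minimal sketch, reusing the variables from the question (the asker notes only magnitudes are available, so this is mainly useful as a quality baseline; the output filename is illustrative):

import librosa
import scipy.signal
import soundfile as sf

# magphase splits D into its magnitude and a unit-magnitude complex phase term,
# so multiplying them back together recovers the original complex STFT
spect, phase = librosa.magphase(D)

audio_signal = librosa.istft(spect * phase,
                             hop_length=overlap, win_length=nperseg,
                             window=scipy.signal.windows.hamming)
sf.write('test_with_phase.wav', audio_signal, sample_rate)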

2 Answers


You need to use a window function that is centered so that the windowed signal is symmetrical around the middle of the window and behaves well under overlap-add resynthesis. In this case, you can use the Hann window, a raised cosine that tapers to zero at its endpoints (unlike the Hamming window, whose endpoints are non-zero):

D = librosa.stft(samples, n_fft=nperseg,
                 hop_length=overlap, win_length=nperseg,
                 window=scipy.signal.windows.hann)   # Hann instead of Hamming

spect, _ = librosa.magphase(D)

audio_signal = librosa.griffinlim(spect, n_iter=1024,
                                  win_length=nperseg, hop_length=overlap,
                                  window=scipy.signal.windows.hann)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)
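
Not part of the answer, but a quick way to sanity-check a window/hop combination before resynthesis: scipy can test the constant-overlap-add (COLA) property and the weaker NOLA property that an invertible STFT requires. `nperseg` and `overlap` are the same variables as in the question.

import scipy.signal

hop = overlap                    # the question passes `overlap` as the hop length
noverlap = nperseg - hop         # scipy counts overlapping samples, not the hop
win = scipy.signal.windows.hann(nperseg, sym=False)

print(scipy.signal.check_COLA(win, nperseg, noverlap))  # True at 50% or 75% overlap
print(scipy.signal.check_NOLA(win, nperseg, noverlap))  # must be True for istft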

You should use a neural-network-based vocoder like WaveNet for the reconstruction: such models are trained to generate a waveform directly from a spectrogram and typically produce far fewer artifacts than Griffin-Lim.
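
A minimal sketch of what that could look like, using NVIDIA's pretrained WaveGlow from PyTorch Hub as a stand-in for WaveNet; `mel` is a placeholder for an 80-band log-mel spectrogram matching the model's 22050 Hz training setup, and converting the question's linear magnitudes into that format is not shown here:

import torch

# Pretrained WaveGlow vocoder from PyTorch Hub (NVIDIA's implementation);
# like WaveNet it generates a waveform directly from a mel spectrogram.
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
                          'nvidia_waveglow', model_math='fp32')
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow.eval()

# `mel` is a placeholder: a torch tensor of shape (1, 80, n_frames) holding a
# log-mel spectrogram that matches the vocoder's training configuration.
with torch.no_grad():
    audio = waveglow.infer(mel)      # -> (1, n_samples) waveform tensor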