
I'm doing:

import librosa
import scipy.signal
import soundfile as sf

# samples, nperseg, overlap and sample_rate are defined earlier in my script
D = librosa.stft(samples, n_fft=nperseg,
                 hop_length=overlap, win_length=nperseg,
                 window=scipy.signal.windows.hamming)

spect, _ = librosa.magphase(D)

audio_signal = librosa.griffinlim(spect, n_iter=1024,
                                  win_length=nperseg, hop_length=overlap,
                                  window=scipy.signal.windows.hamming)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)

And it is introducing noticeable distortion in the reconstructed audio signal. What can I do to improve that?

  • Do you have the original phase information? In that case it can be better to use it (even if you've modified the magnitudes) than reconstructing with Griffin-Lim – Jon Nordby Apr 01 '20 at 20:08
  • I only have the magnitudes – Shamoon Apr 01 '20 at 20:12
  • I am not very familiar with the specifics of this problem but here are some suggestions: try out [smoothing](https://stackoverflow.com/questions/20618804/how-to-smooth-a-curve-in-the-right-way), use a smaller `window_length` and maybe a bigger `hop_length`, I would say 50% is good ... Consider advanced speech enhancement techniques. You can also post about this on the GitHub page of Librosa; the developers might help as they are probably more familiar with this type of problem. – SuperKogito Apr 06 '20 at 14:01
  • 2
    https://timsainburg.com/noise-reduction-python.html – Joshua Varghese Apr 07 '20 at 18:50
  • @Shamoon does this provide any clues to your problem? https://ieeexplore.ieee.org/document/8521304 – sashimi Nov 19 '20 at 15:31
  • It's important to notice that there's a very high chance that the reconstruction quality will be suboptimal if you don't have the correct phase information. The Griffin-Lim algorithm can only provide an _estimate_ for the phase. Therefore, it is expected that the resulting audio signal contains artifacts compared to the original signal or the signal that you'd obtain _with_ the correct phase information. – applesoup Jul 31 '21 at 18:34
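Picking up the first comment's suggestion: if the complex STFT `D` (and therefore the phase) is still available, you can invert it directly and skip the Griffin-Lim phase estimate altogether. A minimal sketch, reusing the variables from the question (the asker notes only magnitudes are available, so this is mainly useful as a quality baseline; the output filename is illustrative):

import librosa
import scipy.signal
import soundfile as sf

# magphase splits D into its magnitude and a unit-magnitude complex phase term,
# so multiplying them back together recovers the original complex STFT
spect, phase = librosa.magphase(D)

audio_signal = librosa.istft(spect * phase,
                             hop_length=overlap, win_length=nperseg,
                             window=scipy.signal.windows.hamming)
sf.write('test_with_phase.wav', audio_signal, sample_rate)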

2 Answers


You need to use a window function that is centered so that the windowed signal is symmetrical around the middle of the window and behaves well under overlap-add resynthesis. In this case, you can use the Hann window, a raised cosine that tapers to zero at its endpoints (unlike the Hamming window, whose endpoints are non-zero):

D = librosa.stft(samples, n_fft=nperseg,
                 hop_length=overlap, win_length=nperseg,
                 window=scipy.signal.windows.hann)   # Hann instead of Hamming

spect, _ = librosa.magphase(D)

audio_signal = librosa.griffinlim(spect, n_iter=1024,
                                  win_length=nperseg, hop_length=overlap,
                                  window=scipy.signal.windows.hann)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)
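
Not part of the answer, but a quick way to sanity-check a window/hop combination before resynthesis: scipy can test the constant-overlap-add (COLA) property and the weaker NOLA property that an invertible STFT requires. `nperseg` and `overlap` are the same variables as in the question.

import scipy.signal

hop = overlap                    # the question passes `overlap` as the hop length
noverlap = nperseg - hop         # scipy counts overlapping samples, not the hop
win = scipy.signal.windows.hann(nperseg, sym=False)

print(scipy.signal.check_COLA(win, nperseg, noverlap))  # True at 50% or 75% overlap
print(scipy.signal.check_NOLA(win, nperseg, noverlap))  # must be True for istft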

You should use a neural-network-based vocoder like WaveNet for the reconstruction: such models are trained to generate a waveform directly from a spectrogram and typically produce far fewer artifacts than Griffin-Lim.
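
A minimal sketch of what that could look like, using NVIDIA's pretrained WaveGlow from PyTorch Hub as a stand-in for WaveNet; `mel` is a placeholder for an 80-band log-mel spectrogram matching the model's 22050 Hz training setup, and converting the question's linear magnitudes into that format is not shown here:

import torch

# Pretrained WaveGlow vocoder from PyTorch Hub (NVIDIA's implementation);
# like WaveNet it generates a waveform directly from a mel spectrogram.
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
                          'nvidia_waveglow', model_math='fp32')
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow.eval()

# `mel` is a placeholder: a torch tensor of shape (1, 80, n_frames) holding a
# log-mel spectrogram that matches the vocoder's training configuration.
with torch.no_grad():
    audio = waveglow.infer(mel)      # -> (1, n_samples) waveform tensor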