Reconstructing audio from spectrogram

Question

I followed the MathWorks documentation (https://www.mathworks.com/help/signal/ref/stftmag2sig.html) to construct a spectrogram from a sound, then exported and saved the spectrogram as a PNG image. What I want now is to import that image into MATLAB, or any alternative platform, and reconstruct the audio from it. I have gone through many posts and reading materials, but they do not deal with generating audio from images. Mostly they rely on the original signal for the reconstruction, and the spectrogram appears only for visualization. Attaching untitled.png for reference.

The link you posted has an example called "Reconstruct Audio Signal from STFT Magnitude," is this not what you want? In any case if I were you I would start by trying to reconstruct the signal given a matrix of amplitudes in the frequency/time space. Then figure out how to turn an image into such a matrix if that's what you need to do. — Sam Szotkowski, Apr 04 '21 at 03:50
I want to reconstruct an audio signal from the spectrogram. The link provides information of how we can reconstruct an audio signal from STFT magnitude. Even for that they highly rely on the signal itself for the magnitude portion. I want to see if we can extract the same information from the spectrogram image in the first place. — Abdul Jamali, Apr 04 '21 at 06:55
I’d recommend looking at other questions first. Though the language may not be the same in some the principles remain the same. https://stackoverflow.com/questions/47983897/python-reconstruct-audio-file-from-stft , https://stackoverflow.com/questions/64345872/possible-to-reconstruct-audio-only-with-spectrogram-image , https://stackoverflow.com/questions/58409556/is-there-a-way-to-invert-a-spectrogram-back-to-signal/58409693 — fdcpp, Apr 04 '21 at 07:43
I have gone through these already, they mostly rely on the signal magnitude rather than the spectrogram as an image. What I have understood so far is that I need to extract the magnitude information of the signal through spectrogram image first and this is something I am not familiar with how to achieve. — Abdul Jamali, Apr 04 '21 at 08:40
Well as far as extracting values from an image, you could try converting to greyscale and loading it into something like a numpy array where each element is the brightness of a pixel. That's not such a good solution because it depends heavily on how your color map is on the o.g. image. You're way better off working with actual data than images, but if you must use images try to use greyscale ones. — Sam Szotkowski, Apr 04 '21 at 10:39
As far as inversing STFT / reconstructing signal, this is only possible if you already know the window function, correct? — Sam Szotkowski, Apr 04 '21 at 10:42
Yes, there are few things that I know since I already used some functions to convert audio to spectrogram. I have to generate the audio from spectrogram images due to a pipeline process we need to follow. — Abdul Jamali, Apr 04 '21 at 16:39

score 0 · Answer 1 · answered Apr 04 '21 at 10:31

Following the doc you referenced:

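s = double(imread('im.png')); % greyscale image read as a magnitude matrix; see remarks below
x = stftmag2sig(s, nfft); % x is your audio; nfft must match the FFT length used to produce the image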

s is your image. The OP produces these spectrograms, so he controls the output. Based on that:

Avoid lossy image formats and make sure no rescaling or interpolation happens. Each pixel should hold the amplitude for one time window (nfft) and one frequency bin.
Either produce images that contain only the spectrogram (no axes), or know the exact pixel coordinates of the spectrogram within the image.
Do not use colour in spectrograms. It looks nice, but it introduces unnecessary ambiguity about how to map a 3-tuple colour back to an amplitude.
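
To make the pixel-to-amplitude step concrete, here is a minimal sketch in plain Python of one way the mapping could work. It rests on assumptions that are not in the question: that the image was written as 8-bit greyscale on a dB scale with an 80 dB dynamic range, and that the top image row is the highest frequency bin. The names DB_FLOOR, magnitude_to_pixel, pixel_to_magnitude and image_to_magnitudes are made up for illustration. If your export used a different scaling, the inverse has to mirror that scaling instead.

```python
import math

DB_FLOOR = -80.0  # assumption: pixel 0 means -80 dB, pixel 255 means 0 dB


def magnitude_to_pixel(mag, ref=1.0):
    """Map a linear magnitude to an 8-bit grey level on a dB scale."""
    db = 20.0 * math.log10(max(mag, 1e-12) / ref)
    db = min(0.0, max(DB_FLOOR, db))  # clip to [DB_FLOOR, 0]
    return round(255 * (db - DB_FLOOR) / -DB_FLOOR)


def pixel_to_magnitude(pixel, ref=1.0):
    """Invert the mapping above: grey level back to a linear magnitude."""
    db = DB_FLOOR + (pixel / 255.0) * -DB_FLOOR
    return ref * 10.0 ** (db / 20.0)


def image_to_magnitudes(rows, ref=1.0):
    """Convert rows of grey levels into a magnitude matrix.

    Assumption: rows[0] is the top of the image (highest frequency),
    so the rows are reversed to put the lowest frequency bin first.
    """
    return [[pixel_to_magnitude(p, ref) for p in row] for row in reversed(rows)]
```

The resulting matrix is what a magnitude-only inversion such as stftmag2sig expects as input. Note that 8-bit quantization is lossy: with an 80 dB range each grey level spans roughly 0.31 dB, so recovered magnitudes are only approximate, and anything below the floor is lost.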
