Is it possible to do sound reconstruction by just using a picture of the STFT spectrogram?

Question

I have some STFT images that look like these: STFT1 and STFT2.

I did some image processing on these images, i.e. adding a filter.

I'm wondering, is it possible to extract the STFT information from these modified images?

In other words, how is the STFT information encoded in the RGB values?

For example, I used librosa.display.specshow(Xdb, sr=sr) to plot the spectrogram. Then I used plt.savefig() to save the plot to my local machine. Next, I applied a filter to the image to make it look blurry. The problem is that the spectrogram has been modified, so I no longer know what Xdb is. Is there any way to invert the librosa.display.specshow call and recover Xdb from the spectrogram image?

I think this is possible, in theory. You will need to have the image saved in a completely lossless format. — Z4-tier, Apr 30 '20 at 06:14
Does this answer your question? [Can I convert spectrograms generated with librosa back to audio?](https://stackoverflow.com/questions/61132574/can-i-convert-spectrograms-generated-with-librosa-back-to-audio) — Lukasz Tracewski, Apr 30 '20 at 06:24
If you had a magnitude and phase pair of spectrograms you probably could, though there would be some loss. If you were using an image format supporting an alpha channel, you could encode _magnitude_ in the red and green channels and _phase_ in the blue and alpha channels, giving 16-bit accuracy for each. This would be something bespoke rather than something you could use immediately with scipy. You would also need some metadata describing how many pixels represent each frequency bin and how many pixels represent each short-time window. — fdcpp, Apr 30 '20 at 10:11
@fdcpp An ordinary spectrogram works just fine; the magnitude is encoded in the pixel intensity. Yes, you don't have the phase. Estimating it is the job of the algorithm that reconstructs the signal. The linked answer provides some details and extra materials. — Lukasz Tracewski, Apr 30 '20 at 12:16
@LukaszTracewski Thanks for the reply, and yes, this is similar to what I am looking for. However, I'm stuck on how to read from the image. For example, I have my modified image saved as `'modified_STFT.png'`, and I name the STFT spectrogram generated from the original audio file as `'original_STFT.png'`. When I use `cv2.imread` to read both images, the content (array of data) looks identical. How do I interpret this data, and how is this data related to the STFT? — sensationti, May 02 '20 at 06:41
It's a different question then. If you have modified the image and yet the array looks identical, then it means you have not modified it or something else went wrong. Hard to tell without the code. — Lukasz Tracewski, May 02 '20 at 06:45
@LukaszTracewski Thanks for pointing that out. I checked the actual content of both files, and it turns out they are not identical. What I get from reading the `.png` is a 3-D numpy array with the corresponding RGB values. So, how do these RGB values correspond to the STFT data? Should I open a new thread for this question? — sensationti, May 02 '20 at 07:19
You need to apply the inverse of the color map you used when plotting. I'd modify the question accordingly. — Lukasz Tracewski, May 02 '20 at 09:48
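
The color-map inversion suggested in the last comment could look roughly like the sketch below. It is only an illustration under several assumptions that the thread does not confirm: the image has already been cropped to the plot area (no axes, ticks, or colorbar), the color map is available as a lookup table `lut`, and the dB limits `vmin` and `vmax` used for plotting are known. The function name `invert_colormap` and the synthetic color map in the demo are made up for this example. Each pixel is matched to its nearest color-map entry, and that entry's position is mapped linearly back to a dB value.

```python
import numpy as np

def invert_colormap(rgb, lut, vmin, vmax):
    """Approximate the plotted values from colors (hypothetical helper).

    rgb        : (H, W, 3) floats in [0, 1], plot area only
    lut        : (N, 3) floats, the color map sampled at N evenly spaced points
    vmin, vmax : values that were mapped to the two ends of the color map
    """
    rgb = np.asarray(rgb, dtype=float)
    lut = np.asarray(lut, dtype=float)
    out = np.empty(rgb.shape[:2])
    # Work row by row to keep the (W, N) distance table small.
    for r, row in enumerate(rgb):
        # Squared distance from each pixel in the row to each color-map entry.
        dist = ((row[:, None, :] - lut[None, :, :]) ** 2).sum(axis=2)
        idx = dist.argmin(axis=1)  # nearest color-map entry per pixel
        out[r] = vmin + idx / (len(lut) - 1) * (vmax - vmin)
    return out

# Demo with a synthetic color map (an assumption, not the one from the question).
t = np.linspace(0.0, 1.0, 256)
lut = np.stack([t, t ** 2, 1.0 - t], axis=1)
vmin, vmax = -80.0, 0.0

rng = np.random.default_rng(0)
xdb = rng.uniform(vmin, vmax, size=(4, 6))  # stand-in for Xdb

# Forward step: what plotting does conceptually (normalize, then look up a color).
idx = np.rint((xdb - vmin) / (vmax - vmin) * (len(lut) - 1)).astype(int)
rgb = lut[idx]

recovered = invert_colormap(rgb, lut, vmin, vmax)
print(np.max(np.abs(recovered - xdb)))  # bounded by half a color-map step
```

With matplotlib, a lookup table can be sampled with something like `matplotlib.colormaps["magma"](np.linspace(0, 1, 256))[:, :3]`, assuming that was the color map used for the plot. A real saved figure needs more care than this demo: image rows run top to bottom while the spectrogram plot puts low frequencies at the bottom, so the array has to be flipped vertically; the pixel grid generally does not match the STFT bin grid, so it has to be resampled; and a blur filter produces colors that are not on the color map, so the nearest-entry match only approximates the blurred values. The phase is not recoverable from the image and has to be estimated separately, as noted in the comments.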

0 Answers