How can I convert spectrogram data to a tensor (or multidimensional numpy array)?

Question

I am using Keras and have:

        corrupted_samples, corrupted_sample_rate = sf.read(
            self.corrupted_audio_file_paths[index])

        frequencies, times, spectrogram = scipy.signal.spectrogram(
            corrupted_samples, corrupted_sample_rate)

As per the docs, this gives:

f (ndarray) - Array of sample frequencies.
t (ndarray) - Array of segment times.
Sxx (ndarray) - Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.

I assume all of the times will line up, so I don't think I care about the time values themselves. The same is true of the frequencies. So what I actually need is the value at each time for each frequency, which is given by `Sxx` (`spectrogram` in my code). I'm unsure how to actually get that into a tensor, though it seems like it should be simple.

`Sxx` is a numpy array. It looks like your question is [how to convert a numpy array to keras tensor](https://stackoverflow.com/questions/52816938/how-to-convert-numpy-array-to-keras-tensor). — Warren Weckesser, Feb 17 '20 at 15:56
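To make the shapes concrete, here is a minimal sketch that uses a synthetic one-second sine wave in place of the audio file (the signal and the 16 kHz sample rate are assumptions for illustration, not values from the question). It suggests that `Sxx` is already a 2-D NumPy array indexed as `[frequency, time]`:

```python
import numpy as np
from scipy import signal

# Hypothetical stand-in for the samples read with sf.read(): 1 s of a 440 Hz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
samples = np.sin(2 * np.pi * 440.0 * t)

frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)

print(type(spectrogram))  # a numpy.ndarray
print(frequencies.shape, times.shape, spectrogram.shape)

# Rows follow `frequencies`, columns follow `times`.
assert spectrogram.shape == (frequencies.size, times.size)
```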

score 2 · Accepted Answer · answered Feb 17 '20 at 03:21

According to https://towardsdatascience.com/speech-recognition-analysis-f03ff9ce78e9, a spectrogram is a spectro-temporal representation of sound. The article also shows some of the steps for converting a WAV file to a spectrogram.

One example is shown below:

    # Note: this example uses the TensorFlow 1.x API (tf.contrib was removed in 2.x).
    import wave

    import tensorflow as tf

    # .numpy() below requires eager execution, which is opt-in on TF 1.x.
    tf.enable_eager_execution()

    audio_file = './siren_mfcc_demo.wav'

    # Check the sampling rate of the WAV file.
    with wave.open(audio_file, "rb") as wave_file:
        sr = wave_file.getframerate()
    print(sr)

    audio_binary = tf.read_file(audio_file)

    # tf.contrib.ffmpeg is not supported on Windows; see
    # https://github.com/tensorflow/tensorflow/issues/8271
    waveform = tf.contrib.ffmpeg.decode_audio(
        audio_binary, file_format='wav', samples_per_second=sr, channel_count=1)
    print(waveform.numpy().shape)

    # Reshape to [batch_size, n_samples] with a batch of one.
    signals = tf.reshape(waveform, [1, -1])
    print(signals.get_shape())

    # Compute a [batch_size, ?, 128] tensor of fixed length, overlapping windows
    # where each window overlaps the previous by 75% (frame_length - frame_step
    # samples of overlap).
    frames = tf.contrib.signal.frame(signals, frame_length=128, frame_step=32)
    print(frames.numpy().shape)

    # `magnitude_spectrograms` is a [batch_size, ?, 129] tensor of spectrograms. We
    # would like to produce overlapping fixed-size spectrogram patches; for example,
    # for use in a situation where a fixed size input is needed.
    magnitude_spectrograms = tf.abs(tf.contrib.signal.stft(
        signals, frame_length=256, frame_step=64, fft_length=256))
    print(magnitude_spectrograms.numpy().shape)

The method above comes from https://colab.research.google.com/drive/1Adcy25HYC4c9uSBDK9q5_glR246m-TSx#scrollTo=QTa1BVSOw1Oe
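To illustrate where the `[?, 129]` shape in the comments comes from, here is a rough NumPy-only sketch of the same framing and magnitude-STFT steps. It is not the TensorFlow implementation: the random signal is a hypothetical stand-in for the decoded waveform, and the window is NumPy's symmetric Hann, so the values need not match TensorFlow's output exactly.

```python
import numpy as np

frame_length, frame_step = 256, 64

# Hypothetical mono signal standing in for the decoded waveform.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)

# Slice the signal into overlapping frames: shape (n_frames, frame_length).
frames = np.lib.stride_tricks.sliding_window_view(waveform, frame_length)[::frame_step]

# Window each frame, take the real FFT, and keep the magnitude.
window = np.hanning(frame_length)
magnitudes = np.abs(np.fft.rfft(frames * window, n=frame_length, axis=-1))

print(frames.shape)      # (n_frames, 256)
print(magnitudes.shape)  # (n_frames, 129): fft_length // 2 + 1 frequency bins
```

Note that the result here is laid out as time frames first and frequency bins second, unlike the `Sxx` array returned by `scipy.signal.spectrogram`.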

Hope this helps.

Thank you. I already have the `spectrogram` from `scipy.signal.spectrogram`. I need to convert that to a tensor of `(n_timesteps, n_frequencies)` somehow — Shamoon, Feb 17 '20 at 13:53
I am trying to find a solution like that. Did anyone solve it? — Alankrit, Jun 23 '22 at 05:27
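For the `(n_timesteps, n_frequencies)` layout asked about in the comments, one possible approach is to transpose the array that `scipy.signal.spectrogram` returns, since its last axis is time. The sketch below uses synthetic clips of equal length; the helper name `spectrogram_features`, the random signals, and the 16 kHz sample rate are assumptions for illustration, not part of the original code.

```python
import numpy as np
from scipy import signal


def spectrogram_features(samples, sample_rate):
    """Return a float32 array of shape (n_timesteps, n_frequencies)."""
    _, _, sxx = signal.spectrogram(samples, sample_rate)
    # sxx is (n_frequencies, n_timesteps); transpose to put time first.
    return sxx.T.astype(np.float32)


sample_rate = 16000  # assumed sample rate
rng = np.random.default_rng(0)
clips = [rng.standard_normal(sample_rate) for _ in range(4)]  # four fake 1 s clips

# Clips of equal length yield equally shaped spectrograms, so they stack into
# a batch of shape (n_clips, n_timesteps, n_frequencies).
batch = np.stack([spectrogram_features(c, sample_rate) for c in clips])
print(batch.shape)
```

Keras models generally accept NumPy arrays such as `batch` directly in `fit` and `predict`. If an explicit tensor is needed, `tf.convert_to_tensor(batch)` should perform the conversion.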
