Context
I'm working on an audio classification problem and I want to recreate, in grayscale, the spectrogram I get from librosa's built-in plotting.
The reason for doing this is to create images to pass to a neural network. Doing it with Matplotlib is too slow, since it is designed for creating figures, not images.
I have scaled the amplitude using amplitude_to_db(), but the frequency axis still needs to be scaled. The built-in display.specshow() with y_axis='log' produces the result I want.
Question
How can I apply an equivalent operation to my spectrogram so that the Y axis of my image looks like the one provided by librosa? Compare librosa's spectrogram example with mine.
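import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np


def get_spectrogram_from_wav(wav: np.ndarray, sample_rate: int) -> np.ndarray:
    # Magnitude STFT, converted to dB relative to the peak value
    spec = np.abs(librosa.stft(wav))
    spec_db = librosa.amplitude_to_db(spec, ref=np.max)
    # log_spec = np.log10(spec_db)
    return spec_db


def plot_slice(wav: np.ndarray):
    # Reference plot: librosa draws the frequency axis on a log scale
    spec = np.abs(librosa.stft(wav))
    plt.figure()
    librosa.display.specshow(
        librosa.amplitude_to_db(spec, ref=np.max),
        x_axis='time', y_axis='log'
    )
    plt.title('Power spectrogram')
    plt.show()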
I believe the right way to do this, per Dorian's answer, is to create a NumPy meshgrid using np.logspace for the Y axis. I'm still not sure what the next step should be, but this is a start.
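One possible next step, as a rough sketch rather than a confirmed solution: instead of a meshgrid, pick log-spaced target frequencies and linearly interpolate the rows of the dB spectrogram onto them. This assumes the rows are the linearly spaced STFT bins from 0 Hz to Nyquist. The helper names (to_log_frequency, to_grayscale_uint8) and the n_out and f_min parameters are made up for illustration, and the result is not guaranteed to match specshow's y_axis='log' rendering pixel for pixel.

```python
import numpy as np


def to_log_frequency(spec_db: np.ndarray, sample_rate: int,
                     n_out: int = 256, f_min: float = 32.0) -> np.ndarray:
    """Resample a (n_bins, n_frames) linear-frequency spectrogram onto
    n_out log-spaced frequencies between f_min and Nyquist.

    Hypothetical helper: n_out and f_min are illustrative choices.
    """
    n_bins = spec_db.shape[0]
    nyquist = sample_rate / 2.0
    # Target frequencies, evenly spaced on a log scale
    log_freqs = np.geomspace(f_min, nyquist, n_out)
    # Fractional row index of each target frequency in the linear spectrogram
    idx = np.clip(log_freqs / nyquist * (n_bins - 1), 0, n_bins - 1)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, n_bins - 1)
    w = (idx - lo)[:, None]
    # Linear interpolation between the two neighbouring rows, for all frames at once
    return (1.0 - w) * spec_db[lo] + w * spec_db[hi]


def to_grayscale_uint8(spec: np.ndarray) -> np.ndarray:
    """Min-max scale to 0..255 and flip so low frequencies sit at the bottom."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / max(hi - lo, 1e-12)
    return np.round(255 * scaled[::-1]).astype(np.uint8)
```

With the function from the question, this would be used roughly as to_grayscale_uint8(to_log_frequency(get_spectrogram_from_wav(wav, sr), sr)), and the resulting array can be written to disk with any image library. Everything is plain NumPy indexing, so it avoids building a Matplotlib figure per clip.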