
Context

I'm working on an audio classification problem and I want to recreate, in grayscale, the spectrogram I get from librosa's built-in plotting.

The reason for doing this is to create images to pass to a neural network. Doing it with Matplotlib is too slow, since it is designed for creating figures, not images.

I have scaled the amplitude to decibels using amplitude_to_db(), but the frequency axis still needs to be scaled. The built-in display.specshow() with y_axis='log' produces the result I want.

Question

How can I apply an equivalent operation to my spectrogram so the Y axis of my image looks like the one provided by librosa? Consider comparing librosa's spectrogram example and mine.

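    import librosa
    import librosa.display
    import matplotlib.pyplot as plt
    import numpy as np

    def get_spectrogram_from_wav(wav: np.ndarray, sample_rate: int) -> np.ndarray:
        # Magnitude of the STFT; rows are linearly spaced frequency bins.
        spec = np.abs(librosa.stft(wav))
        # Convert amplitude to decibels relative to the loudest bin.
        spec_db = librosa.amplitude_to_db(spec, ref=np.max)
        # log_spec = np.log10(spec_db)

        return spec_db

    def plot_slice(wav: np.ndarray):
        spec = np.abs(librosa.stft(wav))
        plt.figure()

        # Reference plot: same dB scaling, but drawn with a log frequency axis.
        librosa.display.specshow(
            librosa.amplitude_to_db(spec, ref=np.max),
            x_axis='time', y_axis='log'
        )

        plt.title('Power spectrogram')
        plt.show()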

Per Dorian's answer, I believe the right way to do this is to create a NumPy meshgrid using np.logspace for the Y axis. I'm still not sure what the next step should be, but this is a start.
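
One possible next step, as a rough sketch and not a confirmed solution: instead of a meshgrid, resample the rows of the dB spectrogram at log-spaced frequencies with linear interpolation. The function names to_log_frequency and to_grayscale_image, and the out_height and fmin parameters, are made up for illustration. The sketch uses np.geomspace (log-spaced points between two endpoints) in place of np.logspace, and it assumes the rows run linearly from 0 Hz to sample_rate / 2, as they do for an STFT. It will not match specshow pixel for pixel, since librosa's display may treat the lowest frequencies differently.

```python
import numpy as np


def to_log_frequency(spec_db: np.ndarray, sample_rate: int,
                     out_height: int = 256, fmin: float = 32.0) -> np.ndarray:
    """Resample rows of a linear-frequency spectrogram at log-spaced frequencies.

    Hypothetical helper: out_height and fmin are illustrative defaults.
    """
    n_bins = spec_db.shape[0]
    # Assumption: rows cover 0 .. sample_rate / 2 in equal steps (STFT layout).
    linear_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # Target frequencies, evenly spaced on a log axis (fmin must be > 0).
    log_freqs = np.geomspace(fmin, sample_rate / 2.0, out_height)
    # Fractional row position of each target frequency in the original array.
    pos = np.interp(log_freqs, linear_freqs, np.arange(n_bins))
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_bins - 1)
    frac = (pos - lo)[:, np.newaxis]
    # Linear interpolation between neighbouring rows, all time frames at once.
    return (1.0 - frac) * spec_db[lo] + frac * spec_db[hi]


def to_grayscale_image(spec_db: np.ndarray) -> np.ndarray:
    """Scale values to 0..255 (uint8) and flip so low frequencies are at the bottom."""
    vmin, vmax = spec_db.min(), spec_db.max()
    scaled = (spec_db - vmin) / max(vmax - vmin, 1e-12)
    return np.flipud((255 * scaled).astype(np.uint8))
```

With the function from the question this would be used as to_grayscale_image(to_log_frequency(get_spectrogram_from_wav(wav, sample_rate), sample_rate)). The resulting uint8 array can be written to disk with any image library, with no Matplotlib figure involved.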

cbhower