I am working on sound classification with WAV files ranging from 1 to 4 seconds long. I want to convert each WAV file to a 224x224x3 image that I can feed into a ResNet for classification. The conversion should use a mel spectrogram. Thanks for any help.

desertnaut

1 Answer

You can use librosa to produce a mel spectrogram like this:

import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

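# Load audio; librosa.example('trumpet') is a bundled demo clip (librosa >= 0.8), swap in your own file path
y, sr = librosa.load(librosa.example('trumpet'))
# Mel power spectrogram with 128 mel bands, up to 8 kHz
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)
# Convert power to decibels, plot it, and save the figure
librosa.display.specshow(librosa.power_to_db(S, ref=np.max), fmax=8000)
plt.savefig('mel.png')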

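Bear in mind, though, that the colours in the saved image are false colours from the plot's colormap. The spectrogram itself is a single-channel array, so RGB (or any other multi-channel representation) does not make sense here. Use an architecture that works with a single input channel.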
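
If the goal is a 224x224x3 array for a ResNet rather than a picture of a plot, one option is to skip matplotlib and work on the spectrogram array directly. The following is a minimal sketch, not part of the original answer: `spectrogram_to_image` and `wav_to_image` are hypothetical helper names, the resize is a plain nearest-neighbour index lookup in NumPy (an image library would give smoother results), and the single channel is simply repeated three times, which is an assumption made to fit pretrained RGB networks and adds no information.

```python
import numpy as np

def spectrogram_to_image(S_db, size=224):
    """Turn a 2-D spectrogram (n_mels x frames) into a (size, size, 3) uint8 array."""
    S_db = np.asarray(S_db, dtype=float)
    span = S_db.max() - S_db.min()
    # Min-max scale to 0..1; a constant input maps to all zeros
    scaled = (S_db - S_db.min()) / span if span > 0 else np.zeros_like(S_db)
    img = np.round(255 * scaled).astype(np.uint8)
    # Flip vertically so low frequencies end up at the bottom of the image
    img = img[::-1]
    # Nearest-neighbour resize: pick evenly spaced source rows and columns
    rows = np.linspace(0, img.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size).round().astype(int)
    img = img[np.ix_(rows, cols)]
    # Repeat the single channel three times (assumption: the model expects RGB)
    return np.stack([img] * 3, axis=-1)

def wav_to_image(path, size=224, n_mels=128, fmax=8000):
    """Load a WAV file and return its mel spectrogram as a (size, size, 3) array."""
    import librosa  # imported here so the helper above stays NumPy-only
    y, sr = librosa.load(path)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, fmax=fmax)
    S_db = librosa.power_to_db(S, ref=np.max)
    return spectrogram_to_image(S_db, size)
```

Calling `wav_to_image('clip.wav')` (a hypothetical file name) would then return a `uint8` array of shape `(224, 224, 3)`. Clips of different lengths are stretched to the same width here; padding or cropping every clip to a fixed duration before computing the spectrogram is an alternative.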

Lukasz Tracewski
  • This includes axis markers, labels, etc. from the plot in the file, which is not great as input to an ML model. Instead, you should save just the raw spectrogram data. An example of that is at https://stackoverflow.com/questions/56719138/how-can-i-save-a-librosa-spectrogram-plot-as-a-specific-sized-image/57204349#57204349 – Jon Nordby Jul 25 '19 at 14:32
  • I tried this, and I got AttributeError: module 'librosa' has no attribute 'display'. You need to explicitly import librosa.display now. – Octaviotastico Dec 12 '20 at 19:56
  • You also need to import numpy as np, because you're using it on line 6 – Octaviotastico Dec 12 '20 at 19:58
  • Thanks @Octaviotastico! Edited. – Lukasz Tracewski Dec 12 '20 at 20:16