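from librosa import load
from librosa.feature import mfcc

def extract_mfcc(path):
    # load() returns the samples and the sampling rate (22050 Hz by default)
    data, sr = load(path)
    # recent librosa versions require y and sr as keyword arguments
    return mfcc(y=data, sr=sr)

# use a different name so the imported mfcc function is not overwritten
features = extract_mfcc("sound.wav")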
I would like to get the MFCCs of a file, sound.wav, which is 48 seconds long.
I understand that the number of samples divided by the sampling rate gives the length of the audio in seconds.
But when I compute the MFCC as shown above and get its shape, this is the result: (20, 2086)
What do those numbers represent? How can I calculate the duration of the audio from its MFCC alone?
I'm trying to calculate the average MFCC per ms of audio.
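For reference, here is a minimal sketch of the arithmetic that seems to connect the shape to time. It assumes librosa's defaults were used (load() resampling to 22050 Hz, and mfcc() producing 20 coefficients with a hop length of 512 samples), and frames_to_seconds is a hypothetical helper name, not a librosa function:

```python
# Assumed librosa defaults: load() resamples to 22050 Hz, mfcc() uses hop_length=512.
SR = 22050
HOP_LENGTH = 512


def frames_to_seconds(n_frames, sr=SR, hop_length=HOP_LENGTH):
    """Approximate duration spanned by n_frames MFCC frames (hypothetical helper)."""
    return n_frames * hop_length / sr


# Shape reported above: (coefficients per frame, number of frames)
n_mfcc, n_frames = (20, 2086)

duration_s = frames_to_seconds(n_frames)
ms_per_frame = 1000 * HOP_LENGTH / SR

print(f"{n_mfcc} coefficients x {n_frames} frames")
print(f"~{duration_s:.2f} s total, ~{ms_per_frame:.2f} ms per frame")
```

Under those assumptions the 2086 frames span roughly 48.4 seconds, which is close to the known length of the file, and each column covers a hop of about 23 ms. If a different sampling rate or hop length was used, the numbers change accordingly.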
Any help is appreciated! Thank you :)