I'm trying to detect when a short MP3 jingle plays inside a larger MP3 audio clip using Librosa. However, I'm having difficulty getting it to work, and I have no idea where to go next. This is the code that I have so far, based on this StackOverflow answer, though I am open to detecting the location of the jingle with another method or library.
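import librosa
import numpy as np
from scipy.ndimage import generate_binary_structure, maximum_filter
# Load the audio as a waveform
# Store the sampling rate
JingleWave, JingleSR = librosa.load("short.mp3")
EpisodeWave, EpisodeSR = librosa.load("long.mp3")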
# Power spectrograms of file
# While debugging, I noticed that these arrays have the same length
# even though the two files have very different durations
JingleSpectogram = np.abs(librosa.stft(JingleWave))
EpisodeSpectogram = np.abs(librosa.stft(EpisodeWave))
# Define binary structure for the footprint
# This is the part that is most likely to be faulty, as I mostly did it because
# maximum_filter requires a footprint
structure = generate_binary_structure(2,1)
# Find local peaks to create constellation maps (2D images only containing peaks)
JingleCM = maximum_filter(JingleSpectogram, footprint=structure)
EpisodeCM = maximum_filter(EpisodeSpectogram, footprint=structure)
# Get time frames of the constellation maps
# (axis 0 of an STFT holds frequency bins; axis 1 holds time frames)
JingleLength = JingleCM.shape[1]
EpisodeLength = EpisodeCM.shape[1]
# Keep track of what segments match the most
scores = []
# Compare audio to find matching audio
for offset in range(EpisodeLength - JingleLength + 1):
    EpisodeExcerpt = EpisodeCM[:, offset:offset + JingleLength]
    score = np.sum(np.multiply(EpisodeExcerpt, JingleCM))
    scores.append(score)
# Find when the highest score happens
highestScore = max(scores)
# Convert score into the position (frame offset) of where the jingle starts
print(scores.index(highestScore))
print(highestScore)
I am just a beginner at programming so any help is much appreciated.
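Since another method is acceptable, one alternative worth trying is plain cross-correlation of the two waveforms, which skips the spectrogram and peak-picking steps entirely. The sketch below is only an illustration under assumptions: `find_jingle_start` is a made-up helper name, both waveforms are assumed to share one sample rate, and the jingle is assumed to appear in the episode more or less unchanged (not heavily remixed or buried under speech). It relies on `scipy.signal.correlate` and roughly normalises by the local energy of the episode so that loud passages do not win by volume alone.

```python
import numpy as np
from scipy.signal import correlate


def find_jingle_start(episode, jingle, sample_rate):
    """Return (start_seconds, score) for the best match of `jingle` in `episode`.

    Hypothetical helper: both inputs are 1-D waveforms at the same sample rate.
    """
    episode = np.asarray(episode, dtype=np.float64)
    jingle = np.asarray(jingle, dtype=np.float64)
    if len(jingle) > len(episode):
        raise ValueError("jingle must not be longer than episode")

    # Remove any constant offset from the template.
    jingle = jingle - jingle.mean()

    # mode="valid" gives one correlation value per possible start sample.
    corr = correlate(episode, jingle, mode="valid", method="fft")

    # Energy of each episode window that the jingle is compared against.
    window_energy = correlate(
        episode ** 2, np.ones(len(jingle)), mode="valid", method="fft"
    )
    # Floor the energy so near-silent windows do not blow up the score.
    floor = 1e-8 * max(float(window_energy.max()), 1e-12)
    norm = np.sqrt(np.maximum(window_energy, floor)) * np.linalg.norm(jingle)
    score = corr / norm

    best = int(np.argmax(score))
    return best / sample_rate, float(score[best])
```

With the variables from the code above, this would be called as `find_jingle_start(EpisodeWave, JingleWave, EpisodeSR)`. By default `librosa.load` resamples to 22050 Hz, so both files end up at the same sample rate. A score close to 1 suggests a near-exact copy of the jingle; a low best score probably means the jingle is absent or too altered for this approach, in which case a spectrogram-based comparison may be more robust.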