I need to find the energy of peaks using Librosa so I can detect the first beat of each bar.

I am using Librosa to detect audio beats in a click track. This is working well, but I now wish to detect the first beat of every bar. I believe the best way to do this would be to detect the energy or the pitch of each beat.

Currently I am logging all beats to an array. How can I detect the first beat of each bar?

import librosa
import numpy as np

def findPeaks(inputFile):
    print(">>> Finding peaks...\n")
    y, sr = librosa.load(inputFile)
    onset_env = librosa.onset.onset_strength(
        y=y, sr=sr, hop_length=512, aggregate=np.median
    )
    global inputTrackPeaks  # array of peaks
    # keyword arguments: recent librosa versions do not accept these positionally
    inputTrackPeaks = librosa.util.peak_pick(
        onset_env, pre_max=3, post_max=3, pre_avg=3, post_avg=5, delta=0.5, wait=10
    )
    inputTrackPeaks = librosa.frames_to_time(inputTrackPeaks, sr=sr)
    inputTrackPeaks = inputTrackPeaks * 1000  # convert array to milliseconds
    print("Peak positions (ms): \n", inputTrackPeaks)
Stephen Kempin

1 Answer

For a very simple beat tracker you probably want to use librosa's built-in beat tracking:

import librosa

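# newer librosa versions provide bundled examples via librosa.ex()
# instead of librosa.util.example_audio_file()
y, sr = librosa.load(librosa.ex('choice'))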
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
# beats now contains the beat *frame positions*
# convert to timestamps like this:
beat_times = librosa.frames_to_time(beats, sr=sr)

That gives you the beat positions. What you are actually asking for, though, is downbeat estimation. Your idea of finding the beat with the highest energy is good, but you might want to incorporate some additional knowledge and average over corresponding beats. E.g., if you know the track is in 4/4 time, you could sum the energy separately for each of the four beat positions (beats 1, 5, 9, ...; beats 2, 6, 10, ...; and so on) and then conclude that the position with the highest energy sum is the downbeat.

Roughly like this:

import librosa
import numpy as np

y, sr = librosa.load('my file.wav')
# get onset envelope
onset_env = librosa.onset.onset_strength(y=y, sr=sr, aggregate=np.median)
# get tempo and beats
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
# we assume 4/4 time
meter = 4
# calculate number of full measures 
measures = (len(beats) // meter)
# get onset strengths for the known beat positions
# Note: this is somewhat naive, as the main strength may be *around*
#       rather than *on* the detected beat position. 
beat_strengths = onset_env[beats]
# make sure we only consider full measures
# and convert to 2d array with indices for measure and beatpos
measure_beat_strengths = beat_strengths[:measures * meter].reshape(-1, meter)
# add up strengths per beat position
beat_pos_strength = np.sum(measure_beat_strengths, axis=0)
# find the beat position with max strength
downbeat_pos = np.argmax(beat_pos_strength)
# convert the beat positions to the same 2d measure format
full_measure_beats = beats[:measures * meter].reshape(-1, meter)
# and select the beat position we want: downbeat_pos
downbeat_frames = full_measure_beats[:, downbeat_pos]
print('Downbeat frames: {}'.format(downbeat_frames))
# print times
downbeat_times = librosa.frames_to_time(downbeat_frames, sr=sr)
print('Downbeat times in s: {}'.format(downbeat_times))
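
The note in the code above says that reading the onset envelope exactly at the detected beat frame is somewhat naive. One possible refinement, sketched here with a hypothetical helper `windowed_beat_strengths` and an assumed window of a few frames, is to take the maximum onset strength in a small window around each beat. The toy values are made up for illustration and plain lists are used so the sketch runs without librosa:

```python
def windowed_beat_strengths(onset_env, beats, half_window=2):
    """For each beat frame, return the max onset strength within
    +/- half_window frames (hypothetical helper, not part of librosa).
    Assumes every beat frame is a valid index into onset_env."""
    n = len(onset_env)
    strengths = []
    for b in beats:
        lo = max(0, int(b) - half_window)
        hi = min(n, int(b) + half_window + 1)
        strengths.append(max(onset_env[lo:hi]))
    return strengths


# Toy envelope: the real peak sits one frame before the first detected beat
toy_env = [0.0, 0.1, 0.9, 0.2, 0.0, 0.0, 0.3, 0.1, 0.0]
toy_beats = [3, 6]
result = windowed_beat_strengths(toy_env, toy_beats, half_window=1)
print(result)  # [0.9, 0.3]
```

With real data you would pass `onset_env` and `beats` from the code above, and wrap the result in `np.array(...)` before reshaping it in place of `beat_strengths`. A wider window tolerates sloppier beat positions but risks picking up a neighbouring onset.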

Your mileage with code like this will vary. Success depends on the kind of music, genre, meter, quality of beat detection, etc., because downbeat estimation is not trivial. In fact, it is a current Music Information Retrieval (MIR) research topic and not entirely solved. For a recent review of advanced deep learning-based automatic downbeat tracking you might want to check out this article.

Hendrik
  • Thanks, this is great. An issue, however, is that not all the click tracks I am using are in constant 4/4. Some have a bar of only 2/4. I should point out that the downbeats on the click tracks I am using as source files have a different tone from the other beats. Would there therefore be a method to detect the downbeat via the pitch rather than the energy? – Stephen Kempin Aug 07 '19 at 14:28
  • If your click track consists of only two different kinds of pitches/clicks and nothing else, why not simply (mis-)use [librosa.core.piptrack](https://librosa.github.io/librosa/generated/librosa.core.piptrack.html) to identify the peaks and classify them as downbeat or not? If the "click" is harmonic, you should be able to measure which pitch to look for as the downbeat. See also https://stackoverflow.com/q/43877971/942774 and its accepted answer for picking the max pitch. – Hendrik Aug 07 '19 at 15:06
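
If the clicks really are two clean tones, a rough sketch of the classify-by-pitch idea is possible even without `piptrack`. The example below swaps in a simpler technique, counting zero crossings in a short segment after each click, to estimate each click's frequency. The function names, the 20 ms segment length, and the rule that the less common pitch marks the downbeat are all assumptions for illustration. `click_times` stands for click positions in seconds, e.g. from your peak or beat detection, and the audio here is synthetic.

```python
import math


def estimate_click_freq(samples, sr):
    """Rough frequency estimate for a near-sinusoidal segment:
    a sine crosses zero twice per period, so f ~ crossings / (2 * duration)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sr
    return crossings / (2.0 * duration)


def find_downbeat_clicks(y, sr, click_times, segment_s=0.02):
    """Return indices of clicks in the minority pitch class.
    Assumption: downbeats are the less common of the two click tones."""
    seg_len = int(segment_s * sr)
    freqs = []
    for t in click_times:
        start = int(t * sr)
        freqs.append(estimate_click_freq(y[start:start + seg_len], sr))
    # split the clicks at the midpoint between lowest and highest estimate
    threshold = (min(freqs) + max(freqs)) / 2.0
    high = [i for i, f in enumerate(freqs) if f > threshold]
    low = [i for i, f in enumerate(freqs) if f <= threshold]
    return high if len(high) <= len(low) else low


# Synthetic demo (made-up data): 1500 Hz on clicks 0 and 4, 1000 Hz elsewhere
sr = 8000
click_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0.0] * (3 * sr)
for i, t in enumerate(click_times):
    freq = 1500.0 if i in (0, 4) else 1000.0
    start = int(t * sr)
    for n in range(160):  # 20 ms of sine per click
        y[start + n] = math.sin(2 * math.pi * freq * n / sr)

downbeats = find_downbeat_clicks(y, sr, click_times)
print(downbeats)  # [0, 4]
```

Because this classifies each click on its own, it does not depend on a constant meter, so a 2/4 bar in the middle of a 4/4 track should not throw it off. Zero-crossing counts are only trustworthy for clean, single-tone clicks; for noisier material a spectral method such as `piptrack` is likely the safer choice.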