
I'm using librosa's built-in beat_track function:

from librosa.beat import beat_track

# audio and sampling_rate obtained beforehand, e.g. via librosa.load()
tempo, beat_frames = beat_track(y=audio, sr=sampling_rate)

The actual tempo of the song is 146 BPM, whereas the function estimates 73.5 BPM. I understand that 73.5 × 2 = 147 ≈ 146 BPM, but how can we achieve the following:

  1. Know when to scale an estimate up or down
  2. Increase accuracy by pre-processing the signal
Akash Sonthalia

1 Answer


What you observe is the so-called "octave error", i.e., the estimate is off by a factor of 2, 1/2, 3, or 1/3. It's quite a common problem in global tempo estimation. A great, classic introduction to global tempo estimation can be found in An Experimental Comparison of Audio Tempo Induction Algorithms. The article also introduces the common metrics Acc1 and Acc2.
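
To make this concrete, here is a minimal sketch of how Acc1 and Acc2 are commonly computed (the 4% tolerance is the usual choice; please refer to the paper for the exact definitions):

def acc1(estimate, reference, tolerance=0.04):
    # correct if the estimate is within +/- 4% of the reference tempo
    return abs(estimate - reference) <= tolerance * reference

def acc2(estimate, reference, tolerance=0.04):
    # additionally accept estimates that are off by a factor of 2, 3, 1/2 or 1/3
    factors = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)
    return any(acc1(estimate, factor * reference, tolerance) for factor in factors)

# the example from the question: 73.5 BPM estimate vs. a 146 BPM reference
print(acc1(73.5, 146))  # False -> octave error
print(acc2(73.5, 146))  # True  -> half of 146 is 73, and 73.5 is within 4% of that

Under Acc1 the estimate counts as wrong, under Acc2 it counts as correct, which is exactly the octave-error situation described above.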

Since the publication of that article, many researchers have tried to solve the octave-error problem. The (from my very biased point of view) most promising ones are A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network by myself (you might also want to check out this later paper, which uses a simpler NN architecture) and Multi-Task Learning of Tempo and Beat: Learning One to Improve the Other by Böck et al.

Both approaches use convolutional neural networks (CNNs) to analyze spectrograms. While a CNN could also be implemented in librosa, it currently lacks the programmatic infrastructure to do this easily. Another audio analysis framework seems to be a step ahead in this regard: Essentia. It is capable of running TensorFlow models.
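
For illustration, here is a rough sketch of what running a tempo CNN in Essentia might look like; the TempoCNN algorithm name, the 11025 Hz input sample rate and the model filename are assumptions on my part, so please check the Essentia documentation and its model repository for the exact interface and available graphs:

from essentia.standard import MonoLoader, TempoCNN

# TempoCNN models expect mono audio; 11025 Hz is assumed here, verify against the docs
audio = MonoLoader(filename='song.wav', sampleRate=11025)()

# 'deeptemp-k16-3.pb' is a placeholder; download an actual TempoCNN graph first
global_tempo, local_tempo, local_probabilities = TempoCNN(graphFilename='deeptemp-k16-3.pb')(audio)
print(global_tempo)

If the interface matches, global_tempo is a single BPM value for the whole track, which is the quantity the question is after.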

Hendrik
  • Again, thank you for the clarification @Hendrik. I will dig into this further! – Akash Sonthalia May 06 '20 at 14:46
  • Hi, are there any other alternatives to Essentia? I've been trying for three days now and just haven't been able to brew install all the dependencies; I'm running into a lot of problems on OS X. – Akash Sonthalia May 08 '20 at 20:53
  • What is your goal? – Hendrik May 09 '20 at 09:10
  • I want to isolate instruments from a piece of music and extract the MIDI. My end goal is to build a system that can do this for one particular artist and for artists similar to them (clustered by instruments and effects). My manual analysis should be able to extract a key instrument-interaction pattern of an artist from the audio sample. My first goal is to reproduce the same song, treating small time slices of the individual instruments as time-invariant systems. – Akash Sonthalia May 09 '20 at 13:24
  • Please feel free to advise; this was just a concept and I'm figuring things out along the way, including getting visibility into DSP, which I have no idea about yet. Just learning. https://medium.com/@akashsonthalia/musician-instrument-analysis-touchpoint-2025-ai-powered-music-production-in-music-streaming-aec6324e1951 – Akash Sonthalia May 09 '20 at 13:31
  • This sounds like quite the project! Extracting MIDI from a sound file of polyphonic music is far from trivial. – Hendrik May 10 '20 at 07:06
  • I do understand that my project is complex; however, I am enthusiastic about pursuing it. I don't think extraction is possible without supporting data. Instrumentation usage is the first step to understanding a song: if we know the raw sounds used, we have a better hope of achieving something like this. I hope I'm able to convey my concept. – Akash Sonthalia May 10 '20 at 09:30
  • Check out [ISMIR](http://www.ismir.net) papers; they should cover most of what you need. For instrument recognition, see e.g. http://archives.ismir.net/ismir2019/paper/000007.pdf, or https://archives.ismir.net/ismir2017/paper/000085.pdf for transcription. Good luck! – Hendrik May 10 '20 at 10:18
  • Thank you @Hendrik, I will look into these sources. I hope to present a paper at ISMIR 2021 in Bengaluru. :) And thank you, I guess I'm going to need that good luck. – Akash Sonthalia May 10 '20 at 15:33