
I am trying my hand at audio processing in Python with this beat detection algorithm. I have implemented the first (non-optimised) version from the linked article. While it prints some results, I have no way to tell how accurate they are, because I cannot hear the audio while the script runs.

Currently, I am using Popen to start my media player asynchronously with the song before entering the computation loop, but I am not sure whether this actually keeps the playback and the computation in sync.

#!/usr/bin/python

import scipy.io.wavfile, numpy, sys, subprocess

# Some abstractions for computation
def sumsquared(arr):
    total = 0
    for frame in arr:
        # sum of squares over both stereo channels
        total += (frame[0] * frame[0]) + (frame[1] * frame[1])

    return total

if len(sys.argv) < 2:
    print 'USAGE: wavdsp <wavfile>'
    sys.exit(1)

numpy.set_printoptions(threshold='nan')
rate, data = scipy.io.wavfile.read(sys.argv[1])


# Beat detection algorithm begin 
# the algorithm has been implemented as per GameDev Article
# Initialisation
data_len = len(data)
idx = 0
hist_last = 44032
instant_energy = 0
local_energy = 0
le_multi = 0.023219955 # Local energy multiplier ~ 1024/44100


# Play the song
p = subprocess.Popen(['audacious', sys.argv[1]])

while idx < data_len - 48000:
    dat = data[idx:idx+1024]
    history = data[idx:hist_last]
    instant_energy = sumsquared(dat)
    local_energy = le_multi * sumsquared(history)
    print instant_energy, local_energy
    if instant_energy > (local_energy * 1.3):
            print 'Beat'

    idx = idx + 1024
    hist_last = hist_last + 1024 # Right shift history buffer

p.terminate()

What modifications or additions can I make to the script so that the audio output and the algorithm's console output are time-synchronised? That is, when the console prints the result for a particular frame, that frame should be playing on the speakers.

– WeaklyTyped
  • You can rewrite `sumsquared` as one line: `return (arr**2).sum()`. This will push all of those computations down into C code and will probably be much faster. – John Vinyard Nov 07 '12 at 19:38

3 Answers


Working beat detection code (NumPy / PyAudio)

If you are using NumPy, this code might help. It assumes the signal (read with PyAudio) consists of 16-bit integers. If that is not the case, change or remove the signal.astype() call and adjust the normalization divisor (0xffffffff here).

import numpy

class SimpleBeatDetection:
    """
    Simple beat detection algorithm from
    http://archive.gamedev.net/archive/reference/programming/features/beatdetection/index.html
    """
    def __init__(self, history=43):
        self.local_energy = numpy.zeros(history) # a simple ring buffer
        self.local_energy_index = 0 # the index of the oldest element

    def detect_beat(self, signal):

        samples = signal.astype(numpy.int64) # widen beyond int16 so the squares do not overflow
        # optimized sum of squares, i.e faster version of (samples**2).sum()
        instant_energy = numpy.dot(samples, samples) / float(0xffffffff) # normalize

        local_energy_average = self.local_energy.mean()
        local_energy_variance = self.local_energy.var()

        beat_sensibility = (-0.0025714 * local_energy_variance) + 1.15142857
        beat = instant_energy > beat_sensibility * local_energy_average

        self.local_energy[self.local_energy_index] = instant_energy
        self.local_energy_index -= 1
        if self.local_energy_index < 0:
            self.local_energy_index = len(self.local_energy) - 1

        return beat

The PyAudio examples for wav playback or microphone recording will give you the needed signal data. Create a NumPy array from the raw bytes efficiently with frombuffer():

data = stream.read(CHUNK)
signal = numpy.frombuffer(data, numpy.int16)
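
For completeness, here is a minimal, untested sketch (not part of the original answer) of how the class above could be driven from the usual PyAudio wav-playback pattern, so that the chunk being analysed is the one currently playing. The file name and the mono, 16-bit assumptions are illustrative only:

# Sketch: play a 16-bit mono wav chunk by chunk and run beat detection on
# each chunk as it is written to the output stream.
import wave
import numpy
import pyaudio

CHUNK = 1024

wf = wave.open('song.wav', 'rb')   # hypothetical input file
pa = pyaudio.PyAudio()
stream = pa.open(format=pa.get_format_from_width(wf.getsampwidth()),
                 channels=wf.getnchannels(),
                 rate=wf.getframerate(),
                 output=True)

detector = SimpleBeatDetection()
data = wf.readframes(CHUNK)
while data:
    stream.write(data)                              # play this chunk
    signal = numpy.frombuffer(data, numpy.int16)
    # for stereo input you would deinterleave first, e.g. signal[::2]
    if detector.detect_beat(signal):
        print 'Beat'
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()
pa.terminate()
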
– zany
  • Implemented this, and tested it by clapping in front of my microphone. False positive rate was very high. Maybe my expectations were too high and I could tweak it to improve, but just noting it isn't a robust, out-of-the-box solution. – Oddthinking Dec 06 '13 at 13:43
  • I know that this is an old question, but did anyone ever manage to improve this example? I have the same issues as @Oddthinking. I tried to improve it and am still trying, but without luck so far. – Peter Willemsen Feb 18 '16 at 19:45

A Simpler, Non-Realtime Approach

I'm not optimistic about synchronizing console output with realtime audio. My approach would be a bit simpler. As you read through the file and process it, write the samples out to a new audio file. Whenever a beat is detected, add some hard-to-miss sound, like a loud, short sine tone to the audio you're writing. That way, you can aurally evaluate the quality of the results.

Synthesize your beat indicator sound:

import numpy as np

def testsignal(hz, seconds=5., sr=44100.):
    '''
    Create a sine wave at hz for the given number of seconds
    '''
    # cycles per sample
    cps = hz / sr
    # total samples
    ts = seconds * sr
    return np.sin(np.arange(0, ts * cps, cps) * (2 * np.pi))

signal = testsignal(880, seconds=.02)

In your while loop, add the testsignal to the input frame if a beat is detected, and leave the frame unaltered if no beat is detected. Write those frames out to a file and listen to it to evaluate the quality of the beat detection.
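
As a rough, untested sketch of that idea (building on the original script; the output file name, the mono 16-bit assumption and the mixing gain are mine, not part of this answer):

import numpy as np
import scipy.io.wavfile

rate, data = scipy.io.wavfile.read('input.wav')        # hypothetical file names
beep = (testsignal(880, seconds=.02) * 0.5 * 32767).astype(np.int16)

out = data.copy()
le_multi = 1024.0 / rate
idx, hist_last = 0, rate                               # roughly one second of history

while hist_last <= len(data):
    frame = data[idx:idx + 1024].astype(np.int64)      # widen to avoid int16 overflow
    history = data[idx:hist_last].astype(np.int64)
    instant_energy = (frame ** 2).sum()
    local_energy = le_multi * (history ** 2).sum()
    if instant_energy > local_energy * 1.3:
        # mix the indicator tone on top of this frame, clipped to int16 range
        n = min(len(beep), 1024)
        mixed = out[idx:idx + n].astype(np.int32) + beep[:n]
        out[idx:idx + n] = np.clip(mixed, -32768, 32767).astype(np.int16)
    idx += 1024
    hist_last += 1024

scipy.io.wavfile.write('beats_marked.wav', rate, out)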

This is the approach used by the aubio library to evaluate beat detection results. See the documentation here. Of particular interest is the documentation for the --output command line option:

Save results in this file. The file will be created on the model of the input file. Results are marked by a very short wood-block sample.

Optimization

Since numpy is already a dependency, use its capabilities to speed up your algorithm. You can rewrite your sumsquared function as:

def sumsquared(arr):
    return (arr**2).sum()

Getting rid of the Python for-loop and pushing those calculations down into C code should give you a speed improvement.

Also, take a look at this question or this question to get an idea of how you might vectorize the local-to-instantaneous energy comparisons in the while loop using the numpy.lib.stride_tricks module.
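
For illustration only (this is not taken from the linked questions), a vectorized version of the energy comparison could look roughly like the following; the 1024-sample block and 43-block history follow the question, while the function names and the stereo handling are assumptions:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def block_energies(data, block=1024):
    # sum of squares for each non-overlapping block of `block` samples
    samples = data.astype(np.int64)            # avoid int16 overflow
    sq = samples ** 2
    if sq.ndim > 1:                            # stereo: add the channel energies
        sq = sq.sum(axis=1)
    usable = (len(sq) // block) * block
    return sq[:usable].reshape(-1, block).sum(axis=1)

def detect_beats(data, block=1024, history_blocks=43, threshold=1.3):
    e = block_energies(data, block).astype(np.float64)
    n = len(e) - history_blocks + 1
    # overlapping windows of `history_blocks` consecutive block energies
    windows = as_strided(e, shape=(n, history_blocks),
                         strides=(e.strides[0], e.strides[0]))
    local = windows.mean(axis=1)               # average block energy over ~1 s
    instant = e[:n]                            # energy of the block starting each window
    return instant > threshold * local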

– John Vinyard

A good bet would be to try PortAudio (via pyaudio) to get the audio data live; then you can check whether the detected beats line up with what you hear.

Here's a nice example that computes a realtime FFT of microphone or wav-file input with pyaudio:

http://www.swharden.com/blog/2010-03-05-realtime-fft-graph-of-audio-wav-file-or-microphone-input-with-python-scipy-and-wckgraph/
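
For reference, a minimal, untested sketch of grabbing live microphone data with pyaudio (the chunk size and sample rate are typical values, not taken from that page):

import numpy
import pyaudio

CHUNK = 1024
RATE = 44100

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

try:
    while True:
        chunk = numpy.frombuffer(stream.read(CHUNK), numpy.int16)
        # analyse `chunk` here: energy, FFT, beat detection, ...
        print numpy.sqrt((chunk.astype(numpy.float64) ** 2).mean())   # RMS level
except KeyboardInterrupt:
    stream.stop_stream()
    stream.close()
    pa.terminate()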

– Stuart Axon