Getting max amplitude for an audio file per second

Question

I know there are some similar questions here, but most of them are concerning generating waveform images, which is not what I want.

My goal is to generate a waveform visualization for an audio file, similar to SoundCloud, but not an image. I'd like to have the max amplitude data for each second (or half second) of an audio clip in an array. I could then use this data to create a CSS-based visualization.

Ideally I'd like to get an array that has all the amplitude values for each second as a percentage of the maximum amplitude of the entire audio file. Here's an example:

[
    0.0,  # Relative max amplitude of first second of audio clip (0%)
    0.04,  # Relative max amplitude of second second of audio clip (4%)
    0.15,  # Relative max amplitude of third second of audio clip (15%)
    # Some more
    1.0,  # The highest amplitude of the whole audio clip will be 1.0 (100%)
]

I assume I'll have to use at least numpy and Python's wave module, but I'm not sure how to get the data I want. I'd like to use Python but I'm not completely against using some kind of command-line tool.

The file type will be any format that is required, since I'm using ffmpeg to convert the lossless file anyway. I just used wav as an example because I figured that would be the easiest to work with. — Liam, Feb 18 '12 at 23:58
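For reference, the NumPy + `wave` approach the question has in mind can be sketched as follows. This is a minimal Python 3 sketch and not part of the accepted answer. It assumes the input has already been converted to 16-bit PCM WAV (for example with `ffmpeg -i input.mp3 -c:a pcm_s16le clip.wav`). `read_wav_int16` and `relative_peaks` are hypothetical helper names. Each window's value is its largest absolute sample across all channels, divided by the loudest window's value, so the output has the same shape as the example array above.

```python
import wave

import numpy as np


def read_wav_int16(path):
    """Read a 16-bit PCM WAV file into an int16 array of shape (frames, channels)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian and interleaved by channel
    samples = np.frombuffer(raw, dtype="<i2").reshape(-1, channels)
    return samples, rate


def relative_peaks(samples, rate, window_seconds=1.0):
    """Max absolute amplitude per window, scaled so the loudest window is 1.0."""
    if len(samples) == 0:
        return []
    # widen to int32 so abs(-32768) does not overflow int16;
    # reshape turns mono (n,) input into (n, 1) so max(axis=1) works for both
    frame_peak = np.abs(np.asarray(samples, dtype=np.int32))
    frame_peak = frame_peak.reshape(len(samples), -1).max(axis=1)

    window = max(1, int(rate * window_seconds))
    peaks = np.array([frame_peak[i:i + window].max()
                      for i in range(0, len(frame_peak), window)], dtype=float)

    loudest = peaks.max()
    if loudest == 0:
        return peaks.tolist()  # the whole clip is silent
    return (peaks / loudest).tolist()
```

For half-second resolution, pass `window_seconds=0.5`: `samples, rate = read_wav_int16("clip.wav")` followed by `relative_peaks(samples, rate, 0.5)` (the file name is a placeholder).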
Would you allow GStreamer? I've been working with it to get the peak level of an audio stream (http://txzone.net/2011/05/using-microphone-peak-as-input/), and it would be really easy to do the same for any file. — tito, Feb 18 '12 at 23:59

score 4 · Accepted Answer · answered Feb 19 '12 at 00:39

If you allow GStreamer, here is a little script that could do the trick. It accepts any audio file that GStreamer can handle.

1. Construct a GStreamer pipeline that uses audioconvert to reduce the audio to one channel and the level element to measure peaks.
2. Run the pipeline until EOS (end of stream) is reached.
3. Normalize the peaks using the minimum and maximum found.

Snippet:

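import os, sys, pygst
pygst.require('0.10')
import gst, gobject
gobject.threads_init()

def get_peaks(filename):
    global do_run

    # decode any supported format, downmix to mono, resample to 44.1 kHz
    # (audioconvert alone cannot change the sample rate), then let "level"
    # post one message per interval, given in nanoseconds:
    # 1000000000 ns = 1 s; use 500000000 for half-second peaks
    pipeline_txt = (
        'filesrc location="%s" ! decodebin ! audioconvert ! audioresample ! '
        'audio/x-raw-int,channels=1,rate=44100,endianness=1234,'
        'width=32,depth=32,signed=(bool)True ! '
        'level name=level interval=1000000000 ! '
        'fakesink' % filename)
    pipeline = gst.parse_launch(pipeline_txt)

    level = pipeline.get_by_name('level')
    bus = pipeline.get_bus()
    bus.add_signal_watch()

    peaks = []
    do_run = True

    def show_peak(bus, message):
        global do_run
        # stop the loop at end of stream
        if message.type == gst.MESSAGE_EOS:
            pipeline.set_state(gst.STATE_NULL)
            do_run = False
            return
        # also stop on a pipeline error instead of spinning forever
        if message.type == gst.MESSAGE_ERROR:
            err, debug = message.parse_error()
            print >> sys.stderr, 'Error:', err, debug
            pipeline.set_state(gst.STATE_NULL)
            do_run = False
            return
        # filter only on level messages
        if message.src is not level or \
           not message.structure.has_key('peak'):
            return
        # 'peak' holds one value (in dB) per channel; the stream is mono
        peaks.append(message.structure['peak'][0])

    # connect the callback
    bus.connect('message', show_peak)

    # run the pipeline until we get EOS (or an error)
    pipeline.set_state(gst.STATE_PLAYING)
    ctx = gobject.main_context_default()
    while ctx and do_run:
        ctx.iteration()

    return peaks

def normalize(peaks):
    # min-max scaling: quietest second -> 0.0, loudest second -> 1.0
    _min = min(peaks)
    _max = max(peaks)
    d = _max - _min
    if d == 0:
        # every second has the same peak; avoid dividing by zero
        return [1.0] * len(peaks)
    return [(x - _min) / d for x in peaks]

if __name__ == '__main__':
    filename = os.path.realpath(sys.argv[1])
    peaks = get_peaks(filename)

    print 'Sample is %d seconds' % len(peaks)
    print 'Minimum is', min(peaks)
    print 'Maximum is', max(peaks)

    peaks = normalize(peaks)
    print peaks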

And an example of the output:

$ python gstreamerpeak.py 01\ Tron\ Legacy\ Track\ 1.mp3 
Sample is 182 seconds
Minimum is -349.999999922
Maximum is -2.10678956719
[0.0, 0.0, 0.9274581631597019, 0.9528318436488018, 0.9492396611762614,
0.9523404330322813, 0.9471685835966183, 0.9537281219301242, 0.9473486577135167,
0.9479292126411365, 0.9538221105563514, 0.9483845795252251, 0.9536790832823281,
0.9477264933378022, 0.9480077366961968, ...
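One caveat: GStreamer's level element reports `peak` in decibels relative to full scale, which is why the raw minimum and maximum above are negative. Min-max scaling those dB values, as `normalize()` does, gives a logarithmic loudness scale rather than the fraction of the maximum amplitude the question asks for. If the linear fraction is wanted, each dB peak can first be converted back with amplitude = 10^(dB/20) and then divided by the largest one. A minimal sketch (the helper names `db_to_linear` and `normalize_linear` are made up for illustration and are not part of the original script):

```python
def db_to_linear(db):
    """Convert a dBFS value to a linear amplitude ratio (0 dB -> 1.0)."""
    return 10.0 ** (db / 20.0)


def normalize_linear(peaks_db):
    """Scale per-second dB peaks so the loudest second is 1.0, in amplitude terms."""
    linear = [db_to_linear(x) for x in peaks_db]
    loudest = max(linear)
    if loudest == 0:
        # only possible if every peak is -inf dB (an all-silent input)
        return [0.0 for _ in linear]
    return [x / loudest for x in linear]
```

To use it, call `normalize_linear(peaks)` in place of `normalize(peaks)` in the `__main__` block. With this scaling a silent second comes out as a value very close to 0.0, and a second that is 6 dB below the loudest one comes out at roughly 0.5.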

I've been trying to get your script working but I've been running into issues. When I run the pipeline text in the terminal with `gst-launch`, I get this: `ERROR: from element /GstPipeline:pipeline0/GstDecodeBin:decodebin0/GstMpegAudioParse:mpegaudioparse0: GStreamer encountered a general stream error. Additional debug info: gstbaseparse.c(2695): gst_base_parse_loop (): /GstPipeline:pipeline0/GstDecodeBin:decodebin0/GstMpegAudioParse:mpegaudioparse0: streaming stopped, reason not-linked ERROR: pipeline doesn't want to preroll.` Any idea what the issue could be? — Liam, Feb 19 '12 at 22:59
Try adding `print message` at the start of the `show_peak()` method; you might get more information. Or run with `GST_DEBUG=*:5` (careful: the output is extremely verbose). — tito, Feb 20 '12 at 00:05
I'm still having issues getting this working but I suspect it may have to do with my gstreamer installation and not your script. I'll go ahead and accept your answer for now. Thanks for the help! — Liam, Feb 22 '12 at 18:30
Just FYI, I was finally able to compile GStreamer properly on my machine, and I was able to run your script successfully! Thanks so much. — Liam, Mar 16 '12 at 18:07
