
I am trying to implement a smooth, zoomable audio waveform but am puzzled about the correct approach to implementing zoom. I searched the internet, but there is very little information available.

So here is what I have done:

  1. Read audio samples from the file and compute waveform points with samplesPerPixel = 10, 20, 40, 80, ..., 10240. Store the data points for each scale (11 in total here). Max and min are also stored along with the points for each samplesPerPixel.

  2. When zooming, switch to the closest dataset. So if samplesPerPixel at the current width is 70, then use the dataset corresponding to samplesPerPixel = 80. The correct dataset index is easily found using log2(samplesPerPixel).

  3. Use subsampling of the dataset to draw the waveform points. So if samplesPerPixel = 41 and we are using the dataset for zoom level 80, then we use the scaling factor 80/41 to subsample (see the sketch right after this list).

    let scaleFactor = 80.0 / 41.0
    x = waveformPointX[Int(Double(i) * scaleFactor)]
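
For illustration, here is a minimal Swift sketch of steps 2 and 3, assuming the levels were built at 10 * 2^k samples per pixel; `WaveformDataset`, `dataset(for:in:base:)`, and `subsample` are hypothetical names used only for this sketch, not names from my actual code:

    import Foundation
    import CoreGraphics

    // Hypothetical container for one precomputed zoom level.
    struct WaveformDataset {
        let samplesPerPixel: Int   // e.g. 10, 20, 40, ..., 10240
        let points: [CGFloat]      // one value per samplesPerPixel source samples
    }

    // Step 2: pick the coarsest level that still covers the requested samplesPerPixel,
    // assuming levels were built at base * 2^k (k = 0, 1, 2, ...).
    func dataset(for samplesPerPixel: Double,
                 in datasets: [WaveformDataset],
                 base: Double = 10) -> WaveformDataset {
        let level = Int(ceil(log2(max(samplesPerPixel / base, 1))))
        return datasets[min(max(level, 0), datasets.count - 1)]
    }

    // Step 3: map each output pixel to the nearest point of the chosen (coarser) dataset.
    func subsample(_ ds: WaveformDataset,
                   targetSamplesPerPixel: Double,
                   pixelWidth: Int) -> [CGFloat] {
        let scaleFactor = Double(ds.samplesPerPixel) / targetSamplesPerPixel   // e.g. 80 / 41
        return (0..<pixelWidth).map { x in
            let i = min(Int(Double(x) / scaleFactor), ds.points.count - 1)
            return ds.points[i]
        }
    }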

I have yet to find a better approach and am not sure whether the above subsampling approach is correct, but it certainly consumes a lot of memory and is also slow to load data at the start. How do audio editors implement zooming in the waveform? Is there an efficient approach?

EDIT: Here is the code for computing the mipmaps.

   public class WaveformAudioSample {
       var samplesPerPixel: Int = 0
       var totalSamples: Int = 0
       var samples: [CGFloat] = []
       var sampleMax: CGFloat = 0
   }

   // Averages `factor` consecutive samples into one and returns the reduced level.
   private func downSample(_ waveformSample: WaveformAudioSample, factor: Int) -> WaveformAudioSample {
       NSLog("Averaging samples")

       let downSampledAudioSamples = WaveformAudioSample()
       downSampledAudioSamples.samples = [CGFloat](repeating: 0, count: waveformSample.samples.count / factor)
       downSampledAudioSamples.samplesPerPixel = waveformSample.samplesPerPixel * factor
       downSampledAudioSamples.totalSamples = waveformSample.totalSamples

       for i in 0..<waveformSample.samples.count / factor {
           var total: CGFloat = 0
           for j in 0..<factor {
               total += waveformSample.samples[i * factor + j]
           }
           let averagedSample = total / CGFloat(factor)
           downSampledAudioSamples.samples[i] = averagedSample
           downSampledAudioSamples.sampleMax = max(downSampledAudioSamples.sampleMax, averagedSample)
       }

       NSLog("Averaged samples")
       return downSampledAudioSamples
   }
Deepak Sharma
  • Interesting, +1. So you basically do a 1D mipmap (on 2D images this is also called a Laplace pyramid ... in your case a triangle). The idea is not to use `10/20/40/80` samples per pixel but `1/2/4/8/16/32/64` samples per pixel, and to compute the mipmaps from the previous one instead of from the whole data; that should give you a huge speed boost ... and to obtain a renderable pixel you just bilinearly (2D images do this trilinearly) interpolate between the 2 closest resolutions. If you encode your data right you might even use the GPU for this, as the HW is designed to do this fast ... for example using OpenGL, even GLSL. – Spektre Mar 10 '22 at 08:52
  • Yes, but the problem is that computing mipmaps is very computationally expensive and takes time to load (~7 seconds on an iPhone 13 Pro for 10-minute AAC audio with 11 mipmaps). There needs to be an alternative fast way, as I see many apps doing all this quickly. – Deepak Sharma Mar 10 '22 at 09:07
  • Show the code for how you compute this ... I suspect something fishy there. – Spektre Mar 10 '22 at 09:09
  • I tried computing from the previous data as well. The issue is that the user can quickly zoom across scales in a matter of 200 ms, and it takes roughly 300 ms to compute mipmaps from the previous level. – Deepak Sharma Mar 10 '22 at 09:09
  • @Spektre OK, I will post some code. But it is straightforward Swift code. – Deepak Sharma Mar 10 '22 at 09:09
  • But computing bands as powers of 2 is just halving, so there are no expensive instructions. – Spektre Mar 10 '22 at 09:10
  • I think it has to do with the number of samples to process. Also, setting samplesPerPixel to 1 increased the load time to 3 s. If samplesPerPixel is 10, it is just 1 s. That's something to do with the platform, I guess. – Deepak Sharma Mar 10 '22 at 09:21
  • Not familiar with the language, but from a quick look I see: that code is mixing `int` and `float` operations, doing a lot of unnecessary index computation using `*` instead of simple `+` iteration, not using the previous mipmap, and not halving (so you have 2 nested loops instead of a single one), so it is no wonder it is slow... Is the data PCM (time or frequency domain)? As `float` or `int` or fixed point? Do you have an FPU or is it emulated in SW? – Spektre Mar 10 '22 at 10:00
  • I was curious, so I tried to code the bilinear interpolation ... I have added it to my answer. – Spektre Mar 11 '22 at 07:48

2 Answers

  1. You should use a power-of-2 size for your data

    This will allow you to use just cheap bit shifts and simple resizing, without any costly floating-point operations or integer multiplication and division.

  2. You should build half-resolution mipmaps, each from the previous mipmap

    This always creates one sample from 2 samples of the previous mipmap, so there are no nested for loops or costly index computations.

  3. Do not mix floating-point and integer computations if you can avoid it

    Even if you have an FPU, the conversion between int and float is usually very slow. Ideally keep your audio data in integer format... (see the Swift sketch right after this list).
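
Since the question is iOS/Swift, here is a minimal Swift sketch of points 1-3 (integer samples, each level built by min/max halving of the previous one); the `MipLevel`/`buildMipmaps` names and the `Int16` sample type are assumptions for illustration only:

// Each level stores a min and a max per bucket; level k covers 2^k base samples per bucket.
struct MipLevel {
    var mins: [Int16]
    var maxs: [Int16]
}

// Build all power-of-2 levels, each by min/max halving of the previous one,
// so the whole pyramid costs only about twice the work of the first level.
// Assumes samples.count is a power of 2 (point 1 above).
func buildMipmaps(from samples: [Int16]) -> [MipLevel] {
    var levels = [MipLevel(mins: samples, maxs: samples)]
    var n = samples.count >> 1
    while n > 0 {
        let prev = levels[levels.count - 1]
        var mins = [Int16](repeating: 0, count: n)
        var maxs = [Int16](repeating: 0, count: n)
        for i in 0..<n {
            let j = i << 1
            mins[i] = min(prev.mins[j], prev.mins[j + 1])   // keep the dip
            maxs[i] = max(prev.maxs[j], prev.maxs[j + 1])   // keep the peak
        }
        levels.append(MipLevel(mins: mins, maxs: maxs))
        n >>= 1
    }
    return levels
}

A call like `let levels = buildMipmaps(from: baseBuckets)` (with `baseBuckets` being your already-reduced integer samples) then gives every zoom level at once.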

Here is a small C++/VCL example of these ideas:

//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "win_main.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
int xs,ys;              // screen resolution
Graphics::TBitmap *bmp; // back buffer bitmap for rendering
//---------------------------------------------------------------------------
// input data
const int samples=1024;
int sample[samples];
// mipmaps: max 32 resolutions -> 2^32 samples input
int *mmdat0[32]={NULL}, // min
    *mmdat1[32]={NULL}, // max
     mmsiz[32]={0};     // resolution
//---------------------------------------------------------------------------
void generate_input(int *data,int size)
    {
    int i; float a,da;
    da=10.0*M_PI/float(size-1);
    for (a=0.0,i=0;i<size;i++,a+=da)
        {
        data[i]=float(100.0*sin(a))+Random(40)-20;
        }
    }
//---------------------------------------------------------------------------
void mipmap_free()
    {
    // free allocated mipmaps if needed
    if (mmdat0[0]) delete[] mmdat0[0];
    mmdat0[0]=NULL;
    mmdat1[0]=NULL;
    mmsiz[0]=0;
    }
//---------------------------------------------------------------------------
void mipmap_compute(int *data,int size)
    {
    int i,j,k,n,N,a,a0,a1;
    mipmap_free();
    for (N=0,n=size;n;N+=n,n>>=1);  // compute size of all mipmaps together
    mmdat0[0]=new int[N+N];         // allocate space for all mipmaps as a single 1D array
    mmdat1[0]=mmdat0[0]+N;          // max will be at the other half
    mmsiz [0]=size;
    for (i=1,n=size;n;n>>=1,i++)    // and just set pointers of sub mipmaps
        {
        mmdat0[i]=mmdat0[i-1]+n;    // to point at the right place
        mmdat1[i]=mmdat1[i-1]+n;    // to point at the right place
        mmsiz [i]=mmsiz [i-1]>>1;   // and set resolution as half
        }
    // copy first mipmap
    n=size;
    for (i=0;i<mmsiz[0];i++)
        {
        a=data[i];
        mmdat0[0][i]=a;
        mmdat1[0][i]=a;
        }
    // process all resolutions
    for (k=1;mmsiz[k];k++)
        {
        // halve resolution
        for (i=0,j=0;i<mmsiz[k];i++)
            {
            a=mmdat0[k-1][j];                a0=a;
            a=mmdat1[k-1][j]; j++;           a1=a;
            a=mmdat0[k-1][j];      if (a0>a) a0=a;
            a=mmdat1[k-1][j]; j++; if (a1<a) a1=a;
            mmdat0[k][i]=a0;
            mmdat1[k][i]=a1;
            }
        }
    }
//---------------------------------------------------------------------------
void draw() // just render of my App
    {
    bmp->Canvas->Brush->Color=clWhite;
    bmp->Canvas->FillRect(TRect(0,0,xs,ys));

    int ix,x,y,y0=ys>>1;

    // plot input data
    bmp->Canvas->Pen->Color=clBlack;
    x=0; y=y0-sample[x];
    bmp->Canvas->MoveTo(x,y);
    for (x=1;x<xs;x++)
        {
        y=y0-sample[x];
        bmp->Canvas->LineTo(x,y);
        }

    // plot mipmap[ix] input data
    ix=1;
    bmp->Canvas->Pen->Color=clBlue;
    x=0; y=y0-sample[x];
    bmp->Canvas->MoveTo(x,y);
    for (x=0;x<mmsiz[ix];x++)
        {
        y=y0-mmdat0[ix][x];
        bmp->Canvas->LineTo(x,y);
        y=y0-mmdat1[ix][x];
        bmp->Canvas->LineTo(x,y);
        }

    Form1->Canvas->Draw(0,0,bmp);
//  bmp->SaveToFile("out.bmp");
    }
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner) // init of my app
    {
    // init backbuffer
    bmp=new Graphics::TBitmap;
    bmp->HandleType=bmDIB;
    bmp->PixelFormat=pf32bit;

    generate_input(sample,samples);
    mipmap_compute(sample,samples);
    }
//---------------------------------------------------------------------------
void __fastcall TForm1::FormDestroy(TObject *Sender) // not important just destructor of my App
    {
    mipmap_free();
    delete bmp;
    }
//---------------------------------------------------------------------------
void __fastcall TForm1::FormResize(TObject *Sender) // not important just resize event
    {
    xs=ClientWidth;
    ys=ClientHeight;
    bmp->Width=xs;
    bmp->Height=ys;
    draw();
    }
//-------------------------------------------------------------------------
void __fastcall TForm1::FormPaint(TObject *Sender) // not important just repaint event
    {
    draw();
    }
//---------------------------------------------------------------------------

Ignore the window, VCL, and rendering related stuff (I just wanted to provide the whole source so you can see how it is used). The only important part is the function mipmap_compute, which converts your input data into 2 mipmaps: one holding the min values and the other the max values.

The dynamic allocations are not important; the only important code chunk is marked with the comment:

// process all resolutions

There, for each mipmap, there is only a single for loop without any expensive operations. If your platform does better with branchless code, you can compute the min/max using the built-in branchless functions min and max. Something like:

// process all resolutions
for (k=1;mmsiz[k];k++)
    {
    // halve resolution
    for (i=0,j=0;i<mmsiz[k];i++)
        {
        a=mmdat0[k-1][j];      a0=a;
        a=mmdat1[k-1][j]; j++; a1=a;
        a=mmdat0[k-1][j];      a0=min(a0,a);
        a=mmdat1[k-1][j]; j++; a1=max(a1,a);
        mmdat0[k][i]=a0;
        mmdat1[k][i]=a1;
        }
    }

This can be further optimized simply by using pointers to the currently selected mipmaps, which gets rid of the [k] and [k-1] indexes and saves one memory access per element access.

// process all resolutions
for (k=1;mmsiz[k];k++)
    {
    // halve resolution
    int *p0=mmdat0[k-1];
    int *p1=mmdat1[k-1];
    int *q0=mmdat0[k];
    int *q1=mmdat1[k];
    for (i=0,j=0;i<mmsiz[k];i++)
        {
        a=p0[j];      a0=a;
        a=p1[j]; j++; a1=a;
        a=p0[j];      a0=min(a0,a);
        a=p1[j]; j++; a1=max(a1,a);
        q0[i]=a0;
        q1[i]=a1;
        }
    }

Now all you need to do is bilinearly interpolate between 2 mipmaps to achieve your resolution. Here is a small example of this:

// actually rescaled output
int out0[samples];      // min
int out1[samples];      // max
int outs=0;             // size
void resize(int n)  // compute out0[n],out1[n] from mipmaps
    {
    int i,*p0,*p1,*q0,*q1,pn,qn;
    int pc,qc,pd,qd,pi,qi;
    int a,a0,a1,b0,b1,bm,bd;
    for (i=0;mmsiz[i]>=n;i++);  // find smaller resolution
    pn=mmsiz[i];
    p0=mmdat0[i];
    p1=mmdat1[i]; i--;
    qn=mmsiz[i];                // bigger or equal resolution
    q0=mmdat0[i];
    q1=mmdat1[i]; outs=n;
    pc=0; pi=0;
    qc=0; qi=0;
    bm=n-pn; bd=qn-pn;
    for (i=0;i<n-1;i++)
        {
        // bilinear interpolation (3x linear)
        a0=q0[qi];
        a1=q0[qi+1];
        b1=a0+(((a1-a0)*qc)/n);
        a0=p0[pi];
        a1=p0[pi+1];
        b0=a0+(((a1-a0)*pc)/n);
        out0[i]=b0+(((b1-b0)*bm)/bd);           // /bd might be bitshift right by log2(bd)
        // bilinear interpolation (3x linear)
        a0=q1[qi];
        a1=q1[qi+1];
        b1=a0+(((a1-a0)*qc)/n);
        a0=p1[pi];
        a1=p1[pi+1];
        b0=a0+(((a1-a0)*pc)/n);
        out1[i]=b0+(((b1-b0)*bm)/bd);           // /bd might be bitshift right by log2(bd)
        // DDA increment indexes
        pc+=pn; while (pc>=n){ pi++; pc-=n; }   // pi = (i*pn)/n
        qc+=qn; while (qc>=n){ qi++; qc-=n; }   // qi = (i*qn)/n
        }
    out0[n-1]=q0[pn-1];
    out1[n-1]=q1[pn-1];
    }

Beware: the target size n must be less than or equal to the highest mipmap resolution...
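
A rough Swift counterpart of the same blending (linearly sample each of the two neighboring levels, then mix the two results by how far the target width sits between them); `sampleLinear` and `resample` are my own names, and the sketch assumes the two arrays come from adjacent levels with coarse.count <= width <= fine.count and width >= 2:

// Linearly sample one mip-level array at a fractional position in [0, values.count - 1].
func sampleLinear(_ values: [Int16], at pos: Double) -> Double {
    let i = min(Int(pos), values.count - 2)
    let f = pos - Double(i)
    return Double(values[i]) * (1 - f) + Double(values[i + 1]) * f
}

// Blend two adjacent levels into `width` output values, like resize() above.
func resample(coarse: [Int16], fine: [Int16], width: Int) -> [Double] {
    let blend = Double(width - coarse.count) / Double(fine.count - coarse.count)
    return (0..<width).map { x in
        let t = Double(x) / Double(width - 1)
        let c = sampleLinear(coarse, at: t * Double(coarse.count - 1))
        let f = sampleLinear(fine,   at: t * Double(fine.count - 1))
        return c * (1 - blend) + f * blend
    }
}

Run it once on the min arrays and once on the max arrays of the two levels around the current zoom to get the envelope to draw.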

This is how it looks (when I change the resolution manually with the mouse wheel):

(animated GIF preview)

The choppiness is caused by the GIF grabber ... the scaling is fast and seamless in reality.

Spektre

I had a similar problem, with 1,800,000 points of a waveform to draw on an 800-point screen. The zoom factor was 2000. If someone is interested, here is how I got awesome results:

  1. Divide the very long list into 400 smaller lists
  2. For each smaller list, calculate the biggest difference between the smallest and largest values in that list.
  3. Plot 2 points per list, one at (offset + delta / 2) and one at (offset - delta / 2).

Results: from 453,932 points to 800 points (before and after plots not shown).

Python code:

# big_list holds the full list of waveform sample values
numberOfSmallerList = 400
small_list_len = int(len(big_list) / numberOfSmallerList)

finalPointsToPlot = []

for i in range(0, len(big_list), small_list_len):
    chunk = big_list[i:i + small_list_len]
    biggestDiff = max(chunk) - min(chunk)

    # plot the chunk's peak-to-peak range around a baseline of 100
    finalPointsToPlot.append(100 + biggestDiff / 2)
    finalPointsToPlot.append(100 - biggestDiff / 2)

import matplotlib.pyplot as plt
plt.plot(finalPointsToPlot)
plt.show()
rafael