Understanding super fast blur algorithm

Question

I'm trying to understand the algorithm behind the super fast blur algorithm. Below is the port to java that works with android as a test. Looks like this version makes some optimisations that I don't quite understand and there aren't any comments either.

void fastblur(Bitmap img, int radius){

    if (radius<1){
        return;
    }
    int w= img.getWidth();
    int h=img.getHeight();
    int wm=w-1;
    int hm=h-1;
    int wh=w*h;
    int div=radius+radius+1;
    int r[]=new int[wh];
    int g[]=new int[wh];
    int b[]=new int[wh];
    int rsum,gsum,bsum,x,y,i,p,p1,p2,yp,yi,yw;
    int vmin[] = new int[Math.max(w,h)];
    int vmax[] = new int[Math.max(w,h)];
    int[] pix= new  int[w*h];

    img.getPixels(pix, 0, w, 0,0,w, h);

    int dv[]=new int[256*div];
    for (i=0;i<256*div;i++){
        dv[i]=(i/div);
    }

    yw=yi=0;

    for (y=0;y<h;y++){
        rsum=gsum=bsum=0;
        for(i=-radius;i<=radius;i++){
            p=pix[yi+Math.min(wm,Math.max(i,0))];
            rsum+=(p & 0xff0000)>>16;
            gsum+=(p & 0x00ff00)>>8;
            bsum+= p & 0x0000ff;
        }
        for (x=0;x<w;x++){

            r[yi]=dv[rsum];
            g[yi]=dv[gsum];
            b[yi]=dv[bsum];

            if(y==0){
                vmin[x]=Math.min(x+radius+1,wm);
                vmax[x]=Math.max(x-radius,0);
            }
            p1=pix[yw+vmin[x]];
            p2=pix[yw+vmax[x]];

            rsum+=((p1 & 0xff0000)-(p2 & 0xff0000))>>16;
            gsum+=((p1 & 0x00ff00)-(p2 & 0x00ff00))>>8;
            bsum+= (p1 & 0x0000ff)-(p2 & 0x0000ff);
            yi++;
        }
        yw+=w;
    }

    for (x=0;x<w;x++){
        rsum=gsum=bsum=0;
        yp=-radius*w;
        for(i=-radius;i<=radius;i++){
            yi=Math.max(0,yp)+x;
            rsum+=r[yi];
            gsum+=g[yi];
            bsum+=b[yi];
            yp+=w;
        }
        yi=x;
        for (y=0;y<h;y++){
            pix[yi]=0xff000000 | (dv[rsum]<<16) | (dv[gsum]<<8) | dv[bsum];
            if(x==0){
                vmin[y]=Math.min(y+radius+1,hm)*w;
                vmax[y]=Math.max(y-radius,0)*w;
            }
            p1=x+vmin[y];
            p2=x+vmax[y];

            rsum+=r[p1]-r[p2];
            gsum+=g[p1]-g[p2];
            bsum+=b[p1]-b[p2];

            yi+=w;
        }
    }

    img.setPixels(pix,0, w,0,0,w,h);
}

Correct me if I'm wrong by my speculations:

What does the below loop do? Is it associated with pre-computing the kernel table? What about div, is that the kernel table size? I guess what I'm trying to ask is, what is dv[] supposed to store?

int dv[]=new int[256*div];
for (i=0;i<256*div;i++){
    dv[i]=(i/div);
}

Looking at the horizontal pass: The below loop looks like it's summing up the separate RGB values, but it only does this at the starting pixel for each row, since yi is only incremented once we finish processing all pixels up until the width is reached. Is this because we end up adding to the RGB sums as we process the pixels in the next loop?

        for(i=-radius;i<=radius;i++){
            int ind = yi+Math.min(wm,Math.max(i,0));
            p=pix[ind];
            rsum+=(p & 0xff0000)>>16;
            gsum+=(p & 0x00ff00)>>8;
            bsum+= p & 0x0000ff;
        }

Are we only selecting the left most pixel and right most pixel according to the radius and the current pixel position?

 if(y==0){
   vmin[x]=Math.min(x+radius+1,wm);
   vmax[x]=Math.max(x-radius,0);
  } 

  p1=pix[yw+vmin[x]];
  p2=pix[yw+vmax[x]];

Next is what is confusing me the most: Am I correct to say that were getting the difference between right and left pixels and adding that the running RGB totals that we have?

  rsum+=((p1 & 0xff0000)-(p2 & 0xff0000))>>16;
  gsum+=((p1 & 0x00ff00)-(p2 & 0x00ff00))>>8;
  bsum+= (p1 & 0x0000ff)-(p2 & 0x0000ff);

I haven't had a look at the second pass since this is pretty much going over my head. Any clarification would be appreciated and any commentary on the loop on the vertical pass would be helpful as well thanks.

I think the same algorithm is described more properly here http://blog.ivank.net/fastest-gaussian-blur.html — Ivan Kuckir, Aug 29 '18 at 10:41

score 23 · Accepted Answer · answered Mar 07 '14 at 11:27

Since I wrote that one I guess I can explain best :-)

 int dv[]=new int[256*div]; 
 for (i=0;i<256*div;i++){
     dv[i]=(i/div); 
}

This line precalculates a lookup table for all the possible mean values that can occur. This is to avoid costly division in the inner loop. On some systems doing the division directly instead of a doing an array lookup might actually be faster nowadays, but when I wrote it the lookup was the faster way.

for(i=-radius;i<=radius;i++){
            int ind = yi+Math.min(wm,Math.max(i,0));
            p=pix[ind];
            rsum+=(p & 0xff0000)>>16;
            gsum+=(p & 0x00ff00)>>8;
            bsum+= p & 0x0000ff;
        }

The reason why this algorithm is fast is that it uses a sliding window and thus reduces the number of required pixel lookups. The window slides from the left edge to the right (and in the second pass from top to bottom) and only adds one pixel at the right and removes one from the left. The code above initializes the window by prefilling the window with the leftmost edge pixel depending on the kernel size.

 if(y==0){
   vmin[x]=Math.min(x+radius+1,wm);
   vmax[x]=Math.max(x-radius,0);
  } 

  p1=pix[yw+vmin[x]];
  p2=pix[yw+vmax[x]];

This is the part that adds a new pixel but at the same time handles the border conditions (when the window tries to read or remove pixels outside the bitmap).

 rsum+=((p1 & 0xff0000)-(p2 & 0xff0000))>>16;
  gsum+=((p1 & 0x00ff00)-(p2 & 0x00ff00))>>8;
  bsum+= (p1 & 0x0000ff)-(p2 & 0x0000ff);

rsum, gsum and bsum is the accumulated sum of pixels inside the sliding window. What you see is the new pixel on the right side being added to the sum and the leftmost pixel i nthe window being removed from the sum.

I've just had a chance to really read through things, so apologies for not accepting earlier but this really helped. — Sun, Apr 30 '14 at 09:05
Hi, Quasimondo, I have tried your fastBlur algorithm, and it works great, but it crashes on the Genymotion Emulators, with specific devices, don't know why. The crash exception is java.lang.ArrayIndexOutOfBoundsException: length=65536; index=65889 Do you have any idea? Thanks — David_Wang, Sep 14 '15 at 05:28
That looks to me like the maximum allowed array size on these emulators is 65536, so when you pass in images bigger than that StackBlur runs out of memory. The only way around that would be to do the blur in tiles, though the tiles would also need to be padded to not get artifacts at the edges. — Quasimondo, Sep 27 '15 at 11:24

r3mainer · Answer 2 · 2014-01-29T00:39:07.820

This box blur algorithm is outlined in this paper from 2001.

What it's basically doing is blurring the image twice; first in the horizontal direction, and then in the vertical direction. The end result is the same as if you had calculated the convolution of the image with a square box 2r+1 pixels across (i.e., from x-r to x+r, and from y-r to y+r at each point).

AT each step, the blurred pixel value is simply the average of all the pixels in this range. This can be calculated quickly by keeping a running total at each point. When you move the range to the right (down) one pixel, you subtract the pixel at the left (top) end and add the pixel at the right (bottom) end. You still have to divide these running totals by 2r+1, but this can be sped up by precomputing fixed-point values of n/(2r+1) for (0≤n<256) and storing them in dv[] (with an 8-bit fractional part).

The short summation loops at the start of each scan are just there to calculate the initial values of the running total.

And with a bit of juggling with max() and min() to avoid accessing out-of-range pixels, that's about all there is to it.

score 0 · Answer 3 · answered Apr 01 '14 at 13:57

Hints when using CompoundBlur

You will notice from the gradient tables that the blur is going to build from the outside inward, so it will blur the edges first and then blur the center. In order to blur from the center towards the edges just take all the values in the mul_table and subtract 255 from them: This inverts the bitmap- you can imagine the brightness of a pixel in your gradient map is equivalent to the blur radius used there - white pixel big blur, black pixel small blur.

Method for Quick Inverting:

Using Sublime Text and Microsoft Excel you can easily invert the values...

Sublime Text:

Get all the values into columns with the commas lined up vertically, then by clicking and dragging with the mousewheel you can select downward and hit enter to put a single number on a single line. Now click and drag with mousewheel again and insert a "- 255" after every value, and a "=" before every value (Also click and drag to select all commas and delete them). Now select all lines and copy.

Final format for Excel should be: = (original mul_table value) - 255 ... i.e. = 512 - 255

Excel: After copying formatted values in Sublime, paste to the top-left most cell in Excel, and Excel will evaluate "=512-255" for you and instantly create new inverted values. Copy all cells and paste back into your js file and insert commas back in.

Your CompoundBlur will now blur from the center towards the edges..

Understanding super fast blur algorithm

3 Answers3

Linked