Algorithm for fast Drop shadow in GDI+

Question

What is an efficient way to add a drop-shadow to an image in GDI?

Right now i start with my image:

enter image description here

i use ImageAttributes and a ColorMatrix to draw the image's alpha mask to a new image:

colorMatrix = (
    ( 0,  0,  0, 0, 0),
    ( 0,  0,  0, 0, 0),
    ( 0,  0,  0, 0, 0),
    (-1, -1, -1, 1, 0),
    ( 1,  1,  1, 0, 1)
    );

enter image description here

i then apply a Gaussian Blur convolution kernel, and offset it slightly:

enter image description here

And then i draw my original image back over top:

enter image description here

Problem is that it's just too slow, it takes about 170ms to generate the image with drop-shadow, verses 2ms without the drop-shadow (70x slower):

with drop shadow: 171,332 µs
without drop shadow: 2,457us

When a user (e.g. me) is scrolling through a list of items, that extra 169ms delay is very noticeable.

You can ignore the code below, it doesn't add anything to the question, or the answer:

class function TImageEffects.GenerateDropShadow(image: TGPImage;
        const radius: Single; const OffsetX, OffsetY: Single; const Opacity: Single): TGPBitmap;
var
    width, height: Integer;
    alphaMask: TGPBitmap;
    shadow: TGPBitmap;
    graphics: TGPGraphics;
    imageAttributes: TGPImageAttributes;
    cm: TColorMatrix;
begin
{
    We generate a drop shadow by first getting the alpha mask. This will be a black
    sillouette on a transparent background. We then blur the black "shadow" by the amounts
    given.
    We then draw the original image on top of it's own shadow.
}

{
    http://msdn.microsoft.com/en-us/library/aa511280.aspx
    Windows Vista User Experience -> Guidelines -> Aesthetics -> Icons

    Basic Flat Icon Shadow Ranges

    Flat icons
    Flat icons are generally used for file icons and flat real-world objects,
    such as a document or a piece of paper.

    Flat icon lighting comes from the upper-left at 130 degrees.

    Smaller icons (for example, 16x16 and 32x32) are simplified for readability.
    However, if they contain a reflection within the icon (often simplified),
    they may have a tight drop shadow. The drop shadow ranges in opacity from
    30-50 percent.
    Layer effects can be used for flat icons, but should be compared with other
    flat icons. The shadows for objects will vary somewhat, according to what
    looks best and is most consistent within the size set and with the other
    icons in Windows Vista. On some occasions, it may even be necessary to
    modify the shadows. This will especially be true when objects are laid over
    others.
    A subtle range of colors may be used to achieve desired outcome. Shadows help
    objects sit in space. Color impacts the perceived weight of the shadow, and
    may distort the image if it is too heavy.

    Blend mode: Multiply
    Opacity: 22% to 50% - depends on color of the item.
    Angle: 130 to 120, use global light
    Distance: 3 (256 thru 48x), Distance = 1 (32x, 24x)
    Spread: 0
    Size: 7 (256x thru 48x), Spread = 2 (32x, 24x)
}
    width := image.GetWidth;
    height := image.GetHeight;

    //Get bitmap to hold final composited image and shadow
    Result := TGPBitmap.Create(width, height, PixelFormat32bppARGB);

    //Use ColorMatrix methods to "draw" the alpha image.
    alphaMask := TImageEffects.GetAlphaMask(image);
    try
        //Blur the black and white shadow image
//      shadow := TImageEffects.BoxBlur(alphaMask, radius);
        shadow := TImageEffects.GaussianBlur(alphaMask, radius); //because Gaussian Blur is linearly-separable into two 1d kernels, it's actually faster than the box blur
    finally
        alphaMask.Free;
    end;

    //Draw
    graphics := TGPGraphics.Create(Result);
    try
        //Draw the "shadow", using the passed in opacity value.
        {
            Color transformations are of the form
        c =  (r, g, b, a)
        c' = (r, g, b, a)
        c' = c*M
            = (r, g, b, a, 1) * (0 0 0 0 0)  //r
                                      (0 0 0 0 0)  //g
                                      (0 0 0 0 0)  //b
                                      (1 1 1 1 0)  //a
                                      (0 0 0 0 1)  //1
        }

        imageAttributes := TGPImageAttributes.Create;
    {   cm := (
                ( 1, 0, 0, 0,   0),
                ( 0, 1, 0, 0,   0),
                ( 0, 0, 1, 0,   0),
                ( 0, 0, 0, 0.5, 0),
                ( 0, 0, 0, 0,   1)
            );}
        cm[0, 0] :=  1; cm[0, 1] :=  0; cm[0, 2] :=  0; cm[0, 3] := 0;       cm[0, 4] := 0;
        cm[1, 0] :=  0; cm[1, 1] :=  1; cm[1, 2] :=  0; cm[1, 3] := 0;       cm[1, 4] := 0;
        cm[2, 0] :=  0; cm[2, 1] :=  0; cm[2, 2] :=  1; cm[2, 3] := 0;       cm[2, 4] := 0;
        cm[3, 0] :=  0; cm[3, 1] :=  0; cm[3, 2] :=  0; cm[3, 3] := Opacity; cm[3, 4] := 0;
        cm[4, 0] :=  0; cm[4, 1] :=  0; cm[4, 2] :=  0; cm[4, 3] := 0;       cm[4, 4] := 1;


        imageAttributes.SetColorMatrix(
                cm,
                ColorMatrixFlagsDefault,
                ColorAdjustTypeBitmap);
        try
            graphics.DrawImage(shadow,
                        MakeRectF(OffsetX, OffsetY, width, height), //destination rectangle
                        0, 0, //source (x,y)
                        width, height, //source width, height
                        UnitPixel,
                        ImageAttributes);

            //Draw original image over-top of it's shadow
            graphics.DrawImage(image, 0, 0);
        finally
            imageAttributes.Free;
        end;
    finally
        graphics.Free;
    end;
end;

Which uses the the function to get the grayscale alpha mask:

class function TImageEffects.GetAlphaMask(image: TGPImage): TGPBitmap;
var
    imageAttributes: TGPImageAttributes;
    cm: TColorMatrix;
    graphics: TGPGraphics;
    Width, Height: UINT;
begin
    {
        Color transformations are of the form
    c =  (r, g, b, a)
    c' = (r, g, b, a)
    c' = c*M
        = (r, g, b, a, 1) * (0 0 0 0 0)
                            (0 0 0 0 0)
                            (0 0 0 0 0)
                            (1 1 1 1 0)
                            (0 0 0 0 1)
    }

    imageAttributes := TGPImageAttributes.Create;

{   cm := (
            ( 0,  0,  0, 0, 0),
            ( 0,  0,  0, 0, 0),
            ( 0,  0,  0, 0, 0),
            (-1, -1, -1, 1, 0),
            ( 1,  1,  1, 0, 1)
        );}
    cm[0, 0] :=  0; cm[0, 1] :=  0; cm[0, 2] :=  0; cm[0, 3] := 0; cm[0, 4] := 0;
    cm[1, 0] :=  0; cm[1, 1] :=  0; cm[1, 2] :=  0; cm[1, 3] := 0; cm[1, 4] := 0;
    cm[2, 0] :=  0; cm[2, 1] :=  0; cm[2, 2] :=  0; cm[2, 3] := 0; cm[2, 4] := 0;
    cm[3, 0] := -1; cm[3, 1] := -1; cm[3, 2] := -1; cm[3, 3] := 1; cm[3, 4] := 0;
    cm[4, 0] :=  1; cm[4, 1] :=  1; cm[4, 2] :=  1; cm[4, 3] := 0; cm[4, 4] := 1;


    imageAttributes.SetColorMatrix(
            cm,
            ColorMatrixFlagsDefault,
            ColorAdjustTypeBitmap);

    width := image.GetWidth;
    height := image.GetHeight;

    Result := TGPBitmap.Create(Integer(width), Integer(height));
    graphics := TGPGraphics.Create(Result);
   try
        graphics.DrawImage(
                image,
                MakeRect(0, 0, width, height), //destination rectangle
             0, 0, //source (x,y)
             width, height,
             UnitPixel,
                ImageAttributes);
   finally
        graphics.Free;
    end;
end;

The core is the gaussian blur:

class function TImageEffects.GaussianBlur(const bitmap: TGPBitmap;
  radius: Single): TGPBitmap;
var
    width, height: Integer;
    tempBitmap: TGPBitmap;
    bdSource: TBitmapData;
    bdTemp: TBitmapData;
    bdDest: TBitmapData;
    pSrc: PARGBArray;
    pTemp: PARGBArray;
    pDest: PARGBArray;
    stride: Integer;
    kernel: TKernel;
begin
//  kernel := MakeGaussianKernel2d(radius);
    kernel := MakeGaussianKernel1d(radius);
    try
//      Result := ConvolveBitmap(bitmap, kernel); brute 2d kernel

        width := bitmap.GetWidth;
        height := bitmap.GetHeight;

        // GDI+ still lies to us - the return format is BGR, NOT RGB.
        bitmap.LockBits(MakeRect(0, 0, width, height),
                ImageLockModeRead,
                PixelFormat32bppPARGB, bdSource);

        //intermediate bitmap
        tempBitmap := TGPBitmap.Create(width, height, PixelFormat32bppPARGB);
        tempBitmap.LockBits(MakeRect(0, 0, width, height),
                    ImageLockModeWrite,
                    PixelFormat32bppPARGB, bdTemp);

        //target bitmap
        Result := TGPBitmap.Create(width, height, PixelFormat32bppARGB);
        Result.LockBits(MakeRect(0, 0, width, height),
                    ImageLockModeWrite,
                    PixelFormat32bppPARGB, bdDest);

        pSrc := PARGBArray(bdSource.Scan0);
        pTemp := PARGBArray(bdTemp.Scan0);
        pDest := PARGBArray(bdDest.Scan0);
        stride := bdSource.Stride;

        ConvolveAndTranspose(kernel, pSrc^, pTemp^, width, height, stride, True, EdgeActionClampEdges);
        ConvolveAndTranspose(kernel, pTemp^, pDest^, height, width, stride, True, EdgeActionClampEdges);

        //Unlock source
       bitmap.UnlockBits(bdSource);
        tempBitmap.UnlockBits(bdTemp);
        Result.UnlockBits(bdDest);

        //get rid of temp
        tempBitmap.Free;
    finally
        kernel.Free;
    end;
end;

which requires a 1-D kernel:

class function TImageEffects.MakeGaussianKernel1d(radius: Single): TKernel;
var
    r: Integer;
    rows: Integer;
    matrix: TSingleDynArray;
    sigma: Single;
    sigma22: Single;
    sigmaPi2: Single;
    sqrtSigmaPi2: Single;
    radius2: Single;
    total: Single;
    index: Integer;
    row: Integer;
    distance: Single;
    i: Integer;
begin
    r := Ceil(radius);
    rows := r*2+1;

    SetLength(matrix, rows);
    sigma := radius/3.0;
    sigma22 := 2*sigma*sigma;
    sigmaPi2 := 2*pi*sigma;
    sqrtSigmaPi2 := Sqrt(sigmaPi2);
    radius2 := radius*radius;
    total := 0;

    Index := 0;
    for row := -r to r do
    begin
        distance := row*row;
        if (distance > radius2) then
            matrix[index] := 0
        else
        begin
            matrix[index] := Exp((-distance)/sigma22) / sqrtSigmaPi2;
            total := total + matrix[index];
            Inc(index);
        end;
    end;

    //Normalize the values
    for i := 0 to rows-1 do
        matrix[i] := matrix[i] / total;


    Result := TKernel.Create(rows, 1, matrix);
end;

And then the magic of the gaussian function is that it is separable into two 1D convolutions:

class procedure TImageEffects.convolveAndTranspose(kernel: TKernel;
  const inPixels: array of ARGB; var outPixels: array of ARGB; width,
  height, stride: Integer; alpha: Boolean; edgeAction: TEdgeAction);
var
    index: Integer;
    matrix: TSingleDynArray;
    rows: Integer; //number of rows in the kernel
    cols: Integer; //number of columns in the kernel
    rows2: Integer; //half row count
    cols2: Integer; //half column count

    x, y: Integer; //
    r, g, b, a: Single; //summed red, green, blue, alpha values
    row, col: Integer;
    ix, iy, ioffset: Integer;
    moffset: Integer;
    f: Single;
    rgb: ARGB;
    ir, ig, ib, ia: Integer;

   function ClampPixel(value: Single): Integer;
    begin
        Result := Trunc(value+0.5);
        if Result < 0 then
            Result := 0
        else if Result > 255 then
            Result := 255;
    end;
begin
    matrix := kernel.KernelData;
    cols := kernel.Width;
    cols2 := cols div 2;

    for y := 0 to height-1 do
    begin
        index := y;
        ioffset := y*width;
        for x := 0 to width-1 do
        begin
            r := 0;
            g := 0;
            b := 0;
            a := 0;

            moffset := cols2;
            for col := -cols2 to cols2 do
            begin
                f := matrix[moffset+col];

                if (f <> 0) then
                begin
                    ix := x+col;
                    if ( ix < 0 ) then
                    begin
                        if ( edgeAction = EdgeActionClampEdges ) then
                            ix := 0
                        else if ( edgeAction = EdgeActionWrapEdges ) then
                            ix := (x+width) mod width;
                    end
                    else if ( ix >= width) then
                    begin
                        if ( edgeAction = EdgeActionClampEdges ) then
                            ix := width-1
                        else if ( edgeAction = EdgeActionWrapEdges ) then
                            ix := (x+width) mod width;
                    end;
                    rgb := inPixels[ioffset+ix];
                    a := a + f * ((rgb shr 24) and $FF);
                    r := r + f * ((rgb shr 16) and $FF);
                    g := g + f * ((rgb shr  8) and $FF);
                    b := b + f * ((rgb       ) and $FF);
                end;
            end;
            if alpha then
                ia := ClampPixel(a)
         else
                ia := $FF;
            ir := ClampPixel(r);
            ig := ClampPixel(g);
            ib := ClampPixel(b);
            outPixels[index] := MakeARGB(ia, ir, ig, ib);

            Inc(index, height);
        end;
    end;
end;

with sample usage, on my 256x256 source images:

image := TImageEffects.GenerateDropShadow(localImage, 14, 2.12132, 2.12132, 1.0);

Profiling shows that 88.62% time is spent in the lines:

a := a + f * ((rgb shr 24) and $FF);
r := r + f * ((rgb shr 16) and $FF);
g := g + f * ((rgb shr  8) and $FF);
b := b + f * ((rgb       ) and $FF);

which is the per-pixel alpha blending.

Which makes me think that there's a better way to apply a soft drop-shadow that applying a blur effect, after all Windows and OSX apply a drop-shadow to windows in real-time.

Can't really show it all; it involves a 2D Gaussian blur which itself would be three screenfulls. i'm just concerned it's a whole lot of stuff for a whole lotta nuthin. It would be useful to know that the algorithm is sound. A Guassian blur is faster than a Box or Triangle blur, because the guassian blur is linearly separable, and convertible into two 1-D convolutions. — Ian Boyd, Sep 11 '11 at 23:25
@Igor Brejc: Sampling shows that the slowdown is in the per-pixel multiplication. i hope there's an algorithm completely different from the one i use (i.e. convert alpha to greyscale, blur, paint original image over top) — Ian Boyd, Sep 12 '11 at 00:14
have you considered masking out the (fairly large area) beneath the ticket, where you don't have to compute the drop shadow? — , Feb 09 '13 at 02:33
You should use pointers and unsafe code, that would speed it up considerably. — Gábor, May 18 '14 at 13:37
@Gábor It already is native code. Have you found an algorithm that's fater? — Ian Boyd, May 20 '14 at 17:08
My bad, C# is a bit different then, you seem to have about the same approach regarding this part of it. However, I'm struggling with a similar issue and, while I'm not completely satisfied with it yet, my current approach seems to be acceptably fast and much simpler, albeit not yet perfect (the problem is with the shadow radius). I'd be glad to offer it up and arrive at some good solution together but it's certainly not yet good enough for an answer here. Is there a way to send a PM here? I couldn't find one. — Gábor, May 21 '14 at 17:09
@Gábor You should post it as an answer; for everyone to benefit. — Ian Boyd, May 21 '14 at 18:30
Yep but I needed to arrive at a pleasing result first. :-) I finally ended up with a solution based on a blog on the net that's not *that* simple but it seems to work all right. I'll author the answer right away. — Gábor, May 22 '14 at 11:33
My original version was actually working but only with very small radii, 3 or 4 or similar. This is OK for larger values as well. — Gábor, May 22 '14 at 11:40
A Gaussian kernel is separable, which means you can break it up into a horizontal pass and a vertical pass, which reduces the amount of computation and _may_ be more cache-friendly. I'd also look into approximations that might be faster to compute and look for a way to blur only along the edges. — Adrian McCarthy, May 22 '14 at 17:46

Gábor · Accepted Answer · 2018-12-18T19:09:30.417

The algorithm came from this blog entry: http://blog.ivank.net/fastest-gaussian-blur.html . It's implementing the last and fastest version, of course. :-)

It's copied directly from my working code, so the external assumptions might reflect that. The function returns a larger bitmap to accommodate the increase in size. In your code, you need to handle this accordingly, of course. It assumes a 32-bit alpha picture but can be easily modified to handle 24-bit only (CHANNELS constant and the PixelFormat values).

public static class DropShadow {
  const int CHANNELS = 4;

  public static Bitmap CreateShadow(Bitmap bitmap, int radius, float opacity) {
    // Alpha mask with opacity
    var matrix = new ColorMatrix(new float[][] {
            new float[] {  0F,  0F,  0F, 0F,      0F }, 
            new float[] {  0F,  0F,  0F, 0F,      0F }, 
            new float[] {  0F,  0F,  0F, 0F,      0F }, 
            new float[] { -1F, -1F, -1F, opacity, 0F },
            new float[] {  1F,  1F,  1F, 0F,      1F }
        });

    var imageAttributes = new ImageAttributes();
    imageAttributes.SetColorMatrix(matrix, ColorMatrixFlag.Default, ColorAdjustType.Bitmap);
    var shadow = new Bitmap(bitmap.Width + 4 * radius, bitmap.Height + 4 * radius);
    using (var graphics = Graphics.FromImage(shadow))
      graphics.DrawImage(bitmap, new Rectangle(2 * radius, 2 * radius, bitmap.Width, bitmap.Height), 0, 0, bitmap.Width, bitmap.Height, GraphicsUnit.Pixel, imageAttributes);

    // Gaussian blur
    var clone = shadow.Clone() as Bitmap;
    var shadowData = shadow.LockBits(new Rectangle(0, 0, shadow.Width, shadow.Height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    var cloneData = clone.LockBits(new Rectangle(0, 0, clone.Width, clone.Height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);

    var boxes = DetermineBoxes(radius, 3);
    BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, (boxes[0] - 1) / 2);
    BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, (boxes[1] - 1) / 2);
    BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, (boxes[2] - 1) / 2);

    shadow.UnlockBits(shadowData);
    clone.UnlockBits(cloneData);
    return shadow;
  }

  private static unsafe void BoxBlur(BitmapData data1, BitmapData data2, int width, int height, int radius) {
    byte* p1 = (byte*)(void*)data1.Scan0;
    byte* p2 = (byte*)(void*)data2.Scan0;

    int radius2 = 2 * radius + 1;
    int[] sum = new int[CHANNELS];
    int[] FirstValue = new int[CHANNELS];
    int[] LastValue = new int[CHANNELS];

    // Horizontal
    int stride = data1.Stride;
    for (var row = 0; row < height; row++) {
      int start = row * stride;
      int left = start;
      int right = start + radius * CHANNELS;

      for (int channel = 0; channel < CHANNELS; channel++) {
        FirstValue[channel] = p1[start + channel];
        LastValue[channel] = p1[start + (width - 1) * CHANNELS + channel];
        sum[channel] = (radius + 1) * FirstValue[channel];
      }
      for (var column = 0; column < radius; column++)
        for (int channel = 0; channel < CHANNELS; channel++)
          sum[channel] += p1[start + column * CHANNELS + channel];
      for (var column = 0; column <= radius; column++, right += CHANNELS, start += CHANNELS)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += p1[right + channel] - FirstValue[channel];
          p2[start + channel] = (byte)(sum[channel] / radius2);
        }
      for (var column = radius + 1; column < width - radius; column++, left += CHANNELS, right += CHANNELS, start += CHANNELS)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += p1[right + channel] - p1[left + channel];
          p2[start + channel] = (byte)(sum[channel] / radius2);
        }
      for (var column = width - radius; column < width; column++, left += CHANNELS, start += CHANNELS)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += LastValue[channel] - p1[left + channel];
          p2[start + channel] = (byte)(sum[channel] / radius2);
        }
    }

    // Vertical
    stride = data2.Stride;
    for (int column = 0; column < width; column++) {
      int start = column * CHANNELS;
      int top = start;
      int bottom = start + radius * stride;

      for (int channel = 0; channel < CHANNELS; channel++) {
        FirstValue[channel] = p2[start + channel];
        LastValue[channel] = p2[start + (height - 1) * stride + channel];
        sum[channel] = (radius + 1) * FirstValue[channel];
      }
      for (int row = 0; row < radius; row++)
        for (int channel = 0; channel < CHANNELS; channel++)
          sum[channel] += p2[start + row * stride + channel];
      for (int row = 0; row <= radius; row++, bottom += stride, start += stride)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += p2[bottom + channel] - FirstValue[channel];
          p1[start + channel] = (byte)(sum[channel] / radius2);
        }
      for (int row = radius + 1; row < height - radius; row++, top += stride, bottom += stride, start += stride)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += p2[bottom + channel] - p2[top + channel];
          p1[start + channel] = (byte)(sum[channel] / radius2);
        }
      for (int row = height - radius; row < height; row++, top += stride, start += stride)
        for (int channel = 0; channel < CHANNELS; channel++) {
          sum[channel] += LastValue[channel] - p2[top + channel];
          p1[start + channel] = (byte)(sum[channel] / radius2);
        }
    }
  }

  private static int[] DetermineBoxes(double Sigma, int BoxCount) {
    double IdealWidth = Math.Sqrt((12 * Sigma * Sigma / BoxCount) + 1);
    int Lower = (int)Math.Floor(IdealWidth);
    if (Lower % 2 == 0)
      Lower--;
    int Upper = Lower + 2;

    double MedianWidth = (12 * Sigma * Sigma - BoxCount * Lower * Lower - 4 * BoxCount * Lower - 3 * BoxCount) / (-4 * Lower - 4);
    int Median = (int)Math.Round(MedianWidth);

    int[] BoxSizes = new int[BoxCount];
    for (int i = 0; i < BoxCount; i++)
      BoxSizes[i] = (i < Median) ? Lower : Upper;
    return BoxSizes;
  }

}

I assume it must be straighforward to convert it to Delphi.

Addendum: according to the comments on that blog, if you have an integer radius and three boxes, you can actually forget DetermineBoxes() and use:

BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, radius - 1);
BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, radius - 1);
BoxBlur(shadowData, cloneData, shadow.Width, shadow.Height, radius);

Its execution time is negligible compared to the bitmap itself but still...

score 2 · Answer 2 · answered Sep 12 '11 at 05:24

The reason why I was asking for the code is to see whether you used the "fast bitmap" approach or GetPixel(), SetPixel() methods.

Since you already have this covered, I doubt you'll be able to do much more in terms of performance optimization. GDI+ just wasn't designed for such per-pixel manipulation scenarios. Realistically you should consider implementing a simpler shadow generator, which won't look as fancy, but will not be as processor-intensive.

It all very much depends on your usage scenario (which you haven't really described):

Are the images all similar (all tickets or have you used the ticket just as a sample)? If they are, then you could generate the shadow once and reuse that bitmap.
You could generate and cache shadowed versions of images (or just shadowed thumbnails) as a background process when the user is doing other stuff.

You could also try out the Gaussian blur in Paint.NET (which uses GDI+ for most stuff) and measure the speed of it there. I doubt you'll be able make it faster than Paint.NET, so it is a good benchmark.

When it gets down to it, the issue isn't GDI+. i have a ARGB image in memory, and i have to perform multiplication on every pixel. i notice that newer versions of GDI+ have a Blur image effect (msdn.microsoft.com/en-us/library/ms534422(v=vs.85).aspx). It might be faster than having to roll my own; with the downside of it not working on Windows XP. Assuming the GDI+ `ApplyEffect` is faster: *why* is it faster? What is the better algorithm to apply a blur? — Ian Boyd, Sep 12 '11 at 15:10

score 1 · Answer 3 · answered May 22 '14 at 13:09

If it's pure performance you're after you could also consider only convolving thin rectangle edge strips of the source image. This way you're not spending time convolving the center (hidden) portion of the image but only portions which have a chance of painting onscreen.

vfbb · Answer 4 · 2016-08-09T13:27:58.593

I tested some algorithms and the best was the gaussian blur that Gábor has been implemented. The delay of the algorithm is ~ 20 ms in my tests.

Here is the implementation of it algorithm in Delphi with some changes (it use the freeware Bilsen GDI+ lib):

function CreateBlurShadow(ABitmap: IGPBitmap; ARadius: Integer; AOpacity: Double; AColor: TColor = clNone): IGPBitmap;

  procedure BoxBlur(const AData1, AData2: TGPBitmapData; AWidth, AHeight, ARadius: Integer);
  const
    CHANNELS = 4;
  var
    LScan1, LScan2: PByte;
    LSum, LFirstValue, LLastValue: array [0..CHANNELS-1] of Integer;
    LRadius2, LStride, LStart, LChannel, LLeft, LRight, LBottom, LTop, LRow, LColumn: Integer;
  begin
    LScan1 := AData1.Scan0;
    LScan2 := AData2.Scan0;
    LRadius2 := (2 * ARadius) + 1;
    LStride := AData1.Stride;
    for LRow := 0 to AHeight-1 do
    begin
      LStart := LRow * LStride;
      LLeft := LStart;
      LRight := LStart + ARadius * CHANNELS;
      for LChannel := 0 to CHANNELS-1 do
      begin
        LFirstValue[LChannel] := LScan1[LStart + LChannel];
        LLastValue[LChannel] := LScan1[LStart + ((AWidth - 1) * CHANNELS) + LChannel];
        LSum[LChannel] := (ARadius + 1) * LFirstValue[LChannel];
      end;
      for LColumn := 0 to ARadius-1 do
        for LChannel := 0 to CHANNELS-1 do
          LSum[LChannel] := LSum[LChannel] + LScan1[LStart + (LColumn * CHANNELS) + LChannel];
      for LColumn := 0 to ARadius do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LScan1[LRight + LChannel] - LFirstValue[LChannel];
          LScan2[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LRight, CHANNELS);
        Inc(LStart, CHANNELS);
      end;
      for LColumn := ARadius + 1 to AWidth-ARadius-1 do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LScan1[LRight + LChannel] - LScan1[LLeft + LChannel];
          LScan2[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LLeft, CHANNELS);
        Inc(LRight, CHANNELS);
        Inc(LStart, CHANNELS);
      end;
      for LColumn := AWidth-ARadius to AWidth-1 do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LLastValue[LChannel] - LScan1[LLeft + LChannel];
          LScan2[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LLeft, CHANNELS);
        Inc(LStart, CHANNELS);
      end;
    end;
    LStride := AData2.Stride;
    for LColumn := 0 to AWidth-1 do
    begin
      LStart := LColumn * CHANNELS;
      LTop := LStart;
      LBottom := LStart + (ARadius * LStride);
      for LChannel := 0 to CHANNELS-1 do
      begin
        LFirstValue[LChannel] := LScan2[LStart + LChannel];
        LLastValue[LChannel] := LScan2[LStart + ((AHeight - 1) * LStride) + LChannel];
        LSum[LChannel] := (ARadius + 1) * LFirstValue[LChannel];
      end;
      for LRow := 0 to ARadius-1 do
        for LChannel := 0 to CHANNELS-1 do
          LSum[LChannel] := LSum[LChannel] + LScan2[LStart + (LRow * LStride) + LChannel];
      for LRow := 0 to ARadius do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LScan2[LBottom + LChannel] - LFirstValue[LChannel];
          LScan1[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LBottom, LStride);
        Inc(LStart, LStride);
      end;
      for LRow := ARadius + 1 to AHeight - ARadius - 1 do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LScan2[LBottom + LChannel] - LScan2[LTop + LChannel];
          LScan1[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LTop, LStride);
        Inc(LBottom, LStride);
        Inc(LStart, LStride);
      end;
      for LRow := AHeight - ARadius to AHeight-1 do
      begin
        for LChannel := 0 to CHANNELS-1 do
        begin
          LSum[LChannel] := LSum[LChannel] + LLastValue[LChannel] - LScan2[LTop + LChannel];
          LScan1[LStart + LChannel] := Byte(LSum[LChannel] div LRadius2);
        end;
        Inc(LTop, LStride);
        Inc(LStart, LStride);
      end;
    end;
  end;

const
  INITIAL_MATRIX: array [0..4, 0..4] of Single =
   ((0.5,   0,   0, 0, 0),
    (0,   0.5,   0, 0, 0),
    (0,     0, 0.5, 0, 0),
    (0,     0,   0, 1, 0),
    (0,     0,   0, 0, 1));
var
  LMatrix: TGPColorMatrix;
  LImageAttributes: IGPImageAttributes;
  LShadow, LClone: IGPBitmap;
  LGraphics: IGPGraphics;
  LShadowData, LCloneData: TGPBitmapData;
  LColor: TGPColor;
begin
  ARadius := Max(ARadius, 0);
  LShadow := TGPBitmap.Create(ABitmap.Width + (4 * Cardinal(ARadius)),
    ABitmap.Height + (4 * Cardinal(ARadius)), PixelFormat32bppARGB);
  LGraphics := TGPGraphics.FromImage(LShadow);
  LGraphics.DrawImage(ABitmap, TGPRect.Create(2 * ARadius, 2 * ARadius,
    ABitmap.Width, ABitmap.Height), 0, 0, ABitmap.Width, ABitmap.Height,
    TGPUnit.UnitPixel);
  LClone := LShadow.Clone;
  LShadowData := LShadow.LockBits(TGPRect.Create(0, 0, LShadow.Width, LShadow.Height),
    [ImageLockModeRead, ImageLockModeWrite], PixelFormat32bppARGB);
  LCloneData := LClone.LockBits(TGPRect.Create(0, 0, LClone.Width, LClone.Height),
    [ImageLockModeRead, ImageLockModeWrite], PixelFormat32bppARGB);
  try
    BoxBlur(LShadowData, LCloneData, LShadow.Width, LShadow.Height, ARadius - 1);
    BoxBlur(LShadowData, LCloneData, LShadow.Width, LShadow.Height, ARadius - 1);
    BoxBlur(LShadowData, LCloneData, LShadow.Width, LShadow.Height, ARadius);
  finally
    LShadow.UnlockBits(LShadowData);
    LClone.UnlockBits(LCloneData);
  end;
  if (AColor = clNone) and (AOpacity = 1.0) then
    Result := LShadow
  else
  begin
    LColor := TGPColor.CreateFromColorRef(ColorToRGB(AColor));
    Move(INITIAL_MATRIX[0, 0], LMatrix.M[0, 0], SizeOf(INITIAL_MATRIX));
    LMatrix.M[4, 0] := Min((Integer(LColor.R) - 127) / 127, 1.0);
    LMatrix.M[4, 1] := Min((Integer(LColor.G) - 127) / 127, 1.0);
    LMatrix.M[4, 2] := Min((Integer(LColor.B) - 127) / 127, 1.0);
    LMatrix.M[4, 3] := AOpacity-1;
    LImageAttributes := TGPImageAttributes.Create;
    LImageAttributes.SetColorMatrix(LMatrix, TGPColorMatrixFlags.ColorMatrixFlagsDefault,
      TGPColorAdjustType.ColorAdjustTypeBitmap);
    Result := TGPBitmap.Create(LShadow.Width, LShadow.Height, PixelFormat32bppARGB);
    LGraphics := TGPGraphics.FromImage(Result);
    LGraphics.DrawImage(LShadow, TGPRect.Create(0, 0, LShadow.Width, LShadow.Height),
      0, 0, Result.Width, Result.Height, TGPUnit.UnitPixel, LImageAttributes);
  end;
end;

score 0 · Answer 5 · edited May 23 '17 at 12:10

0

I knew that pixel-by-pixel manipulation was quite slower, but never did benchmarks; 70x seems a lot, more than I would expect. Maybe the fact that you're using a managed language contributes to this, because that's one situation in which a VM overhead is maximized. Have you tried to make that part of the program in native code? This link has a native implementation that you could use for a quick test:

http://www.codeproject.com/KB/GDI/Glow_and_Shadow_effects.aspx

Unfortunately their only difference is the use of a language that can generate native code, but they still use a double-level loop to visit the pixels. It would be better if you could use CUDA, for instance, if you can assume the machines in which the app will run have such hardware. But in such case you wouldn't be using GDI+ anymore. Anyway, maybe this other SO question is of help:

Using Graphics Card instead of GDI+ for Image Manipulation

edited May 23 '17 at 12:10

Community

1
1

answered Sep 12 '11 at 00:14

Fabio Ceconello

15,819
5
38
51

i'm not in managed code: it's GDI+ (i.e. native). Also the 70x slower is performing per-pixel multiplication - as opposed to doing nothing (no pixel effects). – Ian Boyd Sep 12 '11 at 00:25
Sorry, since your code was in Delphi, I assumed it was Delphi for .net – Fabio Ceconello Sep 12 '11 at 00:28
1

About the per-pixel manipulation, I believe that the problem is also in part caused by frequent access of non-contiguous memory locations, forcing the processor to flush its cache every time. Maybe rearranging that would help. – Fabio Ceconello Sep 12 '11 at 00:31
GDI+ is a native API no matter how you slice it. .NET has managed classes that wrap the unmanaged flat C GDI+ API. – Ian Boyd Sep 12 '11 at 15:11
I understand, but does that mean that when you got that benchmark for the pixel manipulation, that included call times for GDI+ APIs? My understanding was that the bottleneck was just in the pixel transformations, which don't involve GDI calls. – Fabio Ceconello Sep 12 '11 at 16:12
No, there are no GDI or GDI+ calls in the blur algorithm. i have a chunk of memory, laid out as 32-bit `ARGB`. Since this is a native application i am accessing memory directly (i.e. through a pointer). In the end *my* algorithm is irrelavent, i'm looking for the *good* one. It doesn't even have to involve getting the alpha channel and applying a blur... – Ian Boyd Sep 13 '11 at 11:26

Algorithm for fast Drop shadow in GDI+

5 Answers5

Linked