Fastest Gaussian blur implementation

Question

How do you implement the fastest possible Gaussian blur algorithm?

I am going to implement it in Java, so GPU solutions are ruled out. My application, planetGenesis, is cross platform, so I don't want JNI.

Worth taking a look at (or using directly!) the GaussianFilter at JH Labs - http://www.jhlabs.com/ip/filters/index.html - I've used it and it is pretty fast. — mikera, Oct 04 '12 at 10:44
Have a look here: https://github.com/RoyiAvital/FastGuassianBlur — Royi, May 12 '15 at 14:33

score 33 · Answer 1 · edited Mar 24 '14 at 12:34

You should use the fact that a Gaussian kernel is separable, i. e. you can express a 2D convolution as a combination of two 1D convolutions.

If the filter is large, it may also make sense to use the fact that convolution in the spatial domain is equivalent to multiplication in the frequency (Fourier) domain. This means that you can take the Fourier transform of the image and the filter, multiply the (complex) results, and then take the inverse Fourier transform. The complexity of the FFT (Fast Fourier Transform) is O(n log n), while the complexity of a convolution is O(n^2). Also, if you need to blur many images with the same filter, you would only need to take the FFT of the filter once.

If you decide to go with using an FFT, the FFTW library is a good choice.

Also note that the set of Gaussian functions is closed under Fourier transforms -- taking the Fourier transform of one Gaussian will just give you a different Gaussian. — Dietrich Epp, Sep 15 '12 at 12:58

score 30 · Answer 2 · answered Sep 19 '08 at 01:58

30

Math jocks are likely to know this, but for anyone else..

Due to a nice mathematical propertiy of the Gaussian, you can blur a 2D image quickly by first running a 1D Gaussian blur on each row of the image, then run a 1D blur on each column.

answered Sep 19 '08 at 01:58

DarenW

16,549
7
63
102

score 17 · Answer 3 · edited Aug 13 '11 at 09:59

I found Quasimondo : Incubator : Processing : Fast Gaussian Blur. This method contains a lot of approximations like using integers and look up tables instead of floats and floating point divisions. I don't know how much speedup that is in modern Java code.
Fast Shadows on Rectangles has an approximating algorithm using B-splines.
Fast Gaussian Blur Algorithm in C# claims to have some cool optimizations.
Also, Fast Gaussian Blur (PDF) by David Everly has a fast method for Gaussian blur processing.

I would try out the various methods, benchmark them and post the results here.

For my purposes, I have copied and implemented the basic (process X-Y axis independently) method and David Everly's Fast Gaussian Blur method from the Internet. They differ in parameters, so I couldn't compare them directly. However the latter goes through much fewer number of iterations for a large blur radius. Also, the latter is an approximate algorithm.

Steve Hanov · Answer 4 · 2008-09-19T02:22:19.280

8

You probably want the box blur, which is much faster. See this link for a great tutorial and some copy & paste C code.

edited Sep 19 '08 at 02:22

answered Sep 19 '08 at 00:38

Steve Hanov

11,316
16
62
69

How do you link between the STD of the Gaussian Kernel to the length of the Box Blur? – Royi Apr 30 '14 at 00:00

score 7 · Answer 5 · edited Aug 13 '11 at 10:02

7

For larger blur radiuses, try applying a box blur three times. This will approximate a Gaussian blur very well, and be much faster than a true Gaussian blur.

edited Aug 13 '11 at 10:02

Peter Mortensen

30,738
21
105
131

answered Feb 11 '09 at 22:22

Tom Sirgedas

3,208
20
17

2

Basically. If you wanted a "blur diameter" of 20, apply a box blur with diameter 7, 7, then 6. That will give a blur similar to a single box blur with diameter 20, but MUCH nicer looking. – Tom Sirgedas Mar 13 '09 at 22:25
1

AFAIK Photoshop does this instead of a true Gaussian blur. – Camilo Martin Aug 05 '10 at 00:16
By the way, BoxBlurRadius = 0.39 * GaussBlurRadius. – Ivan Kuckir Aug 21 '13 at 09:49
@IvanKuckir: I don't understand where you take this number 0.39 from. – Jaan Sep 21 '13 at 21:06
@Jaan Neither do I, I came up with it by trying, it works! It must be something like integral over 2D gaussian (volume) "converted" to box while keeping the same expected value. Check my answer below! – Ivan Kuckir Sep 21 '13 at 21:29
When approximating a given Gaussian distribution, a reasonable idea would be to match its standard deviation. Then the correct formula is BoxBlurRadius = sqrt(0.25+3*σ^2/IterationCount)-0.5, where the box width is 2*BoxBlurRadius+1 and σ is the standard deviation of the 1D Gaussian blur. For large values of σ, this simplifies to BoxBlurRadius = sqrt(3/IterationCount)*σ. (I used these facts: convolution is associative; convolution of two densities is the density of the sum of independent random variables; the variance of such sum is the sum of variances; (-r)^2 + ... + r^2 = r(r+1)(2r+1)/3.) – Jaan Sep 21 '13 at 21:38
@CamiloMartin, Were you able to figure how Photoshop uses Box Blur to approximate Gaussian Blur? For instance I chose Gaussian Blur with Std = 0.5. – Royi Apr 30 '14 at 00:12
@IvanKuckir, Were you able to reproduce Photoshop's Gaussian Blur results? – Royi Apr 30 '14 at 00:39
@IvanKuckir, Would you share? – Royi Apr 30 '14 at 17:27
@Drazick Please see my answer http://stackoverflow.com/a/18375487/978280, my implementation is actually more precise than the one from Adobe Photoshop. – Ivan Kuckir May 09 '14 at 19:33

Ali Akdurak · Answer 6 · 2016-07-09T21:53:50.287

I have converted Ivan Kuckir's implementation of a fast Gaussian blur which uses three passes with linear box blurs to java. The resulting process is O(n) as he has stated at his own blog. If you would like to learn more about why does 3 time box blur approximates to Gaussian blur(3%) my friend you may check out box blur and Gaussian blur.

Here is the java implementation.

@Override
public BufferedImage ProcessImage(BufferedImage image) {
    int width = image.getWidth();
    int height = image.getHeight();

    int[] pixels = image.getRGB(0, 0, width, height, null, 0, width);
    int[] changedPixels = new int[pixels.length];

    FastGaussianBlur(pixels, changedPixels, width, height, 12);

    BufferedImage newImage = new BufferedImage(width, height, image.getType());
    newImage.setRGB(0, 0, width, height, changedPixels, 0, width);

    return newImage;
}

private void FastGaussianBlur(int[] source, int[] output, int width, int height, int radius) {
    ArrayList<Integer> gaussianBoxes = CreateGausianBoxes(radius, 3);
    BoxBlur(source, output, width, height, (gaussianBoxes.get(0) - 1) / 2);
    BoxBlur(output, source, width, height, (gaussianBoxes.get(1) - 1) / 2);
    BoxBlur(source, output, width, height, (gaussianBoxes.get(2) - 1) / 2);
}

private ArrayList<Integer> CreateGausianBoxes(double sigma, int n) {
    double idealFilterWidth = Math.sqrt((12 * sigma * sigma / n) + 1);

    int filterWidth = (int) Math.floor(idealFilterWidth);

    if (filterWidth % 2 == 0) {
        filterWidth--;
    }

    int filterWidthU = filterWidth + 2;

    double mIdeal = (12 * sigma * sigma - n * filterWidth * filterWidth - 4 * n * filterWidth - 3 * n) / (-4 * filterWidth - 4);
    double m = Math.round(mIdeal);

    ArrayList<Integer> result = new ArrayList<>();

    for (int i = 0; i < n; i++) {
        result.add(i < m ? filterWidth : filterWidthU);
    }

    return result;
}

private void BoxBlur(int[] source, int[] output, int width, int height, int radius) {
    System.arraycopy(source, 0, output, 0, source.length);
    BoxBlurHorizantal(output, source, width, height, radius);
    BoxBlurVertical(source, output, width, height, radius);
}

private void BoxBlurHorizontal(int[] sourcePixels, int[] outputPixels, int width, int height, int radius) {
    int resultingColorPixel;
    float iarr = 1f / (radius + radius);
    for (int i = 0; i < height; i++) {
        int outputIndex = i * width;
        int li = outputIndex;
        int sourceIndex = outputIndex + radius;

        int fv = Byte.toUnsignedInt((byte) sourcePixels[outputIndex]);
        int lv = Byte.toUnsignedInt((byte) sourcePixels[outputIndex + width - 1]);
        float val = (radius) * fv;

        for (int j = 0; j < radius; j++) {
            val += Byte.toUnsignedInt((byte) (sourcePixels[outputIndex + j]));
        }

        for (int j = 0; j < radius; j++) {
            val += Byte.toUnsignedInt((byte) sourcePixels[sourceIndex++]) - fv;
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex++] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
        }

        for (int j = (radius + 1); j < (width - radius); j++) {
            val += Byte.toUnsignedInt((byte) sourcePixels[sourceIndex++]) - Byte.toUnsignedInt((byte) sourcePixels[li++]);
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex++] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
        }

        for (int j = (width - radius); j < width; j++) {
            val += lv - Byte.toUnsignedInt((byte) sourcePixels[li++]);
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex++] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
        }
    }
}

private void BoxBlurVertical(int[] sourcePixels, int[] outputPixels, int width, int height, int radius) {
    int resultingColorPixel;
    float iarr = 1f / (radius + radius + 1);
    for (int i = 0; i < width; i++) {
        int outputIndex = i;
        int li = outputIndex;
        int sourceIndex = outputIndex + radius * width;

        int fv = Byte.toUnsignedInt((byte) sourcePixels[outputIndex]);
        int lv = Byte.toUnsignedInt((byte) sourcePixels[outputIndex + width * (height - 1)]);
        float val = (radius + 1) * fv;

        for (int j = 0; j < radius; j++) {
            val += Byte.toUnsignedInt((byte) sourcePixels[outputIndex + j * width]);
        }
        for (int j = 0; j <= radius; j++) {
            val += Byte.toUnsignedInt((byte) sourcePixels[sourceIndex]) - fv;
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
            sourceIndex += width;
            outputIndex += width;
        }
        for (int j = radius + 1; j < (height - radius); j++) {
            val += Byte.toUnsignedInt((byte) sourcePixels[sourceIndex]) - Byte.toUnsignedInt((byte) sourcePixels[li]);
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
            li += width;
            sourceIndex += width;
            outputIndex += width;
        }
        for (int j = (height - radius); j < height; j++) {
            val += lv - Byte.toUnsignedInt((byte) sourcePixels[li]);
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
            li += width;
            outputIndex += width;
        }
    }
}

it works great but the result image is in black and white, how can I make it colored plz? — Tamer Saleh, Nov 24 '21 at 17:33

score 2 · Answer 7 · edited Aug 13 '11 at 10:06

In 1D:

Blurring using almost any kernel repeatedly will tend to a Gaussian kernel. This is what's so cool about the Gaussian distribution, and is why statisticians like it. So choose something that's easy to blur with and apply it several times.

For example, it's easy to blur with a box shaped kernel. First calculate a cumulative sum:

y(i) = y(i-1) + x(i)

then:

blurred(i) = y(i+radius) - y(i-radius)

Repeat several times.

Or you might go back and forth a few times with some variety of an IIR filter, these are similarly fast.

In 2D or higher:

Blur in each dimension one after the other, as DarenW said.

score 2 · Answer 8 · edited Aug 13 '11 at 10:09

I struggled with this problem for my research and tried and interesting method for a fast Gaussian blur. First, as mentioned, it is best to separate the blur into two 1D blurs, but depending on your hardware for the actual calculation of the pixel values, you can actually pre-compute all possible values and store them in a look-up table.

In other words, pre-calculate every combination of Gaussian coefficient * input pixel value. Of course you will need to discreetize your coefficients, but I just wanted to add this solution. If you have an IEEE subscription, you can read more in Fast image blurring using Lookup Table for real time feature extraction.

Ultimately, I ended up using CUDA though :)

score 2 · Answer 9 · answered Sep 19 '08 at 00:44

2

I would consider using CUDA or some other GPU programming toolkit for this, especially if you want to use a larger kernel. Failing that, there's always hand tweaking your loops in assembly.

answered Sep 19 '08 at 00:44

Dana the Sane

14,762
8
58
80

score 2 · Answer 10 · edited Aug 13 '11 at 09:46

2

Step 1: SIMD 1-dimensional Gaussian blur
Step 2: transpose
Step 3: Repeat step 1

It is best done on small blocks, as a full-image transpose is slow, while a small-block transpose can be done extremely fast using a chain of PUNPCKs (PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD).

edited Aug 13 '11 at 09:46

Peter Mortensen

30,738
21
105
131

answered Sep 19 '08 at 02:03

Dark Shikari

7,941
4
26
38

score 1 · Answer 11 · answered Feb 19 '18 at 14:38

I have seen several answers in different places and am collecting them here so that I can try to wrap my mind around them and remember them for later:

Regardless of which approach you use, filter horizontal and vertical dimensions separately with 1D filters rather than using a single square filter.

The standard "slow" approach: convolution filter
Hierarchical pyramid of images with reduced resolution as in SIFT
Repeated box blurs motivated by the Central Limit Theorem. The Box Blur is central to Viola and Jones's face detection where they call it an integral image if I remember correctly. I think Haar-like features use something similar, too.
Stack Blur: a queue-based alternative somewhere between the convolutional and box blur approaches
IIR filters
- Derich filter (Wikipedia) 2nd order IIR filter
- van Vliet filter I know nothing about this one
- Bessel filters although there is some debate about these

After reviewing all of these, I'm reminded that simple, poor approximations often work well in practice. In a different field, Alex Krizhevsky found ReLU's to be faster than the classic sigmoid function in his ground-breaking AlexNet, even though they appear at first glance to be a terrible approximation to the Sigmoid.

score 0 · Answer 12 · edited May 23 '17 at 11:53

0

Try using Box Blur the way I did here: Approximating Gaussian Blur Using Extended Box Blur

This is the best approximation.

Using Integral Images you can make it even faster.
If you do, Please share your solution.

edited May 23 '17 at 11:53

Community

1
1

answered May 24 '14 at 07:53

Royi

4,640
6
46
64

score 0 · Answer 13 · answered Jun 23 '16 at 10:50

Answering this old question with new libraries that have been implemented now(as of 2016), because there are many new advancements to the GPU technology with Java.

As suggested in few other answers, CUDA is an alternative. But java has CUDA support now.

IBM CUDA4J library: provides a Java API for managing and accessing GPU devices, libraries, kernels, and memory. Using these new APIs it is possible to write Java programs that manage GPU device characteristics and offload work to the GPU with the convenience of the Java memory model, exceptions, and automatic resource management.

Jcuda: Java bindings for NVIDIA CUDA and related libraries. With JCuda it is possible to interact with the CUDA runtime and driver API from Java programs.

Aparapi: allows Java developers to take advantage of the compute power of GPU and APU devices by executing data parallel code fragments on the GPU rather than being confined to the local CPU.

Some Java OpenCL binding libraries

https://github.com/ochafik/JavaCL : Java bindings for OpenCL: An object-oriented OpenCL library, based on auto-generated low-level bindings

http://jogamp.org/jocl/www/ : Java bindings for OpenCL: An object-oriented OpenCL library, based on auto-generated low-level bindings

http://www.lwjgl.org/ : Java bindings for OpenCL: Auto-generated low-level bindings and object-oriented convenience classes

http://jocl.org/ : Java bindings for OpenCL: Low-level bindings that are a 1:1 mapping of the original OpenCL API

All these above libraries will help in implementing Gaussian Blur faster than any implementation in Java on CPU.

score 0 · Answer 14 · answered Jul 02 '18 at 16:38

0

Dave Hale from CWP has a minejtk package, which includes recursive Gaussian filter (Deriche method and Van Vliet method). The java subroutine can be found at https://github.com/dhale/jtk/blob/0350c23f91256181d415ea7369dbd62855ac4460/core/src/main/java/edu/mines/jtk/dsp/RecursiveGaussianFilter.java

Deriche's method seems to be a very good one for Gaussian blur (and also for Gaussian's derivatives).

answered Jul 02 '18 at 16:38

Kai

81
1
6

1

The reason I recommend Deriche's method for Gaussian blur is because it is very accurate. See the following paper for a survey and comparison: http://dev.ipol.im/~getreuer/code/doc/gaussian_20131215_doc/group__deriche__gaussian.html – Kai Jul 02 '18 at 16:39

score -1 · Answer 15 · answered Mar 10 '14 at 10:49

There are several fast methods for gauss blur of 2d data. What you should know about.

This is separable filter , so only require two 1d convolution.
For big kernels you can process reduced copy of image and than upscale back.
Good approximation can be done by multiple box filters (also separable), (can be tuned number of iterations and kernel sizes)
Exist O(n) complexity algorithm (for any kernel size) for precise gauss approximation by IIR filter.

Your choice is depend from required speed, precision, and implementation complexity.

Fastest Gaussian blur implementation

15 Answers15

Linked

Related