OpenCL Kernel for Canny

Question

I'm trying to implement Canny edge detection with an OpenCL kernel in a very simplified and easy way.

I'm using the original SobelFilter kernel to do steps like non-maximum suppression and thresholding.

But I'm lost about how to access the pixels and do the math on them with:

__kernel void sobel_filter(__global uchar4* inputImage, __global uchar4* outputImage)

Could you give me ideas or show me simple examples to achieve this? It will be highly appreciated. Regards.

https://stackoverflow.com/questions/17815687/image-processing-implementing-sobel-filter — huseyin tugrul buyukisik, Jun 05 '17 at 14:36
How did you pass data to GPU? As an image buffer or as a simple buffer? — huseyin tugrul buyukisik, Jun 05 '17 at 15:27
hypot means `sqrt(x² + y²)`, and dividing by 2 means it takes only half of that, perhaps because the buffers use char instead of uchar (so the maximum is about half of 255)? Taking sqrt of a float4 also means taking sqrt of each of its 4 elements separately, just as is done here for red, green and blue. convert_ is there to convert efficiently (probably). — huseyin tugrul buyukisik, Jun 05 '17 at 17:06

huseyin tugrul buyukisik · Accepted Answer · 2017-06-05T17:10:39.903

The Sobel filter is inherently separable into X and Y dimensions in kernel execution. So one can scan only along X, only along Y, or along both in the same kernel loop to detect edge features.
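One way to read "separable" is that each 3x3 Sobel kernel factors into a 1-D smoothing vector and a 1-D derivative vector. A minimal sketch in plain C, where the names sobel_smooth, sobel_deriv, sobel_x_weight and sobel_y_weight are made up for illustration and are not part of any OpenCL API:

```c
/* 1-D factors of the 3x3 Sobel kernels (illustrative names). */
static const int sobel_smooth[3] = {1, 2, 1};   /* smoothing across the edge direction */
static const int sobel_deriv[3]  = {-1, 0, 1};  /* central difference */

/* Weight of the Sobel X kernel at (row, col), rebuilt as an outer product:
   smooth down the rows, differentiate along the columns. */
int sobel_x_weight(int row, int col)
{
    return sobel_smooth[row] * sobel_deriv[col];
}

/* The Sobel Y kernel is the transpose: differentiate down the rows,
   smooth along the columns. */
int sobel_y_weight(int row, int col)
{
    return sobel_deriv[row] * sobel_smooth[col];
}
```

These weights reproduce the kernelx and kernely tables used in the kernel further down, so the X and Y gradients can be accumulated independently or together in one loop.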

Using user azer89's solution here: Image Processing - Implementing Sobel Filter

I prepared this kernel:

__kernel void postProcess(__global uchar * input, __global uchar * output)
{
    int resultImgSize=1024;
    int pixelX=get_global_id(0)%resultImgSize; // 1-D id list to 2D workitems(each process a single pixel)
    int pixelY=get_global_id(0)/resultImgSize;
    int imgW=resultImgSize;
    int imgH=resultImgSize;


    float kernelx[3][3] = {{-1, 0, 1}, 
                           {-2, 0, 2}, 
                           {-1, 0, 1}};
    float kernely[3][3] = {{-1, -2, -1}, 
                           {0,  0,  0}, 
                           {1,  2,  1}};

    // also colors are separable
    int magXr=0,magYr=0; // red
    int magXg=0,magYg=0;
    int magXb=0,magYb=0;

    // Sobel filter
    // this conditional leaves 10-pixel-wide edges out of processing
    if( (pixelX<imgW-10) && (pixelY<imgH-10) && (pixelX>10) && (pixelY>10) )
    { 
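        for(int a = 0; a < 3; a++)
        {
            for(int b = 0; b < 3; b++)
            {
                int xn = pixelX + a - 1; // a is the column (x) offset
                int yn = pixelY + b - 1; // b is the row (y) offset

                int index = xn + yn * resultImgSize;
                // the tables are stored as [row][column], so index them as [b][a];
                // with [a][b] the X and Y gradients are swapped (the magnitude is the same)
                magXr += input[index*4] * kernelx[b][a];
                magXg += input[index*4+1] * kernelx[b][a];
                magXb += input[index*4+2] * kernelx[b][a];
                magYr += input[index*4] * kernely[b][a];
                magYg += input[index*4+1] * kernely[b][a];
                magYb += input[index*4+2] * kernely[b][a];
            }
        }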
    }

    // magnitude of x+y vector
    output[(pixelX+pixelY*resultImgSize)*4]  =sqrt((float)(magXr*magXr + magYr*magYr)) ;
    output[(pixelX+pixelY*resultImgSize)*4+1]=sqrt((float)(magXg*magXg + magYg*magYg)) ;
    output[(pixelX+pixelY*resultImgSize)*4+2]=sqrt((float)(magXb*magXb + magYb*magYb)) ;
    output[(pixelX+pixelY*resultImgSize)*4+3]=255;

}

Indices are multiplied by 4 here because the kernel parameters are declared as uchar arrays, so each RGBA pixel occupies four consecutive bytes. uchar is an unsigned 8-bit type in OpenCL C.
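The byte arithmetic can be written out as a small helper. This is only a sketch in plain C; rgba_offset is a hypothetical name, and it assumes a tightly packed RGBA buffer with no row padding, as in the kernel above:

```c
#include <stddef.h>

/* Byte offset of one channel of pixel (x, y) in a tightly packed RGBA8 buffer.
   channel: 0 = red, 1 = green, 2 = blue, 3 = alpha. */
size_t rgba_offset(int x, int y, int width, int channel)
{
    return ((size_t)y * (size_t)width + (size_t)x) * 4u + (size_t)channel;
}
```

For a 1024-wide image, the red byte of pixel (0, 1) sits at offset 4096, and the last byte of a 1024x1024 image is at offset 1024*1024*4 - 1.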

Here is a video of this:

Sobel Filter Example

If it works for you too, you should accept azer89's solution. This kernel is not very optimized, though: for a 1024x1024 image it may take 1-2 milliseconds on a low-end GPU and even longer on a CPU alone. The image data is sent to an OpenCL buffer (not an image buffer) from a byte array (in C#), and the kernel launch options are:

Global range = 1024*1024 (1 thread per pixel processing)
Local range = 256 (this is not important)
Buffer copy size 1024*1024*4 (bytes for rgba format)
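To check the GPU output on the host, the same per-pixel arithmetic can be run as a plain C loop where one iteration stands in for one work item. This is a rough sketch, not the code used for the measurements above: sobel_rgba_reference and isqrt_small are hypothetical helpers, the border width is a parameter instead of the fixed 10, and results above 255 are saturated, which the kernel itself does not do.

```c
#include <stddef.h>

/* Integer square root by linear search; slow but fine for a reference check. */
static int isqrt_small(int s)
{
    int r = 0;
    while ((r + 1) * (r + 1) <= s) r++;
    return r;
}

/* Host-side reference for the Sobel kernel on a packed RGBA8 image.
   input and output must each hold width * height * 4 bytes. */
void sobel_rgba_reference(const unsigned char *input, unsigned char *output,
                          int width, int height, int border)
{
    static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    if (border < 1) border = 1;   /* the 3x3 window needs a 1-pixel margin */

    for (int gid = 0; gid < width * height; gid++) {
        int x = gid % width;      /* same mapping as get_global_id(0) % resultImgSize */
        int y = gid / width;
        int inside = x > border && y > border &&
                     x < width - border && y < height - border;

        for (int c = 0; c < 3; c++) {          /* red, green, blue */
            int gx = 0, gy = 0;
            if (inside) {
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        size_t idx = ((size_t)(y + dy) * width + (x + dx)) * 4 + c;
                        int v = input[idx];
                        gx += v * kx[dy + 1][dx + 1];
                        gy += v * ky[dy + 1][dx + 1];
                    }
                }
            }
            int mag = isqrt_small(gx * gx + gy * gy);
            output[(size_t)gid * 4 + c] = (unsigned char)(mag > 255 ? 255 : mag);
        }
        output[(size_t)gid * 4 + 3] = 255;     /* opaque alpha */
    }
}
```

Calling it with width = height = 1024 and border = 10 walks the same 1024*1024 ids as the global range above and touches the same 1024*1024*4 bytes as the buffer copy.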

Also, the kernelx and kernely 2D arrays here are float, so making them char could make it faster. You may also need to post-process the results (clamp, divide, ...) if the output looks a lot more colorful than expected. The host-side representation/interpretation also matters for handling underflow and overflow of colors.
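The "too colorful" effect comes from magnitudes that do not fit in one byte: a vertical step edge from 0 to 255 already gives an X gradient of 255 * (1 + 2 + 1) = 1020. Two possible ways to bring such values back into 0..255, sketched in plain C with made-up helper names (clamp_to_byte and scale_to_byte); which one looks better depends on the image:

```c
/* Saturate: anything above 255 becomes 255, anything below 0 becomes 0. */
unsigned char clamp_to_byte(int value)
{
    if (value < 0) return 0;
    if (value > 255) return 255;
    return (unsigned char)value;
}

/* Rescale: map [0, max_value] linearly onto [0, 255].
   Assumes max_value > 0, e.g. the largest magnitude expected in the image. */
unsigned char scale_to_byte(int value, int max_value)
{
    if (value < 0) value = 0;
    if (value > max_value) value = max_value;
    return (unsigned char)((value * 255) / max_value);
}
```

Clamping keeps weak edges at their original brightness and flattens strong ones to white, while rescaling preserves the ordering of all edges but dims the weak ones.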

Thank you Mr. Huseyin for your answer and guidance. It *worked like a* **charm!** — , Jun 06 '17 at 19:35
Put more effort into the C++ side. Then you won't have any difficulties if you ever move to SYCL or similar up-to-date syntactic sugar over OpenCL. — huseyin tugrul buyukisik, Jun 06 '17 at 19:43

score 1 · Answer 2 · answered Jun 06 '17 at 02:07

The ARM Compute Library has a Canny implementation: Canny CL kernel

kanna


*Thank you for guidance!* – Jun 06 '17 at 18:43
