Custom Kernel GpuMat with float

Question

I'm trying to write a custom kernel using GpuMat data to find the arc cosine of an image's pixels. I can upload, download, and change values when I upload data when the GPU has CV_8UC1 data but chars cannot be used to calculate arc cosines. However, when I try to convert my GPU to CV_32FC1 type (floats) I get an illegal memory access error during the download part. Here is my code:

//.cu code 
#include <cuda_runtime.h>
#include <stdlib.h>
#include <iostream>
#include <stdio.h>
__global__ void funcKernel(const float* srcptr, float* dstptr, size_t srcstep, const     size_t dststep, int cols, int rows){
    int rowInd = blockIdx.y*blockDim.y+threadIdx.y;
    int colInd = blockIdx.x*blockDim.x+threadIdx.x;
    if(rowInd >= rows || colInd >= cols)
            return;
    const float* rowsrcptr=srcptr+rowInd*srcstep;
    float* rowdstPtr=  dstptr+rowInd*dststep;
    float val = rowsrcptr[colInd];
    if((int) val % 90 == 0)
            rowdstPtr[colInd] = -1 ;
    else{
            float acos_val = acos(val);
            rowdstPtr[colInd] = acos_val;
    }
}

int divUp(int a, int b){
    return (a+b-1)/b;
}

extern "C"
{
void func(const float* srcptr, float* dstptr, size_t srcstep, const size_t dststep, int cols, int rows){
    dim3 blDim(32,8);
    dim3 grDim(divUp(cols, blDim.x), divUp(rows,blDim.y));
    std::cout << "calling kernel from func\n";
    funcKernel<<<grDim,blDim>>>(srcptr,dstptr,srcstep,dststep,cols,rows);
    std::cout << "done with kernel call\n";
     cudaDeviceSynchronize();
}

//.cpp code
void callKernel(const GpuMat &src, GpuMat &dst){
    float* p = (float*)src.data;
    float* p2 =(float*) dst.data;
    func(p,p2,src.step,dst.step,src.cols,src.rows);
}

int main(){
    Mat input = imread("cat.jpg",0);
    Mat float_input;
    input.convertTo(float_input,CV_32FC1);
    GpuMat d_frame,d_output;
    Size size = float_input.size();
    d_frame.upload(float_input);
    d_output.create(size,CV_32FC1);
    callKernel(d_frame,d_output);
    Mat output(d_output);
    return 0;
}

When I run the program my compiler tells me this:

OpenCV Error: Gpu API call (an illegal memory access was encountered) in copy, file /home/mobile/opencv-2.4.9/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp, line 882 terminate called after throwing an instance of 'cv::Exception' what(): /home/mobile/opencv-2.4.9/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp:882: error: (-217) an illegal memory access was encountered in function copy

score 16 · Answer 1 · edited Jun 23 '18 at 06:28

You can use cv::cuda::PtrStp<> or cv::cuda::PtrStpSz<> to write your own kernel (so you have not to use the step-Parameter for the GpuMat and it simplifies your code a little bit :D):

Kernel:

    __global__ void myKernel(const cv::cuda::PtrStepSzf input,
                             cv::cuda::PtrStepSzf output)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;

        if (x <= input.cols - 1 && y <= input.rows - 1 && y >= 0 && x >= 0)
        {
           output(y, x) = input(y, x);
        }
    }

Notice:
cv::cuda::PtrStep<> : without size information
cv::cuda::PtrStepSz<>: with size information
cv::cuda::PtrStepSzb: for unsigned char Mats (CV_8U)
cv::cuda::PtrStepSzf: for float Mats (CV_32F)
cv::cuda::PtrStep<cv::Point2f>: example for other type

The Kernel call:

    void callKernel(cv::InputArray _input,
                    cv::OutputArray _output,
                    cv::cuda::Stream _stream)
    {
        const cv::cuda::GpuMat input = _input.getGpuMat();

        _output.create(input.size(), input.type()); 
        cv::cuda::GpuMat output = _output.getGpuMat();

        dim3 cthreads(16, 16);
        dim3 cblocks(
            static_cast<int>(std::ceil(input1.size().width /
                static_cast<double>(cthreads.x))),
            static_cast<int>(std::ceil(input1.size().height / 
                static_cast<double>(cthreads.y))));

        cudaStream_t stream = cv::cuda::StreamAccessor::getStream(_stream);
        myKernel<<<cblocks, cthreads, 0, stream>>>(input, output);

        cudaSafeCall(cudaGetLastError());
    }

You can call this function using cv::cuda::GpuMat:

   callKernel(d_frame, d_output, cv::cuda::Stream());

@50ty How to include opencv to .cu file and use `__global__ void myKernel(const cv::cuda::PtrStepSzf input, cv::cuda::PtrStepSzf output)` ? — AmiR Hossein, Jul 05 '16 at 06:53
see http://docs.opencv.org/master/d0/d60/classcv_1_1cuda_1_1GpuMat.html#gsc.tab=0 for the header, you have to include core/include/opencv2/core/cuda.hpp for GpuMat and PtrStep — 50ty, Jul 07 '16 at 22:11

score 7 · Accepted Answer · edited May 23 '17 at 11:54

7

You are treating image step as if it is a float offset. It is a byte offset from one row to the next.

Try something like this instead:

const float* rowsrcptr= (const float *)(((char *)srcptr)+rowInd*srcstep);
float* rowdstPtr=  (float *) (((char *)dstptr)+rowInd*dststep);

from the documentation:

step – Number of bytes each matrix row occupies.

It's also a good idea to add proper cuda error checking to your code (e.g. to func). And you can run your code with cuda-memcheck to see the actual kernel failure generating the invalid reads/writes.

edited May 23 '17 at 11:54

Community

1
1

answered Jul 07 '14 at 15:31

Robert Crovella

143,785
11
213
257

Okay this might be a stupid question but when you say error checking do you mean to wrap the cudaDeviceSynchronize() call? – jon Jul 08 '14 at 09:12
Yes, in this case, in `func`, it would be just error checking on the kernel itself. (All the other "CUDA stuff" is being handled by OpenCV) The words "proper cuda error checking" in my answer are a link you can click on. Click on that link, it will take you to a question/answer that explains it how to do it. – Robert Crovella Jul 08 '14 at 13:59
is it OK to just combine C and C++ like that? Is that the usual way of doing things? hence iostream and the stdio.h library? – LandonZeKepitelOfGreytBritn May 24 '17 at 21:15
@trilolil not sure what you are talking about. My answer about the row offset? Or the error checking stuff? If you are asking about the error checking link, why not ask your question there (on that question, not here)? As far as I know, there is nothing illegal about including both `stdlib.h` and `iostream` in the same module. – Robert Crovella May 24 '17 at 23:25
indeed it is about including iostream and stdlib in the same module. To me, it seems very messy to combine C and C++ and was wondering whether there was another way to do things instead. – LandonZeKepitelOfGreytBritn May 24 '17 at 23:27
You can implement proper CUDA error checking without using both stdio.h and iostream. I would think with the presentation of a worked example like that, a competent programmer could convert to only using one or the other. The focus of that answer is not on what system pipe you use to connect to the console, the focus of that answer is to demonstrate the CUDA side of it. – Robert Crovella May 24 '17 at 23:29

Custom Kernel GpuMat with float

2 Answers2

Linked

Related