As far as I'm aware, cv::cuda::PtrStep is used to pass GpuMat data directly to a custom kernel. I found examples of single-channel access here; however, my case is a 2-channel mat (CV_32FC2). I'm trying to compute the squared absolute value of complex numbers, where the real part is stored in the 1st channel and the imaginary part in the 2nd channel of the given Mat.
I tried:
    __global__ void testKernel(const cv::cuda::PtrStepSz<cv::Vec2f> input, cv::cuda::PtrStepf output)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= input.cols - 1 && y <= input.rows - 1 && y >= 0 && x >= 0)
        {
            // PtrStep::operator() takes (row, column), i.e. (y, x)
            float val_re = input(y, x)[0];
            float val_im = input(y, x)[1];
            output(y, x) = val_re * val_re + val_im * val_im;
        }
    }
but this results in the following error:
    calling a __host__ function("cv::Vec<float, (int)2> ::operator []") from a __global__ function("gpuholo::testKernel") is not allowed
I get it: operator[] is a __host__-only function, since it belongs to cv::Vec2f and not to cv::cuda::Vec2f (which apparently does not exist). But I would still really like to access the data.
Is there another mechanism for accessing 2-channel data on the device side, similar to Vec2f?
I thought of a workaround: splitting the input into two CV_32FC1 Mats, so the kernel would look like:
    __global__ void testKernel(const cv::cuda::PtrStepSzf re, const cv::cuda::PtrStepSzf im, cv::cuda::PtrStepf output)
but I'm wondering whether there's a "cleaner", Vec2f-like solution.