CUDA: unable to calculate grid size

Question

I'm writing a program to convert an RGBA image to greyscale. I've worked on this a lot and have correctly implemented the kernel. However, the grid size is possibly wrong, even though it's correct by my logic.

The kernel:

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                   unsigned char* const greyImage,
                   int numRows, int numCols)
{   
    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
    int y = (blockIdx.y * blockDim.y) + threadIdx.y;

    if(x >= numCols || y >= numRows)
        return;

    uchar4 rgba = rgbaImage[x+y];
    float channelSum = 0.299f*rgba.x + 0.587f*rgba.y + 0.114f*rgba.z;

    greyImage[x+y] = channelSum;
}

and the kernel launch:

const dim3 blockSize(10, 10, 1);  //TODO
  size_t gridSizeX, gridSizeY;
  gridSizeX = numCols + (10 - (numCols % 10) );  //adding some number to make it multiple of 10
  gridSizeY = numRows + (10 - (numRows % 10) );  //adding some number to make it multiple of 10

  const dim3 gridSize( gridSizeX, gridSizeY, 1);  //TODO
  rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);

I'm creating more threads than required and then applying a bounds check in the kernel.

Creating slightly more threads than needed and performing bounds checks inside the kernel is a widely used approach. Here is a generic formula to calculate the grid size: `gridSizeX = (numCols + blockSize.x - 1)/blockSize.x;` — sgarizvi, Jul 19 '13 at 07:52
Possible duplicate of [this](http://stackoverflow.com/questions/14711668/colored-image-to-greyscale-image-using-cuda-parallel-processing). I think it's a standard problem from the Udacity course. — Sagar Masuti, Jul 19 '13 at 08:12
@SagarMasuti: I read that post too, but I couldn't figure out what's wrong in my code. It would be helpful if you could point out the error in my (logically correct) code. — Harshil Sharma, Jul 19 '13 at 08:27
Even after writing this: gridSizeX = (numCols + blockSize.x - 1) / blockSize.x; gridSizeY = (numRows + blockSize.y - 1) / blockSize.y; const dim3 gridSize(gridSizeX, gridSizeY, 1); it's only converting a very thin strip of pixels from the top. Can someone verify whether the kernel itself is right? — Harshil Sharma, Jul 19 '13 at 08:27
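
The ceiling-division formula suggested in the first comment can be sketched as below. The helper names `ceilDiv` and `makeGrid` are hypothetical and not part of the CUDA API. Note that a grid dimension counts blocks, not threads, so the launch in the question (which rounds the pixel count up to a multiple of 10 but never divides by the block size) starts roughly ten times more blocks per dimension than needed.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: smallest number of blocks of size `block`
// that covers `n` elements (integer ceiling of n / block).
inline unsigned int ceilDiv(unsigned int n, unsigned int block)
{
    return (n + block - 1) / block;
}

// Hypothetical helper: 2D grid of blocks covering a numCols x numRows image.
inline dim3 makeGrid(int numCols, int numRows, dim3 blockSize)
{
    return dim3(ceilDiv(static_cast<unsigned int>(numCols), blockSize.x),
                ceilDiv(static_cast<unsigned int>(numRows), blockSize.y),
                1);
}
```

With `blockSize(10, 10, 1)` and, for illustration, a 557 x 313 image, `makeGrid` yields a 56 x 32 grid, i.e. 560 x 320 threads; the kernel's bounds check discards the surplus ones.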

score 4 · Accepted Answer · answered Jul 19 '13 at 08:27

You are accessing your image using x+y. But think about it: the largest index you can reach this way is roughly numRows+numCols, far fewer than the numRows*numCols pixels in the image, which is why only a thin strip at the top gets converted. You cannot just add the two coordinates, since that would mean that e.g. (1,2) is the same image element as (3,0), which is plain rubbish. Instead, for each y-coordinate you have to skip an entire row of the image, so it should be rgbaImage[x+y*numCols] (and the same for greyImage, of course). Note that, depending on the layout of your image data, it might also be the other way around (x*numRows+y), but I'm assuming the usual row-major image layout here (and in your kernel it doesn't matter anyway, since all pixels are treated equally).
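
A minimal sketch of the kernel with the indexing fix described above, assuming a row-major image layout. The host wrapper `greyscale_on_device` is a hypothetical name added only to make the example self-contained; it omits error checking for brevity.

```cuda
#include <cuda_runtime.h>

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row

    // Surplus threads outside the image do nothing.
    if (x >= numCols || y >= numRows)
        return;

    // Row-major layout assumed: skip y full rows, then move x columns.
    int idx = y * numCols + x;

    uchar4 rgba = rgbaImage[idx];
    float channelSum = 0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z;
    greyImage[idx] = static_cast<unsigned char>(channelSum);
}

// Hypothetical host wrapper: copies the image to the GPU, launches the
// kernel, and copies the result back. Error checking is omitted.
void greyscale_on_device(const uchar4* h_rgba, unsigned char* h_grey,
                         int numRows, int numCols)
{
    const size_t numPixels = static_cast<size_t>(numRows) * numCols;
    uchar4* d_rgba = nullptr;
    unsigned char* d_grey = nullptr;
    cudaMalloc(&d_rgba, numPixels * sizeof(uchar4));
    cudaMalloc(&d_grey, numPixels * sizeof(unsigned char));
    cudaMemcpy(d_rgba, h_rgba, numPixels * sizeof(uchar4),
               cudaMemcpyHostToDevice);

    // Grid dimensions are counted in blocks, hence the ceiling division.
    const dim3 blockSize(10, 10, 1);
    const dim3 gridSize((numCols + blockSize.x - 1) / blockSize.x,
                        (numRows + blockSize.y - 1) / blockSize.y,
                        1);
    rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgba, d_grey,
                                               numRows, numCols);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h_grey, d_grey, numPixels * sizeof(unsigned char),
               cudaMemcpyDeviceToHost);
    cudaFree(d_rgba);
    cudaFree(d_grey);
}
```

Each in-bounds thread now maps to exactly one pixel, so the whole image is covered rather than only the first numRows+numCols elements.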

Christian Rau

Thanks for that. Maybe I got confused by the image being stored in a 1D array. It was just like calculating the memory address of a cell in a 2D array. I'm sure a debugger and a native CUDA environment would have helped. BTW, do you know of a CUDA emulation environment for Windows, or can you point to some documentation for gpuocelot on Windows? – Harshil Sharma Jul 19 '13 at 08:43
@HarshilSharma No, sorry. – Christian Rau Jul 19 '13 at 08:44
