I have an assignment that requires me to generate a Redheffer matrix on the GPU using CUDA.

A Redheffer matrix is a matrix where each entry a[i][j] is defined by  
a[i][j] =  
1 if j = 1,   
1 if j is divisible by i  
0 otherwise.

Here is my code

    #define SIZE 20000
    #define BLOCK_WIDTH 16

    /* Launch the CUDA kernel */
    int numBlocks = ceil(SIZE / BLOCK_WIDTH);
    dim3 dimGrid(BLOCK_WIDTH,BLOCK_WIDTH,1);
    dim3 dimBlock(numBlocks,numBlocks,1);
    redhefferMatrix<<<dimGrid, dimBlock>>>(d_M, SIZE);

I have code that verifies the output; it prints an error message when a computed matrix element is incorrect. When I run my program, I get this error:

    GPU number 0 is assigned to this job
    Row 0 column 5000 is incorrect. Should be:1 Is actually: 0

My logic to compute values is

    int Row= blockIdx.y*blockDim.y + threadIdx.y;
    int Col= blockIdx.x*blockDim.x + threadIdx.x;
    .
    .
    if(i < 20000 && j < 20000)
    {
        if(j == 1 || j % i == 0)
            d_M[i*SIZE+ j] = 1;
        else
            d_M[i*SIZE+ j] = 0;
    }

Can someone give me an idea of where I might be going wrong? Thank you in advance.

talonmies
Lasantos

1 Answer

Since you haven't provided complete code, it's not possible to determine all the issues that may be present. But you have misinterpreted the block and grid dimensions (you have them reversed):

    #define SIZE 20000
    #define BLOCK_WIDTH 16

    /* Launch the CUDA kernel */
    int numBlocks = ceil(SIZE / BLOCK_WIDTH);
    dim3 dimGrid(BLOCK_WIDTH,BLOCK_WIDTH,1);
    dim3 dimBlock(numBlocks,numBlocks,1);
    redhefferMatrix<<<dimGrid, dimBlock>>>(d_M, SIZE);

The first kernel configuration parameter should be the dimensions of the grid in terms of number of blocks (in x and y, in this case). Your first kernel configuration parameter is dimGrid, which you have defined as a dim3(BLOCK_WIDTH,BLOCK_WIDTH) quantity, i.e. 16x16 blocks. I don't think that's what you intended, but it's not actually illegal.

Your second kernel configuration parameter should be the dimensions of the block in terms of number of threads (in x and y, in this case). Your second kernel configuration parameter is dimBlock, which you have defined as a dim3(20000/16, 20000/16) quantity, i.e. 1250x1250 threads. This is illegal, as CUDA threadblocks are limited to a total of 1024 threads, i.e. the product of the block dimensions cannot exceed 1024.
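
To make the arithmetic in the two paragraphs above concrete, here is a small host-side sketch. The helper names (`ceilDiv`, `threadsPerBlock`, `blockIsLegal`, `kMaxThreadsPerBlock`) are made up for illustration and are not part of the CUDA API; the 1024 figure is the per-block limit mentioned above. Note also that `ceil(SIZE / BLOCK_WIDTH)` divides two integers first, so `ceil` has nothing left to round. That is harmless for 20000 and 16, but an integer round-up is safer for sizes that are not a multiple of the block width.

```cuda
// Hypothetical helper: integer ceiling division, so a partial last block
// is still counted when n is not a multiple of d.
constexpr unsigned ceilDiv(unsigned n, unsigned d) { return (n + d - 1) / d; }

// Per-block thread limit referred to in the answer.
constexpr unsigned kMaxThreadsPerBlock = 1024;

// Hypothetical helper: total number of threads in a bx-by-by block.
constexpr unsigned threadsPerBlock(unsigned bx, unsigned by) { return bx * by; }

// Hypothetical helper: does a bx-by-by block stay within the limit?
constexpr bool blockIsLegal(unsigned bx, unsigned by)
{
    return threadsPerBlock(bx, by) <= kMaxThreadsPerBlock;
}

// The numbers from the question, checked at compile time.
static_assert(ceilDiv(20000, 16) == 1250, "20000 / 16 blocks per dimension");
static_assert(threadsPerBlock(1250, 1250) == 1562500, "threads in a 1250x1250 block");
static_assert(!blockIsLegal(1250, 1250), "1250x1250 threads per block is too many");
static_assert(blockIsLegal(16, 16), "16x16 = 256 threads per block is fine");
```

So a 16x16 block (256 threads) is within the limit, while a 1250x1250 block would need 1,562,500 threads.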

So your kernel launch is illegal and your kernel is not even running. If you used proper CUDA error checking and/or ran your code with cuda-memcheck, you would discover this.
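
As a rough illustration of what such error checking could look like, here is a minimal sketch. The `CUDA_CHECK` macro, `dummyKernel`, and `tryLaunch` are hypothetical names introduced only for this example; the runtime calls used (`cudaGetLastError`, `cudaGetErrorString`) are standard CUDA runtime API functions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the question): print a readable
// message and abort whenever a CUDA runtime call returns an error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Empty stand-in kernel, used only to exercise the launch configuration.
__global__ void dummyKernel() {}

// Hypothetical helper: launch with the given shapes and return the launch
// status. Launch-configuration errors are reported by cudaGetLastError();
// errors during kernel execution would only show up at a later
// synchronizing call such as cudaDeviceSynchronize().
cudaError_t tryLaunch(dim3 grid, dim3 block)
{
    dummyKernel<<<grid, block>>>();
    return cudaGetLastError();
}
```

Wrapping the reversed configuration, e.g. `CUDA_CHECK(tryLaunch(dim3(16, 16, 1), dim3(1250, 1250, 1)));`, should stop the program with an error message instead of letting it continue silently to the verification step.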

The fix may be fairly simple - reverse your sense of these config parameters:

    dim3 dimBlock(BLOCK_WIDTH,BLOCK_WIDTH,1);
    dim3 dimGrid(numBlocks,numBlocks,1);
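
Put together, the corrected launch might look roughly like the sketch below. Since the question does not show the full kernel, several things here are assumptions: the `int` element type, the `launchRedheffer` wrapper name, passing the size as a parameter `n`, and mapping the 0-based thread indices to the 1-based `i` and `j` of the matrix definition.

```cuda
#include <cuda_runtime.h>

#define BLOCK_WIDTH 16

// Sketch of a kernel body. Assumption: element (row, col) in 0-based
// storage holds a[i][j] with i = row + 1 and j = col + 1.
__global__ void redhefferMatrix(int *d_M, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        int i = row + 1;
        int j = col + 1;
        d_M[row * n + col] = (j == 1 || j % i == 0) ? 1 : 0;
    }
}

// Hypothetical host wrapper: block = threads per block, grid = number of blocks.
cudaError_t launchRedheffer(int *d_M, int n)
{
    int numBlocks = (n + BLOCK_WIDTH - 1) / BLOCK_WIDTH;  // integer round-up
    dim3 dimBlock(BLOCK_WIDTH, BLOCK_WIDTH, 1);           // 16x16 = 256 threads
    dim3 dimGrid(numBlocks, numBlocks, 1);
    redhefferMatrix<<<dimGrid, dimBlock>>>(d_M, n);       // grid first, block second
    cudaError_t err = cudaGetLastError();                 // launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();                       // execution errors
}
```

Returning the status from the wrapper keeps the launch and its error check in one place, so the caller can decide how to report a failure.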

Again, I cannot say this is the only issue, since you have not shown complete code that I could actually test (which SO expects for questions like this).

If you make the above change and things are still not working, I would suggest the following:

  1. Add proper CUDA error checking and run your code with cuda-memcheck, as I already suggested.

  2. Provide a complete MCVE, i.e. complete code that somebody else could copy, paste, and run. Also provide the output of cuda-memcheck and of the error checking on your system.

You should do the above 2 things before you ask for debugging help here on SO.

Community
Robert Crovella