1

I am trying to run a C program using CUDA. The code does some math operations on an array of consecutive numbers: every thread adds the elements of a row, checks the last array element, and returns either the sum or zero, depending on whether the conditions are met. I don't have an NVIDIA GPU, so I wrote my code in a Google Colab notebook.

The problem I have encountered is that I cannot debug the program. It outputs nothing at all: no error messages and no output. There is something wrong with the code, but I cannot tell where, even after reviewing it a few times.

Here's the code:

#include <iostream>
#include <cmath>   /* pow, ceil */
#include <cstdio>  /* printf */
#include <cstdlib> /* malloc */

__global__ void matrixadd(int *l, int *result, int digits, int possible_ids)
{
    int sum = 0;
    int zeroflag = 1;
    int identicalflag = 1;
    int id = blockIdx.x * blockDim.x + threadIdx.x;

    if (id < possible_ids)
    {
        /* checking if the first number is zero */
        if (l[(digits*id)+digits-1]==0) zeroflag = 0;

        for (int i = 0; i < digits-1; i++) /* edited; was: for(int i=0; i< digits;i++) */
        {
            /* checking if 2 consecutive numbers are identical */
            if(l[(digits*id)+i]-l[(digits*id)+i+1]==0)
                identicalflag += 1;

            sum = sum + l[(digits*id)+i]; /* finding the sum */
        }
        if (identicalflag != 1) identicalflag = 0;
        result[id] = sum * zeroflag * identicalflag;
    }
}

int main()
{
    int digits = 6;
    int possible_ids = pow(10, digits);

    /* populate the array */
    int *a;
    a = (int *)malloc((possible_ids * digits) * sizeof(int));
    int the_id, temp = possible_ids;

    for (int i = 0; i < possible_ids; i++)
    {
        temp--;
        the_id = temp;
        for (int j = 0; j < digits; j++)
        {
            a[i * digits + j] = the_id % 10;
            if (the_id != 0) the_id /= 10;
        }
    }
    /* the numbers will appear in reversed order */

    /* allocate memory on host and device, then invoke the kernel function */
    int *d_a, *d_c, *c;
    int size = possible_ids * digits;
    c = (int *)malloc(possible_ids * sizeof(int)); /* results matrix */

    cudaMalloc((void **)&d_a, size * sizeof(int));
    cudaMemcpy(d_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    cudaMalloc((void **)&d_c, possible_ids * sizeof(int));
    /* EDITED; was: cudaMalloc((void **)&d_c,digits*sizeof(int)); */

    matrixadd<<<ceil(possible_ids / 1024.0), 1024>>>(d_a, d_c, digits, possible_ids);
    cudaMemcpy(c, d_c, possible_ids * sizeof(int), cudaMemcpyDeviceToHost);

    int acc = 0;
    for (int k = 0; k < possible_ids; k++)
    {
        if (c[k] == 7 || c[k] == 17 || c[k] == 11 || c[k] == 15) continue;
        acc += c[k];
    }
    printf("The number of possible ids %d", acc);
}
  
Kafka
  • See [how to debug CUDA C++](https://www.olcf.ornl.gov/calendar/cuda-debugging/). The first recommendation there is to use [proper CUDA error checking](https://stackoverflow.com/questions/14038589). If you had done that, you would have received a message that the last `cudaMemcpy` call is returning an error. That would focus your attention there. Now, focusing there, does it make sense to allocate `d_c` with a size of `digits*sizeof(int)` (where `digits` is 6), but attempt to transfer from it a size of `possible_ids*sizeof(int)` (where `possible_ids` is `pow(10,digits)`)? It does not. – Robert Crovella Jan 07 '22 at 20:00
  • Thanks for your insight and help, Mr. Robert. – Kafka Jan 07 '22 at 20:40
  • After you fix that issue, the next thing you should do is run your code with `compute-sanitizer` or `cuda-memcheck` (depending on which GPU you have in your Colab instance) and observe the reported error. Follow the instructions [here](https://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218) to localize that error to a single line of kernel code. You haven't explained what your code is supposed to do, how your algorithm is supposed to work, or what would be considered "correct output", so that is as far as I can go. – Robert Crovella Jan 07 '22 at 22:36
  • You are doing invalid indexing into the `l` array in this line of code: `if(l[(digits*id)+i]-l[(digits*id)+i+1]==0)` – Robert Crovella Jan 07 '22 at 22:38
  • @RobertCrovella I am trying to compare two adjacent elements within a single row and check if they're equal. I now notice that in the last comparison I step out of the row boundary. Is that what you mean? – Kafka Jan 10 '22 at 15:33
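
The size mismatch raised in the first comment can be checked with plain host-side arithmetic. The following is a minimal C++ sketch, not part of the original code; the helper names (`ten_to_the`, `allocated_bytes`, `copied_bytes`, `copy_fits`) are hypothetical and exist only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: integer power of ten, avoiding the double-returning pow().
std::size_t ten_to_the(int digits) {
    std::size_t value = 1;
    for (int i = 0; i < digits; ++i) value *= 10;
    return value;
}

// Bytes requested by the pre-edit allocation of d_c: digits * sizeof(int).
std::size_t allocated_bytes(int digits) {
    return static_cast<std::size_t>(digits) * sizeof(int);
}

// Bytes the final device-to-host copy asks for: possible_ids * sizeof(int).
std::size_t copied_bytes(int digits) {
    return ten_to_the(digits) * sizeof(int);
}

// True when a copy of `copied` bytes fits inside an allocation of `allocated` bytes.
bool copy_fits(std::size_t allocated, std::size_t copied) {
    return copied <= allocated;
}
```

For `digits = 6`, `allocated_bytes(6)` is `6 * sizeof(int)` while `copied_bytes(6)` is `1000000 * sizeof(int)`, so `copy_fits` returns false. Sizing the allocation with `possible_ids` instead, as the edited question does, makes the two values equal.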

2 Answers

0

You are doing invalid indexing into the `l` array in this line of code: `if(l[(digits*id)+i]-l[(digits*id)+i+1]==0)`

From a comment by Robert Crovella
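
To see why that line steps out of the row: with the original loop bound `i < digits`, the last iteration has `i = digits-1`, so the second operand reads index `digits*id + digits`, which is `digits*(id+1)`. That is the first element of the next row, or one element past the end of the array for the last row. The sketch below mirrors the kernel body for a single row in host-side C++ so the overrun can be observed without a GPU. It is an illustration under assumptions, not the asker's code: `row_result`, `reads_out_of_bounds`, and the `pair_limit` parameter are hypothetical names, and each row is held in its own `std::vector` so that `at()` reports any read past the row.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side mirror of the kernel body for one row.
// pair_limit is the loop bound: row.size() reproduces the original loop,
// row.size() - 1 reproduces the edited one.
// std::vector::at throws std::out_of_range on any read outside the row.
int row_result(const std::vector<int>& row, std::size_t pair_limit) {
    int sum = 0;
    int zeroflag = 1;
    int identicalflag = 1;

    // Same check as the kernel: the last stored element of the row.
    if (row.at(row.size() - 1) == 0) zeroflag = 0;

    for (std::size_t i = 0; i < pair_limit; i++) {
        // Compare element i with its right-hand neighbour.
        if (row.at(i) - row.at(i + 1) == 0) identicalflag += 1;
        sum = sum + row.at(i);
    }
    if (identicalflag != 1) identicalflag = 0;
    return sum * zeroflag * identicalflag;
}

// Returns true if evaluating the row with this bound reads outside the row.
bool reads_out_of_bounds(const std::vector<int>& row, std::size_t pair_limit) {
    try {
        row_result(row, pair_limit);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For a six-element row, `reads_out_of_bounds(row, 6)` returns true and `reads_out_of_bounds(row, 5)` returns false. Note that with the bound `digits-1` the sum also skips the last element of the row: for `{1, 2, 3, 4, 5, 6}` the result is 15 rather than 21. The question does not say whether that is intended.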

Dharman
Kafka
-1

If you are using Python code, you can use the built-in `pdb` debugger. Put the following line at the top of your script:

import pdb

Then, before the line you want to debug, put the following command:

pdb.set_trace()

You will get a `(Pdb)` prompt where you can enter commands. To continue to the next line, enter `n`; to step into the call on the current line, enter `s`.

This applies only if you are interested in debugging Python code. Enjoy!

  • How does this help with debugging compiled C++ code running on a GPU, which is what the actual question is asking about? – talonmies Dec 22 '22 at 09:24