CUDA Visual Studio Trouble with Kernel

Question

I have a project where I have to write a CUDA program in Visual Studio that uses GPU threads and blocks. It takes two random arrays, A and B, and stores in a third array C the values given by this equation:

C(1,j) = max(A(:,j)) + max(B(:,j)) (Note: the ":" operator means all rows)

Here is my Kernel function

```
__global__ void mykernel(int **a, int **b,int **c,const int width)
{
    int col= threadIdx.x;
    int tempa=0;
    int tempb=0;

    for(int k=0;k<width;k++){

        int maxA=a[k][col];

        if  (maxA>tempa){
            tempa=maxA;
        }

        int maxB=b[k][col];

        if  (maxB>tempb){
            tempb=maxB;
        }

    }

    c[0][col] =tempa+tempb;
}
```

And my main

```
int main()
{
    const int  dim= 5;
    const int rows=5;
    const int columns=5;
    size_t dsize = rows*columns*sizeof(int);
    // Host (CPU) copies of the data arrays
    int *A[dim][dim];
    int *B[dim][dim];
    int *C[1][dim];

    // Device (GPU) copies of the data arrays
    int *d_A[dim][dim],*d_B[dim][dim],*d_C[1][dim];

    // Allocate memory for the host (CPU) copies
    A[dim][dim]= (int *)malloc(dsize);
    B[dim][dim] = (int *)malloc(dsize);
    C[1][dim]= (int *)malloc(dsize);

    // Fill the arrays with random values
    for (int i=0;i<rows;i++)
        for (int j=0;j<columns;j++){

            *A[i][j]=rand() %5+1;
            *B[i][j]=rand() %5+1;

        }

    // Allocate memory for the device (GPU) copies and copy the data from CPU to GPU
    cudaMalloc((void **)&d_A, dsize);
    cudaMemcpy(d_A, A, dsize, cudaMemcpyHostToDevice);

    cudaMalloc((void **)&d_B, dsize);
    cudaMemcpy(d_B, B, dsize, cudaMemcpyHostToDevice);

    cudaMalloc((void **)&d_C, dsize);

    // Launch the kernel on the GPU using 5 threads in 1 block

    mykernel<<<1,5>>>(d_A,d_B,d_C,dim);

    // Copy the result back to CPU memory
    cudaMemcpy(C, d_C, dsize, cudaMemcpyDeviceToHost);

    // Free CPU and GPU memory
    free(A);
    free(B);
    free(C);

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);

    while(1){};
    return 0;
}
```

I think I got the algorithm right, but this line gives me the following error:

Line

mykernel<<<1,5>>>(d_A,d_B,d_C,dim);

Error

argument of type "int *(*)[5]" is incompatible with parameter of type "int **"

Any suggestions on what I should do?

P.S.: It's my first post, so I'm sorry in advance if I messed up the required format.

--> quite a good format for a first post! ;-) — Allan, Dec 04 '17 at 04:29
@Allan Thanks, my friend! Really looking forward to a hint, as I've been cracking my brain open for 4 straight hours now! Haha — dzz, Dec 04 '17 at 04:37
There is a general lack of understanding of certain C programming concepts evident here. There are probably on the order of 3-6 different kinds of errors evident in this code that would have to be unravelled in order to arrive at something functional. In general, you appear to be interested in using a 2D array with CUDA, so reviewing [this](https://stackoverflow.com/questions/45643682/cuda-using-2d-and-3d-arrays/45644824#45644824) may be beneficial, but it does not directly address all the errors in your code. — Robert Crovella, Dec 04 '17 at 17:22

score 2 · Answer 1 · answered Dec 04 '17 at 17:51

First of all, any time you are having trouble with a CUDA code, I recommend using proper CUDA error checking and running your code with cuda-memcheck (e.g. from the command line).
This isn't doing what you think:
```
int *A[dim][dim];
```
This creates a 2-dimensional array of pointers, none of which points to any storage yet. What you want is a pointer to a 2-dimensional array (of int).
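To make the distinction concrete, here is a minimal host-only C++ sketch (no CUDA calls). The alias names `ArrayOfPointers` and `RowPointer` and the helpers `make_matrix` and `fill_and_sum` are made up for this illustration and are not part of the question's code. It assumes the same compile-time `dim` of 5.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

constexpr int dim = 5;

// What the question declared: a dim x dim array whose elements are pointers.
// It holds 25 uninitialized int* values and no int storage at all.
using ArrayOfPointers = int *[dim][dim];

// What the 2D-array approach needs: one pointer to rows of dim ints.
using RowPointer = int (*)[dim];

static_assert(std::is_same<std::remove_all_extents<ArrayOfPointers>::type, int *>::value,
              "each element of ArrayOfPointers is itself a pointer");
static_assert(std::is_same<std::remove_pointer<RowPointer>::type, int[dim]>::value,
              "RowPointer points at a row of dim ints");

// Allocates one contiguous block of dim*dim ints, viewed as dim rows.
RowPointer make_matrix() {
    RowPointer m = static_cast<RowPointer>(std::malloc(dim * dim * sizeof(int)));
    assert(m != nullptr);
    return m;
}

// Stores 10*i + j in every element using 2D indexing and returns the total.
int fill_and_sum(RowPointer m) {
    int sum = 0;
    for (int i = 0; i < dim; ++i) {
        for (int j = 0; j < dim; ++j) {
            m[i][j] = 10 * i + j;
            sum += m[i][j];
        }
    }
    return sum;
}
```

With this, `RowPointer m = make_matrix();` gives storage that can be indexed as `m[i][j]` for `i` and `j` in `0..dim-1`, and it is released with a single `std::free(m)`.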
In C, when you define an array with a given dimension, you cannot assign anything to the element at that dimension:
```
A[dim][dim]= (int *)malloc(dsize); 
```
Array indices only go up to `dim-1`. `A[dim][dim]` does not exist and is out of range for your array definition. This appears to be related to your general confusion around using 2D arrays in C or C++.
This line is similarly broken (assigning a numerical value to a region associated with an unallocated pointer) and is further evidence of your confusion around 2D arrays in C:
```
*A[i][j]=rand() %5+1;
```
Your handling of d_A, d_B and d_C is similarly broken.

It's evident that you want to use a 2D array in your CUDA kernel, so the right approach here is probably to pick a method from a set of canonical examples. Since you appear to know the array dimensions at compile time, we'll leverage that. Here is a fully worked example showing the modifications:

```
$ cat t1344.cu
#include <iostream>
const int dim = 5;
typedef int my_arr[dim];

__global__ void mykernel(my_arr *a, my_arr *b, my_arr *c,const int width)
{
    int col= threadIdx.x;
    int tempa=0;
    int tempb=0;

    for(int k=0;k<width;k++){

        int maxA=a[k][col];

        if  (maxA>tempa){
            tempa=maxA;
        }

        int maxB=b[k][col];

        if  (maxB>tempb){
            tempb=maxB;
        }

    }

    c[0][col] =tempa+tempb;
}

int main()
{
    const int rows=dim;
    const int columns=dim;
    size_t dsize = rows*columns*sizeof(int);
    // Host (CPU) copies of the data arrays
    my_arr *A, *B, *C;

    // Device (GPU) copies of the data arrays
    my_arr *d_A,*d_B,*d_C;

    // Allocate memory for the host (CPU) copies
    A = (my_arr *)malloc(dsize);
    B = (my_arr *)malloc(dsize);
    C = (my_arr *)malloc(dsize);

    // Fill the arrays with random values
    for (int i=0;i<rows;i++)
        for (int j=0;j<columns;j++){

            A[i][j]=rand() %5+1;
            B[i][j]=rand() %5+1;

        }

    // Allocate memory for the device (GPU) copies and copy the data from CPU to GPU
    cudaMalloc((void **)&d_A, dsize);
    cudaMemcpy(d_A, A, dsize, cudaMemcpyHostToDevice);

    cudaMalloc((void **)&d_B, dsize);
    cudaMemcpy(d_B, B, dsize, cudaMemcpyHostToDevice);

    cudaMalloc((void **)&d_C, dsize);

    // Launch the kernel on the GPU using 5 threads in 1 block

    mykernel<<<1,5>>>(d_A,d_B,d_C,dim);

    // Copy the result back to CPU memory
    cudaMemcpy(C, d_C, dsize, cudaMemcpyDeviceToHost);
    for (int i = 0; i < dim; i++)  std::cout << C[0][i] << std::endl;
    // Free CPU and GPU memory
    free(A);
    free(B);
    free(C);

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);

    return 0;
}
$ nvcc -arch=sm_35 -o t1344 t1344.cu
$ cuda-memcheck ./t1344
========= CUDA-MEMCHECK
9
8
8
9
8
========= ERROR SUMMARY: 0 errors
$
```

I haven't fully verified the results, but they seem plausible to me.
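
One way to check the printed values would be to recompute them on the host and compare. Below is a host-only C++ sketch of such a reference. The function name `column_max_sum` and its flat row-major signature are assumptions made for this illustration, not something from the original post.

```cpp
#include <algorithm>
#include <cassert>

// Host reference for C[j] = max(A(:,j)) + max(B(:,j)).
// A and B are row-major buffers of rows*cols ints; C receives cols results.
void column_max_sum(const int *A, const int *B, int *C, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    for (int j = 0; j < cols; ++j) {
        // Seed with the first row so the result is right even for negative data.
        int maxA = A[j];
        int maxB = B[j];
        for (int i = 1; i < rows; ++i) {
            maxA = std::max(maxA, A[i * cols + j]);
            maxB = std::max(maxB, B[i * cols + j]);
        }
        C[j] = maxA + maxB;
    }
}
```

With the arrays from the example, a call might look like `column_max_sum(&A[0][0], &B[0][0], expected, dim, dim)` before the buffers are freed, followed by comparing `expected[j]` with `C[0][j]`. As a quick sanity check that needs no code: the inputs come from `rand() %5+1`, so every element is between 1 and 5 and each output must lie between 2 and 10, which the printed values do.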

First of all, I would really like to thank you for taking the time to point out all the mistakes and confusions I made. I'm starting to see the picture, I think. In the answer you gave, I can't really understand how the arrays are supposed to be 2D after you typedef them as `int[5]` arrays. Or did you not edit that, and I have to pick a method for it, as you said? Also, another question (I'm getting tiresome, I know): why isn't the method I used with the rand function, to give them random values before transferring them to the GPU, valid (assuming I find some way to treat the needed 2D array as 1D)? — dzz, Dec 04 '17 at 19:28
I created arrays of `int[5]` arrays. An array of `int[5]` arrays is effectively a 2D array. A pointer to an `int[5]` item can also be a pointer to multiple `int[5]` items, just as a pointer to a single `int` can also be a pointer to multiple `int`, thus, an array. There's nothing wrong with your usage of `rand()`. But when you take that numerical value and assign to a location using a pointer you haven't properly initialized (`*A[i][j]`), that won't work. It's illegal, and you wouldn't be guaranteed to get back the value you stored there. — Robert Crovella, Dec 04 '17 at 23:57
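
The point about `int[5]` rows can be checked on the host. The sketch below is plain C++ with hypothetical helper names `flat_index` and `byte_offset`. It shows that element `[i][j]` reached through a `my_arr *` sits at the same place as element `i*dim + j` of a flat 1D buffer, which is also the usual way to treat a 2D array as 1D.

```cpp
#include <cassert>
#include <cstddef>

const int dim = 5;
typedef int my_arr[dim];   // same typedef as in the answer

// Row-major position of element (i, j) in a flat buffer of dim*dim ints.
std::size_t flat_index(int i, int j) {
    assert(i >= 0 && i < dim && j >= 0 && j < dim);
    return static_cast<std::size_t>(i) * dim + static_cast<std::size_t>(j);
}

// Distance in bytes from the start of the storage to rows[i][j].
std::size_t byte_offset(my_arr *rows, int i, int j) {
    assert(i >= 0 && i < dim && j >= 0 && j < dim);
    const char *base = reinterpret_cast<const char *>(rows);
    const char *elem = reinterpret_cast<const char *>(&rows[i][j]);
    return static_cast<std::size_t>(elem - base);
}
```

For every valid `(i, j)`, `byte_offset(rows, i, j)` equals `flat_index(i, j) * sizeof(int)`, so `rows[i][j]` and `flat[i*dim + j]` name the same location.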
