
I'm practicing with this simple code, which takes two two-dimensional arrays and sums them element-wise with CUDA. In the end, the result stored in c is not what I expect. Also, I was wondering whether I can use std::vector instead of C-style arrays.

#include <iostream>
using namespace std; 
#define N 2   
__global__ void MatAdd(double** a, double** b,
                       double** c)
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    c[i][j] = a[i][j] + b[i][j];
}

int main()
{
    double a[2][2] = {{1.0, 2.0}, {3.0, 4.0}};
    double b[2][2] = {{1.0, 2.0}, {3.0, 4.0}};
    double c[2][2]; // it will be the result!
    double** a_d;
    double** b_d;
    double** c_d;
    int d_size = N * N * sizeof(double);
    int numBlocks = 1;
    dim3 threadsPerBlock(N, N);

    cudaMalloc(&a_d, d_size);
    cudaMalloc(&b_d, d_size);
    cudaMalloc(&c_d, d_size);

    cudaMemcpy(a_d, a, d_size, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b, d_size, cudaMemcpyHostToDevice);
    cudaMemcpy(c_d, c, d_size, cudaMemcpyHostToDevice);

    MatAdd<<<numBlocks, threadsPerBlock>>>(a_d, b_d, c_d);

    //cudaDeviceSynchronize();
    cudaMemcpy(c, c_d, d_size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            cout << c[i][j] << endl;
        }
    }
    return 0;
}
  • Does this answer your question? [How do I correctly set up, access, and free a multidimensional array in C?](https://stackoverflow.com/questions/12462615/how-do-i-correctly-set-up-access-and-free-a-multidimensional-array-in-c) – einpoklum Apr 03 '21 at 20:56
  • What you're experiencing is not a CUDA issue, but a misunderstanding of how (different kinds of) multidimensional arrays work. See [this answer](https://stackoverflow.com/a/12462832/1593077) here on StackOverflow. Also, [this](http://c-faq.com/aryptr/dynmuldimary.html) has an educational illustration. – einpoklum Apr 03 '21 at 20:57

1 Answer


You must not use the double** type in this case. Instead, you should use a flattened array that contains all the values of a given matrix in a single double*-typed variable.

The heart of the problem lies in the following line (and the similar ones right after it):

cudaMemcpy(a_d, a, d_size, cudaMemcpyHostToDevice);

Here you assume that a and a_d are compatible types, but they are not. A double**-typed variable is a pointer that refers to one or more pointers in memory (typically an array of pointers referencing many different double-typed arrays), while a double*-typed variable or a static 2D C array refers to a contiguous location in memory.

Note that you can access a given (i,j) cell of a matrix with matrix[N*i+j], where N is the number of columns, assuming matrix is a flattened matrix of type double* stored in row-major order.
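
For illustration, here is a minimal sketch of this flattened approach applied to the code in the question (error checking is omitted for brevity; the names mirror the question's code, and this is only one way to write it):

#include <iostream>

#define N 2

// Each thread adds one element; the 2D index (i, j) maps to the
// flat row-major offset i * N + j.
__global__ void MatAdd(const double* a, const double* b, double* c)
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    c[i * N + j] = a[i * N + j] + b[i * N + j];
}

int main()
{
    // Static 2D C arrays are already contiguous (flattened) in memory,
    // so they can be copied into a flat device buffer as-is.
    double a[N][N] = {{1.0, 2.0}, {3.0, 4.0}};
    double b[N][N] = {{1.0, 2.0}, {3.0, 4.0}};
    double c[N][N];

    double* a_d;
    double* b_d;
    double* c_d;
    const size_t d_size = N * N * sizeof(double);

    cudaMalloc(&a_d, d_size);
    cudaMalloc(&b_d, d_size);
    cudaMalloc(&c_d, d_size);

    cudaMemcpy(a_d, a, d_size, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b, d_size, cudaMemcpyHostToDevice);

    dim3 threadsPerBlock(N, N);
    MatAdd<<<1, threadsPerBlock>>>(a_d, b_d, c_d);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(c, c_d, d_size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            std::cout << c[i][j] << std::endl;

    cudaFree(a_d);
    cudaFree(b_d);
    cudaFree(c_d);
    return 0;
}

Regarding the question about std::vector: a std::vector<double> of size N*N can serve as the host-side storage, since its elements are stored contiguously; you would then pass vec.data() to cudaMemcpy instead of the static arrays.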

  • Changing to `double*` in the function prototype and also changing `cudaMemcpy(a_d, a, d_size, cudaMemcpyHostToDevice)` to `cudaMemcpy((void**)&a_d, a, d_size, cudaMemcpyHostToDevice);` gives me the error `expression must have pointer-to-object type` on the line where a and b are summed up. – MA19 Apr 03 '21 at 20:18
  • Also, is there any way to access elements by `i, j` rather than using a flattened matrix? This example is somewhat similar to the one on the NVIDIA website, where a flattened matrix was not used. – MA19 Apr 03 '21 at 20:31
  • You do not need a cast in `cudaMemcpy`, nor to take the address of `a_d`. Note that static multi-dimensional C arrays are always flattened in memory. Note also that GPUs are mainly designed to work on contiguous data in memory, so multidimensional arrays are regularly flattened. I strongly advise you to read the [CUDA programming guide](http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf). In particular, you can find an example of matrix multiplication on page 22. – Jérôme Richard Apr 03 '21 at 20:50
  • [This](https://stackoverflow.com/questions/45643682/cuda-using-2d-and-3d-arrays/45644824#45644824) may be of interest. There is no reason that you cannot use `double**` and/or doubly-subscripted arrays if you wish to (see the sketch after these comments). The flattening advice is good conventional wisdom, however. Your previous/now-deleted question had one possible correct approach already, for doubly-subscripted access. – Robert Crovella Apr 03 '21 at 20:53
  • @JérômeRichard Looking into this PDF, I was trying to mimic exactly the example on page 9. – MA19 Apr 03 '21 at 20:55
  • There are much newer versions of the programming guide available. I'm not sure why anyone would reference a version from CUDA 3.2 that is over 10 years old. The latest documentation can always be found at docs.nvidia.com – Robert Crovella Apr 03 '21 at 22:49
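
For completeness, here is a minimal sketch of the double**/doubly-subscripted approach mentioned in Robert Crovella's comment above (the linked answer describes it in more detail). The helper name makeDeviceMatrix is purely illustrative; the idea is to allocate one device buffer per row plus a device array holding the row pointers, so the kernel can keep the a[i][j] syntax:

#include <iostream>

#define N 2

// The kernel keeps the doubly-subscripted syntax because a, b and c are
// device arrays of device row pointers.
__global__ void MatAdd(double** a, double** b, double** c)
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    c[i][j] = a[i][j] + b[i][j];
}

// Illustrative helper: builds a device-side double** from an optional
// host matrix (N device rows plus a device array of the N row pointers).
static double** makeDeviceMatrix(double host[N][N])
{
    double* rows[N];
    for (int i = 0; i < N; i++) {
        cudaMalloc(&rows[i], N * sizeof(double));
        if (host)
            cudaMemcpy(rows[i], host[i], N * sizeof(double), cudaMemcpyHostToDevice);
    }
    double** d_matrix;
    cudaMalloc(&d_matrix, N * sizeof(double*));
    cudaMemcpy(d_matrix, rows, N * sizeof(double*), cudaMemcpyHostToDevice);
    return d_matrix;
}

int main()
{
    double a[N][N] = {{1.0, 2.0}, {3.0, 4.0}};
    double b[N][N] = {{1.0, 2.0}, {3.0, 4.0}};
    double c[N][N];

    double** a_d = makeDeviceMatrix(a);
    double** b_d = makeDeviceMatrix(b);
    double** c_d = makeDeviceMatrix(nullptr); // result matrix, no initial copy

    dim3 threadsPerBlock(N, N);
    MatAdd<<<1, threadsPerBlock>>>(a_d, b_d, c_d);

    // Fetch the device row pointers first, then copy each row back.
    double* c_rows[N];
    cudaMemcpy(c_rows, c_d, N * sizeof(double*), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++)
        cudaMemcpy(c[i], c_rows[i], N * sizeof(double), cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            std::cout << c[i][j] << std::endl;

    return 0; // cudaFree calls omitted for brevity
}

As the comments note, the flattened layout is the conventional recommendation; this double** form adds an extra level of pointer indirection on the device and more bookkeeping on the host.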