
I am trying to copy a 2D array to the GPU, but I get zeros instead of my array's values.

The 2D array was created on the host as a `double **`, i.e. an array of pointers to 1D arrays, as follows.

//on host code
double** 2dArray;
  2dArray = (double**) malloc(arraySizeX*sizeof(double*));

  for (int i = 0; i < arraySizeX; i++)
    2dArray[i] = (double*) malloc(arraySizeY*sizeof(double));

  for (int i = 0; i < arraySizeX; i++)
    for (int j = 0; j < arraySizeY; j++)   // each row holds arraySizeY elements
       2dArray[i][j] = 0;

// fill Symmetric matrix 2dArray
  for (int i = 0; i < MATRIX_SIZE; i++) {
     for (int j = 0; j <= i; j ++) {
       2dArray[i][j] = (double)min(i+1,j+1);
     }
  } 

I wrote the following to copy it to the GPU using CUDA.

double  *d_A;
cudaMalloc( (void**)&d_A, (arraySizeX*arraySizeY)*sizeof(double) );

cudaMemcpy(d_A, 2dArray, (arraySizeX * arraySizeY) * sizeof(double) , cudaMemcpyHostToDevice);

cudakernel<<<1,1>>>(d_A,arraySizeX, arraySizeY);  // using 1 thread just to print values.

The CUDA kernel:

__global__ void cudakernel(double *d_A, int arraySizeX, int arraySizeY)
{
    cuPrintf("in device\n");
    for (int i = 0; i < arraySizeX * arraySizeY; i++) {
        if (i % 3 == 0)           // line break after every 3 values
            cuPrintf("\n");

        cuPrintf("%lf ", d_A[i]);
    }
    cuPrintf("\n");
}

I started with a small 3 x 3 array on the host with the values below:

0.000000, 0.000000, 0.000000
2.000000, 0.000000, 0.000000
2.000000, 3.000000, 0.000000

but the output I get is all zeros:

0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 

Any idea what I did wrong?

Abdullah
  • Your matrix on the host is an array of pointers to 1D arrays. So you need to copy each of those 1D arrays with a separate host->device copy. Your code shows only a single host->device copy. That does not make sense. – njuffa Apr 03 '15 at 03:04
  • Hi talonmies, the question is not a duplicate. The questions I saw on this website were all about copying 2D arrays in bracket `[][]` form, not the pointer-to-pointer form I am asking about. – Abdullah Apr 03 '15 at 22:07
  • Hi njuffa, I tried that approach and got a bus error. That was when I put the `cudaMalloc` inside the loop. – Abdullah Apr 03 '15 at 22:08
  • Most likely, you want: One contiguous 2D matrix allocated on the GPU. This requires one call to `cudaMalloc()`. Since your storage on the host is *not* contiguous, but a collection of 1D vectors, you then need to use a loop over the number of vectors. Each loop iteration issues one call to `cudaMemcpy()` to copy one 1D vector to the appropriate place on the GPU. – njuffa Apr 03 '15 at 23:05
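
The approach njuffa describes in the comments above could be sketched roughly as follows: one `cudaMalloc()` for the whole matrix, then one `cudaMemcpy()` per host row. This is only a sketch under stated assumptions, not a tested answer from the thread. The helper name `copyRowsToDevice` is hypothetical, and it assumes every host row holds exactly `cols` doubles and that the device buffer should be row-major.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper (not from the original post): copies a host matrix
// stored as an array of row pointers into ONE contiguous, row-major device
// buffer. Returns the device pointer, or NULL if any CUDA call fails.
double *copyRowsToDevice(double **hostRows, int rows, int cols)
{
    double *d_A = NULL;
    size_t rowBytes = (size_t)cols * sizeof(double);

    // Single allocation for the whole rows x cols matrix.
    cudaError_t err = cudaMalloc((void **)&d_A, (size_t)rows * rowBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return NULL;
    }

    // One copy per host row: row i lands at element offset i * cols.
    // The source is hostRows[i] (the row's data), never hostRows itself,
    // which only holds host pointers.
    for (int i = 0; i < rows; i++) {
        err = cudaMemcpy(d_A + (size_t)i * cols, hostRows[i],
                         rowBytes, cudaMemcpyHostToDevice);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaMemcpy (row %d): %s\n", i,
                    cudaGetErrorString(err));
            cudaFree(d_A);
            return NULL;
        }
    }
    return d_A;
}
```

With the host array from the question, the call would be `copyRowsToDevice(hostArray, arraySizeX, arraySizeY)`, where `hostArray` stands for the `double **` matrix. The returned pointer can then be passed to the kernel and indexed as `d_A[i * arraySizeY + j]`. An alternative, if the host code can be changed, is to allocate the host matrix as one contiguous block so that a single `cudaMemcpy()` of `arraySizeX * arraySizeY * sizeof(double)` bytes is valid.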
