Hi, I want to create a 2D array (N × (N+1)) with random values on the host, copy it to the device, and print both the host array and the device array to check that I did it correctly.

The problem is that the host array prints perfectly fine, while the device array is missing a lot of values and is a bit mixed up. I think there is something wrong with the way I deal with pointers, but I don't know what.

The following code creates and copies the matrix to the device:

#define DEC_COUNT (1000)
void create_matrix(cuda_matrix *matrix, int var_cnt, bool clear)
{
    cudaError error;
    double **h_matrix = (double **)malloc(sizeof(double *) * var_cnt);
    assert(h_matrix != NULL);


    if (clear) {
        for (int y = 0; y < var_cnt; y++) {
            h_matrix[y] = (double *)calloc((var_cnt+1), sizeof(double));
            assert(h_matrix[y] != NULL);
        }
    } else {
        for (int y = 0; y < var_cnt; y++) {
            h_matrix[y] = (double *)malloc(sizeof(double) * (var_cnt+1));
            assert(h_matrix[y] != NULL);
            for (int i = 0; i < var_cnt+1; ++i){
                srand(time(NULL)*(i+1)*(y+1));
                h_matrix[y][i] = ((double)rand()/(double)RAND_MAX)*DEC_COUNT;
            }
        }
    }

    printf("h_matrix:\n");
    print_matrix(h_matrix, var_cnt);

    error = cudaMallocPitch(&(matrix->d_matrix), &(matrix->pitch),
            sizeof(double)*(var_cnt+1), var_cnt);
    checkCudaErrors(error);

    error = cudaMemcpy2D(matrix->d_matrix, matrix->pitch, h_matrix,
            sizeof(double)*(var_cnt+1), sizeof(double)*(var_cnt+1), var_cnt, cudaMemcpyHostToDevice);
    checkCudaErrors(error);

    printf("d_matrix\n");

    print_matrix<<<1,1>>>(matrix->d_matrix, matrix->var_count, matrix->pitch);
    checkCudaErrors(cudaDeviceSynchronize());

    free_matrix(h_matrix, var_cnt);
}

CUDA print function:

__global__ void print_matrix(double *d_matrix, int height, size_t pitch)
{
    //assert(matrix != NULL);
    /*double *d_matrix = matrix->d_matrix;
    int height = matrix->var_count;
    size_t pitch = matrix->pitch;*/
    for (int j = 0; j < height; j++) {
        // image row
        double *row = (double*)((char*)d_matrix + j * pitch);
        for (int i = 0; i < height+1; i++){
            if (i == height)
                printf("|%.1f", (row[i] == -0.0)? 0.0 : row[i]);
            else
                printf("%.1f ", (row[i] == -0.0)? 0.0 : row[i]);
        }
        printf("\n");
    }
    printf("\n");
}

After running the program I get the output below. h_matrix and d_matrix should be the same!

h_matrix:
80.4 465.7 568.3 663.8 554.6 650.4 748.9 642.3 |738.4
465.7 663.8 650.4 642.3 333.0 821.3 309.0 299.4 |495.5
568.3 650.4 738.4 821.3 407.5 495.5 584.6 168.1 |761.1
663.8 642.3 821.3 299.4 487.1 168.1 141.5 829.1 |513.6
554.6 333.0 407.5 487.1 559.5 843.8 414.8 490.9 |566.1
650.4 821.3 495.5 168.1 843.8 513.6 187.8 150.8 |322.8
748.9 309.0 584.6 141.5 414.8 187.8 249.8 523.5 |85.2
642.3 299.4 168.1 829.1 490.9 150.8 523.5 180.4 |344.3

d_matrix
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 |0.0
0.0 80.4 465.7 568.3 663.8 554.6 650.4 748.9 |642.3
738.4 0.0 465.7 663.8 650.4 642.3 333.0 821.3 |309.0
299.4 495.5 0.0 568.3 650.4 738.4 821.3 407.5 |495.5
584.6 168.1 761.1 0.0 663.8 642.3 821.3 299.4 |487.1
168.1 141.5 829.1 513.6 0.0 554.6 333.0 407.5 |487.1
559.5 843.8 414.8 490.9 566.1 0.0 650.4 821.3 |495.5
168.1 843.8 513.6 187.8 150.8 322.8 0.0 748.9 |309.0

I hope you can help me with this issue. I'm very new to CUDA, and this is actually my first CUDA program.

user3025417
  • There are many questions on this topic if you want to do some research. One canonical question is linked in the CUDA tag info page. Another recent example is [here](http://stackoverflow.com/questions/41050300/how-do-i-allocate-memory-and-copy-2d-arrays-between-cpu-gpu-in-cuda-without-fl/41053215#41053215). Your code has a number of typical issues associated with an attempt at this. For example, it's inconvenient to have disjoint allocations in host code, and `cudaMemcpy2D` actually has nothing to do with being able to access a data array in doubly-subscripted fashion in device code. – Robert Crovella Dec 19 '16 at 21:12
  • @RobertCrovella Thanks for helping. I understand the problem now. I'm using a linear memory allocation now and it works! – user3025417 Dec 20 '16 at 14:12
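
For readers landing here with the same symptom: the asker's final code was not posted, so the following is only a minimal sketch of what a linear host allocation could look like. In the question, `h_matrix` is an array of separately allocated row pointers, whereas `cudaMemcpy2D` expects the source to be a single memory region whose rows are a fixed pitch apart. The sketch keeps the whole N × (N+1) matrix in one contiguous buffer, copies it to pitched device memory, copies it back, and compares the two. The names `round_trip_matrix` and `CUDA_CHECK` are hypothetical and do not appear in the original post.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): abort on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical function: copies an n x (n+1) row-major matrix to pitched
// device memory and back. Returns true if every element survived the trip.
bool round_trip_matrix(int n)
{
    const int cols = n + 1;
    const size_t row_bytes = sizeof(double) * cols;

    // One contiguous host buffer; element (y, x) lives at h_in[y * cols + x].
    std::vector<double> h_in((size_t)n * cols);
    std::vector<double> h_out((size_t)n * cols, 0.0);

    srand(1234);  // seed once, not once per element
    for (size_t k = 0; k < h_in.size(); ++k)
        h_in[k] = (double)rand() / (double)RAND_MAX * 1000.0;

    double *d_matrix = NULL;
    size_t pitch = 0;
    CUDA_CHECK(cudaMallocPitch((void **)&d_matrix, &pitch, row_bytes, n));
    assert(pitch >= row_bytes);  // device rows may be padded, never shortened

    // Host rows are tightly packed, so the host pitch is simply row_bytes.
    CUDA_CHECK(cudaMemcpy2D(d_matrix, pitch, h_in.data(), row_bytes,
                            row_bytes, n, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy2D(h_out.data(), row_bytes, d_matrix, pitch,
                            row_bytes, n, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_matrix));

    return h_in == h_out;
}
```

Calling `round_trip_matrix(8)` from `main` should return `true` on a working CUDA setup. On the device side, the row addressing from the question, `(double*)((char*)d_matrix + j * pitch)`, can stay as it is, because only the host layout needs to change.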
