Correct way of copying and printing a 2D array on a CUDA device

Question

Hi, I want to create a 2D array (N x (N+1)) with random values on the host, copy it to the device, and print both the host array and the device array to check that I did it correctly.

The problem is that the host array prints perfectly fine, while the device array is missing a lot of values and is somewhat mixed up. I think something is wrong with the way I deal with pointers, but I don't know what.

The following code creates and copies the matrix to the device:

#define DEC_COUNT (1000)
void create_matrix(cuda_matrix *matrix, int var_cnt, bool clear)
{
    cudaError error;
    double **h_matrix = (double **)malloc(sizeof(double *) * var_cnt);
    assert(h_matrix != NULL);


    if (clear) {
        for (int y = 0; y < var_cnt; y++) {
            h_matrix[y] = (double *)calloc((var_cnt+1), sizeof(double));
            assert(h_matrix[y] != NULL);
        }
    } else {
        for (int y = 0; y < var_cnt; y++) {
            h_matrix[y] = (double *)malloc(sizeof(double) * (var_cnt+1));
            assert(h_matrix[y] != NULL);
            for (int i = 0; i < var_cnt+1; ++i){
                srand(time(NULL)*(i+1)*(y+1));
                h_matrix[y][i] = ((double)rand()/(double)RAND_MAX)*DEC_COUNT;
            }
        }
    }

    printf("h_matrix:\n");
    print_matrix(h_matrix, var_cnt);

    error = cudaMallocPitch(&(matrix->d_matrix), &(matrix->pitch),
            sizeof(double)*(var_cnt+1), var_cnt);
    checkCudaErrors(error);

    error = cudaMemcpy2D(matrix->d_matrix, matrix->pitch, h_matrix,
            sizeof(double)*(var_cnt+1), sizeof(double)*(var_cnt+1), var_cnt, cudaMemcpyHostToDevice);
    checkCudaErrors(error);

    printf("d_matrix\n");

    print_matrix<<<1,1>>>(matrix->d_matrix, matrix->var_count, matrix->pitch);
    checkCudaErrors(cudaDeviceSynchronize());

    free_matrix(h_matrix, var_cnt);
}

CUDA print function:

__global__ void print_matrix(double *d_matrix, int height, size_t pitch)
{
    //assert(matrix != NULL);
    /*double *d_matrix = matrix->d_matrix;
    int height = matrix->var_count;
    size_t pitch = matrix->pitch;*/
    for (int j = 0; j < height; j++) {
        // image row
        double *row = (double*)((char*)d_matrix + j * pitch);
        for (int i = 0; i < height+1; i++){
            if (i == height)
                printf("|%.1f", (row[i] == -0.0)? 0.0 : row[i]);
            else
                printf("%.1f ", (row[i] == -0.0)? 0.0 : row[i]);
        }
        printf("\n");
    }
    printf("\n");
}

After running the program I get the following output. h_matrix and d_matrix should be the same!

h_matrix:
80.4 465.7 568.3 663.8 554.6 650.4 748.9 642.3 |738.4
465.7 663.8 650.4 642.3 333.0 821.3 309.0 299.4 |495.5
568.3 650.4 738.4 821.3 407.5 495.5 584.6 168.1 |761.1
663.8 642.3 821.3 299.4 487.1 168.1 141.5 829.1 |513.6
554.6 333.0 407.5 487.1 559.5 843.8 414.8 490.9 |566.1
650.4 821.3 495.5 168.1 843.8 513.6 187.8 150.8 |322.8
748.9 309.0 584.6 141.5 414.8 187.8 249.8 523.5 |85.2
642.3 299.4 168.1 829.1 490.9 150.8 523.5 180.4 |344.3

d_matrix
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 |0.0
0.0 80.4 465.7 568.3 663.8 554.6 650.4 748.9 |642.3
738.4 0.0 465.7 663.8 650.4 642.3 333.0 821.3 |309.0
299.4 495.5 0.0 568.3 650.4 738.4 821.3 407.5 |495.5
584.6 168.1 761.1 0.0 663.8 642.3 821.3 299.4 |487.1
168.1 141.5 829.1 513.6 0.0 554.6 333.0 407.5 |487.1
559.5 843.8 414.8 490.9 566.1 0.0 650.4 821.3 |495.5
168.1 843.8 513.6 187.8 150.8 322.8 0.0 748.9 |309.0

I hope you can help me with this issue. I'm very new to CUDA, and this is actually my first CUDA program.

There are many questions on this topic if you want to do some research. One canonical question is linked in the CUDA tag info page. Another recent example is [here](http://stackoverflow.com/questions/41050300/how-do-i-allocate-memory-and-copy-2d-arrays-between-cpu-gpu-in-cuda-without-fl/41053215#41053215). Your code has a number of typical issues associated with an attempt at this. For example, it's inconvenient to have disjoint allocations in host code, and `cudaMemcpy2D` actually has nothing to do with being able to access a data array in doubly-subscripted fashion in device code. — Robert Crovella, Dec 19 '16 at 21:12
@RobertCrovella Thanks for helping. I understand the problem now. I'm using a linear memory allocation now and it works! — user3025417, Dec 20 '16 at 14:12
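
For readers who land here with the same symptom, the linear allocation mentioned in the comments can be sketched roughly as follows. In the question's code `h_matrix` is a `double **`, so `cudaMemcpy2D` is handed the address of the row-pointer table rather than the row data. The shifted device output is consistent with the copy reading that table and whatever heap memory happens to follow it, although the exact pattern depends on the host allocator. The sketch below is not the asker's final code. The `CHECK` macro, the `print_matrix_dev` name, the fixed size `n = 8`, and the fixed seed are assumptions made for illustration. It stores the matrix in one contiguous host buffer, indexed as `h_matrix[y * width + i]`, so the source pitch passed to `cudaMemcpy2D` really is the distance between consecutive rows.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the question): abort on any CUDA runtime error.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Prints a height x (height+1) matrix stored in pitched device memory.
__global__ void print_matrix_dev(const double *d_matrix, int height, size_t pitch)
{
    for (int j = 0; j < height; j++) {
        // The pitch is in bytes, so step through rows with char arithmetic.
        const double *row = (const double *)((const char *)d_matrix + j * pitch);
        for (int i = 0; i < height + 1; i++) {
            if (i == height)
                printf("|%.1f", row[i]);
            else
                printf("%.1f ", row[i]);
        }
        printf("\n");
    }
}

int main()
{
    const int n = 8;                                  // assumed size: n rows
    const int width = n + 1;                          // n + 1 columns
    const size_t row_bytes = sizeof(double) * width;  // bytes per host row

    // One contiguous host buffer instead of an array of row pointers.
    double *h_matrix = (double *)malloc(row_bytes * n);
    if (h_matrix == NULL)
        return EXIT_FAILURE;

    srand(1234);  // seed once, outside the loops (fixed seed is an assumption)
    for (int y = 0; y < n; y++)
        for (int i = 0; i < width; i++)
            h_matrix[y * width + i] = (double)rand() / RAND_MAX * 1000.0;

    printf("h_matrix:\n");
    for (int y = 0; y < n; y++) {
        for (int i = 0; i < width; i++)
            printf(i == n ? "|%.1f" : "%.1f ", h_matrix[y * width + i]);
        printf("\n");
    }

    double *d_matrix = NULL;
    size_t pitch = 0;
    CHECK(cudaMallocPitch((void **)&d_matrix, &pitch, row_bytes, n));

    // Source pitch equals row_bytes because the host rows are back to back.
    CHECK(cudaMemcpy2D(d_matrix, pitch, h_matrix, row_bytes,
                       row_bytes, n, cudaMemcpyHostToDevice));

    printf("d_matrix:\n");
    print_matrix_dev<<<1, 1>>>(d_matrix, n, pitch);
    CHECK(cudaGetLastError());       // catches launch errors
    CHECK(cudaDeviceSynchronize());  // waits for the kernel and flushes device printf

    CHECK(cudaFree(d_matrix));
    free(h_matrix);
    return 0;
}
```

With this layout the two printouts should match row for row. If the pitched allocation is not needed, a plain `cudaMalloc` of `row_bytes * n` bytes plus a single `cudaMemcpy` would presumably work as well, with the kernel indexing `d_matrix[j * width + i]`.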
