Okay, so I'm trying to get a 2D array for CUDA to work on, but it's becoming a pain. The errors are in the title and occur at the cudaMemcpy2D call. I think the problem is obvious to trained eyes. Thank you in advance for any help; I've stepped ahead of my class, which is currently learning pointers.

#include <cuda_runtime.h>
#include <iostream>
#pragma comment (lib, "cudart")

/* Program purpose: pass a 100 x 100 matrix to the device and add another 100 x 100 matrix to it */

float matrix1_host[100][100];
float matrix2_host[100][100];

float* matrix1_device;
float* matrix2_device;  
size_t pitch;
cudaError_t err;

__global__ void addMatrix(float* matrix1_device,float* matrix2_device, size_t pitch){
    // How this works:
    // each thread handles one row, selected by its thread ID.
    // The start address of row r is the base address plus r * pitch, where pitch is the size of each row in bytes.
    // From that row start, the columns are reached with ordinary array indexing.

    int r = threadIdx.x;

        float* rowofMat1 = (float*)((char*)matrix1_device + r * pitch);
        float* rowofMat2 = (float*)((char*)matrix2_device + r * pitch);
        for (int c = 0; c < 100; ++c) {
             rowofMat1[c] += rowofMat2[c];
        }

}

void initCuda(){
    err = cudaMallocPitch((void**)matrix1_device, &pitch, 100 * sizeof(float), 100);
    err = cudaMallocPitch((void**)matrix2_device, &pitch, 100 * sizeof(float), 100); 
    //err = cudaMemcpy(matrix1_device, matrix1_host, 100*100*sizeof(float), cudaMemcpyHostToDevice);
    //err = cudaMemcpy(matrix2_device, matrix2_host, 100*100*sizeof(float), cudaMemcpyHostToDevice);
    err = cudaMemcpy2D(matrix1_device, 100*sizeof(float), matrix1_host, pitch, 100*sizeof(float), 100, cudaMemcpyHostToDevice);
    err = cudaMemcpy2D(matrix2_device, 100*sizeof(float), matrix2_host, pitch, 100*sizeof(float), 100, cudaMemcpyHostToDevice);
}

void populateArrays(){
    for(int x = 0; x < 100; x++){
        for(int y = 0; y < 100; y++){
            matrix1_host[x][y] = (float) x + y;
            matrix2_host[y][x] = (float) x + y;
        }
    }
}

void runCuda(){
    dim3 dimBlock ( 100 );
    dim3 dimGrid ( 1 );
    addMatrix<<<dimGrid, dimBlock>>>(matrix1_device, matrix2_device, 100*sizeof(float)); 
    //err = cudaMemcpy(matrix1_host, matrix1_device, 100*100*sizeof(float), cudaMemcpyDeviceToHost);
    err = cudaMemcpy2D(matrix1_host, 100*sizeof(float), matrix1_device, pitch, 100*sizeof(float),100, cudaMemcpyDeviceToHost);
    //cudaMemcpy(matrix1_host, matrix1_device, 100*100*sizeof(float), cudaMemcpyDeviceToHost);
}

void cleanCuda(){
    err = cudaFree(matrix1_device);
    err = cudaFree(matrix2_device);

    err = cudaDeviceReset();
}


int main(){
    populateArrays();
    initCuda();
    runCuda();
    cleanCuda();
    std::cout << cudaGetErrorString(cudaGetLastError());
    system("pause");
    return 0;
}
Joshua Waring

1 Answer

First of all, you should generally keep a separate pitch variable for each of matrix1 and matrix2. In this case cudaMallocPitch will return the same value for both, but in general it may not.

In your cudaMemcpy2D line, the second parameter to the call is the destination pitch. This is just the pitch value that was returned by the cudaMallocPitch call for this particular destination matrix (i.e. the first parameter).

The fourth parameter is the source pitch. Since the source is an ordinary host array, it has no pitch other than its width in bytes.

So you have your second and fourth parameters swapped.

So instead of this:

err = cudaMemcpy2D(matrix1_device, 100*sizeof(float), matrix1_host, pitch, 100*sizeof(float), 100, cudaMemcpyHostToDevice);

Try this:

err = cudaMemcpy2D(matrix1_device, pitch, matrix1_host, 100*sizeof(float), 100*sizeof(float), 100, cudaMemcpyHostToDevice);

Do the same for the second call to cudaMemcpy2D. The third call is actually OK: it goes in the opposite direction, so the source and destination matrices are swapped and line up with your pitch parameters correctly.
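
To make the pairing of each pitch with its own buffer concrete, here is a minimal host-only sketch in plain C++ (no GPU required). It is a model, not the CUDA API: `copy2D`, `rowPtr` and `roundTrip` are hypothetical helper names, `copy2D` only mirrors the argument order of cudaMemcpy2D, and the padded pitch is an assumed stand-in for whatever cudaMallocPitch would return on a real device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side model of a pitched 2D copy (NOT the CUDA API).
// The argument order deliberately mirrors cudaMemcpy2D:
//   dst, dstPitch, src, srcPitch, widthBytes, height
// A pitch is the distance in bytes between the starts of consecutive rows
// of its own buffer, so dstPitch describes dst and srcPitch describes src.
void copy2D(void* dst, std::size_t dstPitch,
            const void* src, std::size_t srcPitch,
            std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
    for (std::size_t r = 0; r < height; ++r) {
        std::memcpy(static_cast<char*>(dst) + r * dstPitch,
                    static_cast<const char*>(src) + r * srcPitch,
                    widthBytes);
    }
}

// Same row addressing as in the kernel: advance r rows of pitchBytes bytes,
// then treat the result as the start of a row of floats.
float* rowPtr(float* base, std::size_t pitchBytes, std::size_t r) {
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + r * pitchBytes);
}

// Copies a tightly packed rows x cols matrix into a padded buffer and back,
// and reports whether every element survived the round trip.
// paddedPitch is an assumed value standing in for cudaMallocPitch's result.
bool roundTrip(std::size_t rows, std::size_t cols, std::size_t paddedPitch) {
    const std::size_t tightPitch = cols * sizeof(float);  // pitch of a plain host array
    assert(rows > 0 && paddedPitch >= tightPitch && paddedPitch % sizeof(float) == 0);

    std::vector<float> host(rows * cols);
    for (std::size_t i = 0; i < host.size(); ++i) host[i] = static_cast<float>(i);

    std::vector<float> padded(rows * paddedPitch / sizeof(float), 0.0f);
    std::vector<float> back(rows * cols, -1.0f);

    // "Host to device": the destination is padded, the source is tight.
    copy2D(padded.data(), paddedPitch, host.data(), tightPitch, tightPitch, rows);

    // The last row must start at (rows - 1) * paddedPitch bytes into the padded buffer.
    if (rowPtr(padded.data(), paddedPitch, rows - 1)[0] != host[(rows - 1) * cols]) {
        return false;
    }

    // "Device to host": the destination is tight, the source is padded.
    copy2D(back.data(), tightPitch, padded.data(), paddedPitch, tightPitch, rows);
    return back == host;
}
```

Calling `roundTrip(100, 100, 512)` models the 100 x 100 float matrices from the question with an assumed pitch of 512 bytes per row. By the same reasoning, the kernel's `r * pitch` addressing only lands on row starts when it receives the pitch of the allocation it is indexing, whereas the launch in runCuda passes 100*sizeof(float).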

chappjc
Robert Crovella
  • Okay, so I changed a few things around. I was thinking that the first one was the pitch of the host array, which had me really confused. I still get an error 11 (InvalidValue), though. – Joshua Waring Mar 15 '13 at 05:03
  • Well, you're doing pretty sloppy error checking, so you really have no clue which line the error is coming from. Is that the way they teach you to do error checking in your class? You should [check each and every cuda return value](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api), especially when you are having problems. Anyway, I missed that your parameters to `cudaMallocPitch` are also incorrect: you need an ampersand to take the address of the pointer you are passing: `err = cudaMallocPitch((void**)&matrix1_device, ...` – Robert Crovella Mar 15 '13 at 05:37
  • Actually, I step through with the debugger line by line and check the error value. It comes exactly from the first cudaMemcpy2D line. Apart from that, thank you, as that was the problem; I've been stuck on it for a little while now. – Joshua Waring Mar 15 '13 at 05:41