
I am learning CUDA and still at the very beginner level. I am trying a simple assignment but my code crashes when I run it and I am not sure why. Any help would be appreciated.

EDIT: It crashes on cudaMemcpy. In the Image structure, pixelVal is of type int**. Is that the cause?

Original C++ code:

void Image::reflectImage(bool flag, Image& oldImage)
/*Reflects the Image based on users input*/
{
    int rows = oldImage.N;
    int cols = oldImage.M;
    Image tempImage(oldImage);

    for(int i = 0; i < rows; i++)
    {
        for(int j = 0; j < cols; j++)
        tempImage.pixelVal[rows - (i + 1)][j] = oldImage.pixelVal[i][j];
    }
    oldImage = tempImage;
}

My CUDA kernel & code:

#define NTPB 512
__global__ void fliph(int* a, int* b, int r, int c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i >= r || j >= c)
        return;
    a[(r - i * c) + j] = b[i * c + j];
}
void Image::reflectImage(bool flag, Image& oldImage)
/*Reflects the Image based on users input*/
{
    int rows = oldImage.N;
    int cols = oldImage.M;
    Image tempImage(oldImage);
    if(flag == true) //horizontal reflection
    {
     //Allocate device memory
     int* dpixels;
     int* oldPixels;
     int n = rows * cols;
     cudaMalloc((void**)&dpixels, n * sizeof(int));
     cudaMalloc((void**)&oldPixels, n * sizeof(int));
     cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);
     cudaMemcpy(oldPixels, oldImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);
     int nblks = (n + NTPB - 1) / NTPB;
     fliph<<<nblks, NTPB>>>(dpixels, oldPixels, rows, cols);
     cudaMemcpy(tempImage.pixelVal, dpixels, n * sizeof(int), cudaMemcpyDeviceToHost);
     cudaFree(dpixels);
     cudaFree(oldPixels);
    }
    oldImage = tempImage;
}
tomix86
Bhrugesh Patel
  • Your block and grid are 1D. Why are you using 2D indexing inside the kernel? The variable `j` would always be 0 in the kernel. – sgarizvi Apr 04 '13 at 17:14
  • On a quick review, the code looks fine (apart from @sgar91's note). I would recommend adding error checking to your program to help pin down the problem. See [this post](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). – stuhlo Apr 04 '13 at 17:25
  • I count 7 CUDA API calls, and see no error checking at all! First step: check for errors and try to narrow down where the problem is originating. – talonmies Apr 04 '13 at 18:03
  • @BhrugeshPatel: You said it crashed on `memcpy`. But there is no `memcpy` call in that code. Do you mean `cudaMemcpy`? There are *three* of those. Which one? Details matter here. Help us help you.... – talonmies Apr 04 '13 at 18:33
  • @talonmies Yes, I meant the cudaMemcpy. It crashes on the very first cudaMemcpy: `cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);` – Bhrugesh Patel Apr 04 '13 at 18:37
  • You cannot pass a `int **` to cudaMemcpy and expect to get sensible results. Please review the [definition of cudaMemcpy](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html#group__CUDART__MEMORY_1g48efa06b81cc031b2aa6fdc2e9930741). `src` is the source memory address, not a pointer to the source memory address. Both `src` and `dst` have the same level of indirection. What makes you think you can pass an `int *` for one and an `int **` for the other? And, as sgar91 points out, your kernel is messed up because you're launching a 1D array of threadblocks but pretending that it is 2D. – Robert Crovella Apr 04 '13 at 20:50
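
The point about `int **` in the comment above can be illustrated with a small host-side sketch. It assumes `pixelVal` is an array of `rows` separately allocated row pointers, which is a guess based on the `int**` type mentioned in the question. The helper names `flattenPixels` and `unflattenPixels` are hypothetical and not part of the original `Image` class. The idea is to copy the rows into one contiguous row-major buffer first, so that a single `n * sizeof(int)` transfer reads valid pixel data instead of the pointer table.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original Image class): copies a
// rows x cols image stored as an array of row pointers into one contiguous
// row-major buffer, so element (i, j) lands at index i * cols + j.
std::vector<int> flattenPixels(int* const* pixelVal, int rows, int cols)
{
    std::vector<int> flat(static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            flat[static_cast<std::size_t>(i) * cols + j] = pixelVal[i][j];
    return flat;
}

// Hypothetical inverse: writes a contiguous row-major buffer back into the
// row-pointer layout, e.g. after copying results back from the device.
void unflattenPixels(const std::vector<int>& flat, int** pixelVal, int rows, int cols)
{
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            pixelVal[i][j] = flat[static_cast<std::size_t>(i) * cols + j];
}
```

With a buffer like this, `flat.data()` can be passed as the `src` argument of `cudaMemcpy` with a byte count of `n * sizeof(int)`, and the same buffer can receive the device-to-host copy before `unflattenPixels` is called.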

1 Answer

You have to create a 2D grid in order to process the image using the 2D indices i and j. With the current 1D launch, `blockIdx.y` and `threadIdx.y` are always 0, so j is always 0 and the kernel only processes the first column of each row.

To create a 2D indexing mechanism, create a 2D block and 2D grid like this:

const int BLOCK_DIM = 16;

dim3 Block(BLOCK_DIM,BLOCK_DIM);

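dim3 Grid;
// The kernel derives the row index i from x and the column index j from y,
// so the grid has to cover the rows along x and the columns along y.
Grid.x = (rows + Block.x - 1)/Block.x;
Grid.y = (cols + Block.y - 1)/Block.y;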

fliph<<<Grid, Block>>>(dpixels, oldPixels, rows, cols);
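
Two details of the launch above can be checked on the host without a GPU. The sketch below is plain C++ with hypothetical helper names (`blocksNeeded`, `flippedIndex`, `flipRowsReference`), written under the assumption that the image is stored as one contiguous row-major buffer. It shows the ceiling division used for the grid size, and the destination offset that matches the original C++ loop, `pixelVal[rows - (i + 1)][j]`. In flat form that offset is `(rows - 1 - i) * cols + j`. The kernel in the question writes to `(r - i * c) + j` instead, which is a different offset and becomes negative for larger `i`.

```cpp
#include <vector>

// Ceiling division: number of blocks of size `block` needed to cover `n` items.
int blocksNeeded(int n, int block)
{
    return (n + block - 1) / block;
}

// Destination offset for a top-to-bottom flip of a row-major rows x cols
// buffer: source row i maps to row (rows - 1 - i), same column j.
int flippedIndex(int i, int j, int rows, int cols)
{
    return (rows - 1 - i) * cols + j;
}

// CPU reference of the flip, intended for comparing against GPU output.
std::vector<int> flipRowsReference(const std::vector<int>& src, int rows, int cols)
{
    std::vector<int> dst(src.size());
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            dst[flippedIndex(i, j, rows, cols)] = src[i * cols + j];
    return dst;
}
```

A reference like `flipRowsReference` can be run on the same input as the kernel and the two results compared element by element, which helps separate indexing mistakes from memory-transfer problems.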
sgarizvi