0

I'm writting app in c which can convert .png image into grayscale. I'm using c and cuda. I have problem with cuda code and I don't know why. (I'm cuda begginer).

My transformation function looks:

__global__
void setPixelToGrayscale(unsigned char *image)
{
    int i = threadIdx.x*4;
    float gray;
    float r, g, b;
    r = image[i + 0];
    g = image[i + 1];
    b = image[i + 2];
    gray = .299f*r + .587f*g + .114f*b;
    image[i + 0] = gray;
    image[i + 1] = gray;
    image[i + 2] = gray;
    image[i + 3] = 255;
}


void transformToGrayCuda(rgb_image *img)
{

    unsigned char* image = img->image;
    unsigned char* image_d;
    unsigned width = img->width;
    unsigned height = img->height;
    int N = (int)width * (int)height; 
    size_t size = N * sizeof(unsigned char);
    cudaMalloc((void **) image_d, size);
    cudaMemcpy(image_d, image,  size, cudaMemcpyHostToDevice);
    setPixelToGrayscale<<<1, N>>>(image_d);
    cudaMemcpy(image, image_d, size, cudaMemcpyDeviceToHost);
    cudaFree(image_d);

/* this works fine if cuda code is commented
int j=0;
for(j=0; j<N; j++)
{
    int i = j*4;
    float gray;
    float r, g, b;
    r = image[i + 0];
    g = image[i + 1];
    b = image[i + 2];
    gray = .299f*r + .587f*g + .114f*b;
    image[i + 0] = gray;
    image[i + 1] = gray;
    image[i + 2] = gray;
    image[i + 3] = 255;
}
*/

}

I've done something wrong in cuda version because, when cuda code is commented and ill run in loop c code everything works fine. Why my cuda code doesn't work properly?

EDIT: it's my test image: https://i.stack.imgur.com/B3yJu.png

it's my result with cuda: https://i.stack.imgur.com/bzmWJ.png

it's my result with only c code: http:// [no space here, i have no rep] i.imgur.com/lU4vIiK.png

that's what i meant when i wrote that my cuda code does not work properly.

user3565261
  • 344
  • 1
  • 6
  • 19
  • Impossible to say without a complete example someone else could run. – talonmies May 03 '14 at 17:44
  • http://sh.st/wFY2T here's link to my code – user3565261 May 03 '14 at 17:58
  • 1
    Lol, I have no intention of watching a video ad just to see your source code. Why not edit it into the question? (or at least use pastebin.com or similar) Anyway, post a complete code *in your question*. Also, put [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) into your code. And run your code with `cuda-memcheck`. – Robert Crovella May 03 '14 at 18:32

2 Answers2

2

The problem in your code is:

cudaMalloc((void **) image_d, size);

You should give a pointer to the pointer, not cast the variable it to. The right code is:

cudaMalloc(&image_d, size);
KugBuBu
  • 630
  • 5
  • 21
  • doesn't change anything – user3565261 May 03 '14 at 17:42
  • 2
    Weighted average is correct. The numbers OP used are the NTSC weights. In sRGB, the weights are Y = 0.2126 * R + 0.7152 * G + 0.0722 * B. Both sets of weights account for the human eye being more sensitive to green and less sensitive to blue. – Glenn Randers-Pehrson May 04 '14 at 00:25
  • @GlennRanders-Pehrson In the first revision he said the image wasn't looking proper. Another WTF is: I got a downvote being the right answer. (The user wrote the answer in the other answer's comments and chose the other post to be the right answer) – KugBuBu May 04 '14 at 12:14
  • I downvoted because of the suggestion to use unweighted average of colors. See http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html#RTFToC9 which explains why the values should be weighted. – Glenn Randers-Pehrson May 04 '14 at 13:49
  • Incorrect statement about weighting has been removed from the answer, so +1. – Glenn Randers-Pehrson May 04 '14 at 14:08
  • @GlennRanders-Pehrson Monitor shouldn't weight the colors itself to our eyes' color like that camera captures frames that looks like how we see the world too? Thanks btw. – KugBuBu May 04 '14 at 14:10
2

What is the size of N?. You are running all N threads in a single block. There is a limit of 512 or 1024 threads per block depending upon the GPU. Please change the number of blocks if N > 512. With nummber of blocks = 1 + N/ 512 and threads per block = 512. Here, you need to check in kernel if threadid < N to avoid accessing out-of-bounds memory.

Also, kernel executions are async. So, you need a cudadevicesynchronize() call after kernel invocation.

If you give exact error/ issue you are getting then I can provide more help.

Yogi Joshi
  • 786
  • 1
  • 6
  • 19
  • 2
    cudaMemcpy and cudaFree are both blocking calls. An explicit synchronisation call is not required in this case. – talonmies May 03 '14 at 17:43
  • I changed it, but it still doenst change image to grayscale. I uploaded my code in comment to main post, can you check it? – user3565261 May 03 '14 at 18:03
  • There is a race condition in your code. image[i + 2] for ith thread and image[i + 1] in (i + 1)th thread are going to write at same location possibly at the same time. Race conditions cause unpredictable results. Please refactor your code for this. I believe that would not be much difficult for your code's logic. – Yogi Joshi May 03 '14 at 18:08
  • Also, check if the size variable fits within 'int' range. Sometime, the 'int' range might not be able to accommodate and you need to use 'long' in such cases. – Yogi Joshi May 03 '14 at 18:14
  • Yeah, but i = threadID * 4, so its ith thread needs access to image[i*4] : image[i*4 +3] – user3565261 May 03 '14 at 18:14
  • And my N in this case is 128*128 – user3565261 May 03 '14 at 18:15
  • even if i = threadid*4, it would result in race condition as (i+2) and (i+1) of consecutively numbered threads would point to same element. – Yogi Joshi May 03 '14 at 18:29
  • 1
    Well, no, (4+2) != (8+1). I found the problem. In cudaMalloc((void **) image_d, size); I had to use pointer to image_d, not image_d Your advaices about blocks was usefull. Thanks – user3565261 May 03 '14 at 18:37