
Dear DOWNVOTERS: kindly let me know the reason for the downvote. I have already accepted an answer, which means the answerer was able to understand the problem and a minimal working example was not required. Secondly, I wanted this to be a conceptual question rather than a homework problem.

IMPORTANT: I have already read several threads (for example, this) about how threads and blocks are distributed, but I have a specific question.

I have to process image data in unsigned char form on the GPU. My image is of size (1080 x 1920) with 3 channels, and each channel value is stored as an unsigned char.

GPU Details:

NVIDIA Quadro K2000
2 GB of GDDR5 GPU memory
384 SMX CUDA parallel processing cores

As I am new to GPU processing, I do not understand how to choose the number of threads per block and the total number of blocks for my GPU in this specific case.

PROBLEM: When I launch the GPU kernel with the following configuration for my (1080 x 1920) image, I get the desired result, but the computation takes too long:

    dim3 numOfBlocks( (108) , (192) );
    dim3 numOfThreadsPerBlocks( 3*10 , 3*10 ); //multiplied by 3 because we have 3 channel image now

    colorTransformation_kernel<<<numOfBlocks, numOfThreadsPerBlocks>>>(numChannels, step_size, iw, ih, dev_ptr_source, dev_ptr_dst);

But if I choose the following configuration instead:

    dim3 numOfBlocks( (108/2) , (192/2) ); 
    dim3 numOfThreadsPerBlocks( 3*10*2 , 3*10*2 ); //multiplied by 3 because we have 3 channel image now

then I get a blank image.

skm
  • do you do [error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api/14038590#14038590)? please post a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) – m.s. Jun 30 '15 at 11:11
  • @m.s. : there is no error. Please tell me, what could be the minimal working example?? Isn't it a logical question?? I have provided all the details about my image, GPU, numBlocks, what else is required. Kindly let me know what other information is required, I will post it. – skm Jun 30 '15 at 11:14
  • how do you know there is no error? If you get a blank image, your kernel might not have been run, which is something you could detect by checking CUDA API errors. A MCVE typically consists of code which can be copied, pasted and compiled without any necessary changes to reproduce the problem. – m.s. Jun 30 '15 at 11:16
  • @m.s.: I wanted to keep it as a logical question rather than a homework type problem. Moreover, I am taking images from electron-microscopy camera then, you will need my image too. Hardware is also different. I can't post all the things at SO. – skm Jun 30 '15 at 11:20
  • @skm... [This](http://stackoverflow.com/a/17124599/1231073) may be of interest. – sgarizvi Jun 30 '15 at 11:43

1 Answer


If you applied error checking as I already suggested in the comments, the output would be:

invalid configuration argument
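A minimal sketch of such a check, using only `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` from the CUDA runtime API. The names `placeholderKernel` and `tryLaunch` are hypothetical and stand in for the real `colorTransformation_kernel` launch, which is not shown in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for colorTransformation_kernel (assumption).
__global__ void placeholderKernel() {}

// Hypothetical helper: launch with the given configuration and return the
// status reported by the CUDA runtime.
cudaError_t tryLaunch(dim3 grid, dim3 block) {
    placeholderKernel<<<grid, block>>>();
    // Configuration errors are reported at launch time, not by the kernel.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    // Errors raised while the kernel runs surface at the next synchronization.
    return cudaDeviceSynchronize();
}

int main() {
    // The second configuration from the question.
    dim3 numOfBlocks(108 / 2, 192 / 2);
    dim3 numOfThreadsPerBlocks(3 * 10 * 2, 3 * 10 * 2);  // 60 * 60 = 3600 threads

    cudaError_t err = tryLaunch(numOfBlocks, numOfThreadsPerBlocks);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("kernel ran\n");
    return 0;
}
```

Without a check like this, a failed launch is silent: the destination buffer is simply never written, which looks like a blank image.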

You are using a Quadro K2000, which has compute capability 3.0. Compute capability 3.0 allows a maximum of 1024 threads per block.

Your first configuration uses 3*10 * 3*10 = 900 threads per block, which is within that limit. The second uses 3*10*2 * 3*10*2 = 3600 threads per block, which is above the 1024 limit. So your kernel didn't even run, which is why you are getting a blank image.
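One way to stay under the limit is to read it from the device and derive the grid from the image size. The sketch below is an illustration, not the asker's kernel: it assumes width 1920 and height 1080, tightly packed interleaved channels (no row padding, unlike the `step_size` in the question), and one thread per pixel that loops over the channels. `perPixelKernel` and `ceilDiv` are hypothetical names, and the 16 x 16 block is just a common starting point, not a tuned value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Ceiling division: how many blocks of size d are needed to cover n items.
static unsigned int ceilDiv(unsigned int n, unsigned int d) {
    return (n + d - 1) / d;
}

// Hypothetical kernel: one thread per pixel, looping over the channels.
// The body only copies; a real color transformation would go in its place.
__global__ void perPixelKernel(const unsigned char* src, unsigned char* dst,
                               int width, int height, int numChannels) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard partial blocks at the edges
    int base = (y * width + x) * numChannels;
    for (int c = 0; c < numChannels; ++c) {
        dst[base + c] = src[base + c];
    }
}

int main() {
    const int width = 1920, height = 1080, numChannels = 3;  // assumed layout
    const size_t numBytes = (size_t)width * height * numChannels;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 1;
    }
    std::printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);

    dim3 block(16, 16);  // 256 threads per block
    if (block.x * block.y > (unsigned int)prop.maxThreadsPerBlock) {
        std::printf("block is too large for this device\n");
        return 1;
    }
    // 1920 / 16 = 120 blocks in x; 1080 / 16 = 67.5, rounded up to 68 in y.
    dim3 grid(ceilDiv(width, block.x), ceilDiv(height, block.y));

    unsigned char* devSrc = nullptr;
    unsigned char* devDst = nullptr;
    if (cudaMalloc((void**)&devSrc, numBytes) != cudaSuccess ||
        cudaMalloc((void**)&devDst, numBytes) != cudaSuccess) {
        std::printf("device allocation failed\n");
        cudaFree(devSrc);
        return 1;
    }
    cudaMemset(devSrc, 0, numBytes);  // stand-in for uploading a real image

    perPixelKernel<<<grid, block>>>(devSrc, devDst, width, height, numChannels);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    std::printf("launch status: %s\n", cudaGetErrorString(err));

    cudaFree(devSrc);
    cudaFree(devDst);
    return err == cudaSuccess ? 0 : 1;
}
```

Here the channel count no longer appears in the block dimensions, so the threads-per-block figure depends only on the chosen tile size. The bounds check in the kernel is what makes the rounded-up grid safe when the image height is not a multiple of the block height.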

m.s.
  • Thanks a lot. `error checking` has helped a lot. I found that the kernel did not even launch, as you mentioned. I was confused about the number of threads allowed on my GPU and could not find that information. – skm Jun 30 '15 at 11:40
  • Could you suggest a good configuration for this case? Or do I just need a configuration that minimizes the number of blocks and maximizes the number of threads per block? – skm Jun 30 '15 at 11:41
  • @skm have a look at the various SO questions dealing with this issue, e.g.: http://stackoverflow.com/questions/11592450/how-to-adjust-the-cuda-number-of-block-and-of-thread-to-get-optimal-performances, http://stackoverflow.com/questions/9985912/how-do-i-choose-grid-and-block-dimensions-for-cuda-kernels – m.s. Jun 30 '15 at 11:44