Decide CUDA threads and blocks for image processing

Question

Dear DOWNVOTERS: kindly let me know the reason for the downvote. I have already accepted an answer, which means that the answerer was able to understand the problem and a minimal working example was not required. Secondly, I wanted this to be a conceptual question rather than a homework problem.

IMPORTANT: I have already read several threads (for example, this one) about the distribution of threads and blocks, but I have a specific query.

I have to process image data in unsigned char form on the GPU. My image is of size (1080 x 1920) with 3 channels, and each pixel value is of type unsigned char.

GPU Details:

NVIDIA Quadro K2000
2 GB of GDDR5 GPU memory
384 SMX CUDA parallel processing cores

As I am new to GPU processing, I do not understand how to choose the number of threads per block and the total number of blocks for my GPU in this specific case.

PROBLEM: When I launch the GPU kernel with the following configuration for my (1080 x 1920) image, I get the desired results, but the computation takes too long:

    dim3 numOfBlocks( (108) , (192) );
    dim3 numOfThreadsPerBlocks( 3*10 , 3*10 ); //multiplied by 3 because we have 3 channel image now

    colorTransformation_kernel<<<numOfBlocks, numOfThreadsPerBlocks>>>(numChannels, step_size, iw, ih, dev_ptr_source, dev_ptr_dst);

But if I choose the following configuration instead:

    dim3 numOfBlocks( (108/2) , (192/2) ); 
    dim3 numOfThreadsPerBlocks( 3*10*2 , 3*10*2 ); //multiplied by 3 because we have 3 channel image now

then I get a blank image.

Do you do [error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api/14038590#14038590)? Please post a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). – m.s., Jun 30 '15 at 11:11
@m.s.: There is no error. Please tell me what the minimal working example could be. Isn't it a logical question? I have provided all the details about my image, GPU, and numBlocks; what else is required? Kindly let me know what other information is needed and I will post it. – skm, Jun 30 '15 at 11:14
How do you know there is no error? If you get a blank image, your kernel might not have run, which is something you could detect by checking CUDA API errors. An MCVE typically consists of code that can be copied, pasted, and compiled without any changes to reproduce the problem. – m.s., Jun 30 '15 at 11:16
@m.s.: I wanted to keep it as a logical question rather than a homework-type problem. Moreover, I am taking images from an electron-microscopy camera, so you would need my image too. The hardware is also different. I can't post all of that on SO. – skm, Jun 30 '15 at 11:20
@skm: [This](http://stackoverflow.com/a/17124599/1231073) may be of interest. – sgarizvi, Jun 30 '15 at 11:43

score 4 · Accepted Answer · edited May 23 '17 at 12:07


If you had applied error checking, as I already suggested in the comments, the output would have been:

invalid configuration argument

You are using a Quadro K2000, which has compute capability 3.0. Compute capability 3.0 allows a maximum of 1024 threads per block.

You are using (3*10*2) * (3*10*2) = 60 * 60 = 3600 threads per block, which is above the 1024 limit. So your kernel didn't even run, which is why you are getting a blank image.
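
The thread-count arithmetic can also be checked on the host before launching. The following is a minimal sketch in plain C++ (no CUDA headers needed). The names `kMaxThreadsPerBlock`, `threadsPerBlock`, and `fitsThreadLimit` are hypothetical helpers, not part of the CUDA API, and the 1024 limit is hard-coded for compute capability 3.0 as an assumption instead of being queried from the device.

```cpp
// Hypothetical host-side helpers; not part of the CUDA API.
// Assumption: the limit is hard-coded to the compute capability 3.0 value
// (1024 threads per block) instead of being queried from the device.
constexpr unsigned kMaxThreadsPerBlock = 1024;

// Total number of threads in a 2D block of blockX x blockY threads.
constexpr unsigned threadsPerBlock(unsigned blockX, unsigned blockY) {
    return blockX * blockY;
}

// True if a 2D block shape stays within the per-block thread limit.
constexpr bool fitsThreadLimit(unsigned blockX, unsigned blockY) {
    return threadsPerBlock(blockX, blockY) <= kMaxThreadsPerBlock;
}

// The two block shapes from the question, checked at compile time.
static_assert(threadsPerBlock(3 * 10, 3 * 10) == 900, "30 x 30 block");
static_assert(fitsThreadLimit(3 * 10, 3 * 10), "900 threads fit");
static_assert(threadsPerBlock(3 * 10 * 2, 3 * 10 * 2) == 3600, "60 x 60 block");
static_assert(!fitsThreadLimit(3 * 10 * 2, 3 * 10 * 2), "3600 threads exceed 1024");
```

In a real program the limit would more likely be read from the `maxThreadsPerBlock` field of `cudaDeviceProp` (filled in by `cudaGetDeviceProperties`) than hard-coded.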

answered Jun 30 '15 at 11:31 by m.s. · edited May 23 '17 at 12:07 by Community

Thanks a lot. `error checking` has helped a lot. I found that the kernel did not even launch, as you mentioned. I was confused about the number of threads allowed on my GPU and could not find that information. – skm, Jun 30 '15 at 11:40
Could you suggest a possible optimized configuration for this? Or do I just need a configuration that minimizes the number of blocks and maximizes the number of threads per block? – skm, Jun 30 '15 at 11:41
@skm Have a look at the various SO questions dealing with this issue, e.g.: http://stackoverflow.com/questions/11592450/how-to-adjust-the-cuda-number-of-block-and-of-thread-to-get-optimal-performances, http://stackoverflow.com/questions/9985912/how-do-i-choose-grid-and-block-dimensions-for-cuda-kernels – m.s., Jun 30 '15 at 11:44
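
As a rough illustration of the grid-sizing question raised in the comments above, one common pattern is to fix a block shape that respects the 1024-thread limit and derive the grid by ceiling division. The sketch below is plain host-side C++ and rests on several assumptions that are not from the thread: one thread per pixel (with the kernel handling the 3 channels of its pixel itself), an image width of 1920 and height of 1080, and a 16 x 16 block as an untuned starting point. `blocksNeeded`, `LaunchShape`, and `shapeFor` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical helper: number of blocks needed to cover `extent` elements
// with `threads` threads per block along one dimension (ceiling division).
inline unsigned blocksNeeded(unsigned extent, unsigned threads) {
    assert(threads > 0);
    return (extent + threads - 1) / threads;
}

// Hypothetical container for a 2D launch shape (stands in for two dim3 values).
struct LaunchShape {
    unsigned blockX, blockY;  // threads per block in x and y
    unsigned gridX, gridY;    // blocks in the grid in x and y
};

// Assumed mapping: one thread per pixel, x along the image width and
// y along the image height.
inline LaunchShape shapeFor(unsigned width, unsigned height,
                            unsigned blockX, unsigned blockY) {
    return LaunchShape{blockX, blockY,
                       blocksNeeded(width, blockX),
                       blocksNeeded(height, blockY)};
}
```

For the 1920 x 1080 image, `shapeFor(1920, 1080, 16, 16)` gives a 120 x 68 grid of 256-thread blocks. Since 68 * 16 = 1088 is larger than 1080, a kernel launched this way would need a bounds check on the row index. Whether this shape is actually faster on a given GPU is something only measurement can tell.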
