I just need to clarify something very basic. Most of the computational examples use something like:
```cuda
ID = blockIdx.x * blockDim.x + threadIdx.x;
// ... then do computation on array[ID]
```
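A minimal, self-contained sketch of that 1D pattern, assuming a hypothetical kernel doubleElements and a hypothetical host helper doubleOnDevice (neither comes from the original post). With 1D blocks of 1024 threads only the .x components vary, and the bounds check handles an array length that is not a multiple of 1024. Error checking of the CUDA API calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical example kernel: doubles every element of `array`.
// 1D grid of 1D blocks, so only the .x components are needed.
__global__ void doubleElements(float* array, int n)
{
    int ID = blockIdx.x * blockDim.x + threadIdx.x;  // global 1D index
    if (ID < n) {                // the last block may extend past the end
        array[ID] *= 2.0f;
    }
}

// Hypothetical host helper: copies data in, launches, copies results back.
// Assumes host.size() > 0 (a launch with zero blocks is invalid).
std::vector<float> doubleOnDevice(const std::vector<float>& host)
{
    int n = static_cast<int>(host.size());
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threadsPerBlock = 1024;                                // blockDim = (1024, 1, 1)
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    doubleElements<<<blocks, threadsPerBlock>>>(d, n);

    std::vector<float> result(n);
    cudaMemcpy(result.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return result;
}
```

A 1D block of 1024 threads is a legal launch configuration on devices of compute capability 2.0 and later, whose maximum block x-dimension is 1024.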
My question is: if I want to use the maximum number of threads in a block (1024), do I really need to 'construct' my 'threadID' taking all of (threadIdx.x, threadIdx.y, threadIdx.z) into account?
If so, what is a recommended way to combine ('hash') them into a single value?
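One common way to flatten a 3D thread index, sketched under the assumption of a 1D grid of 8 * 8 * 16 = 1024-thread blocks; threadIndexInBlock, writeOwnId and flattenedIds are hypothetical names. The ordering (x fastest, then y, then z) follows the CUDA C++ Programming Guide, which gives the thread ID of the thread at (x, y, z) in a block of size (Dx, Dy, Dz) as x + y*Dx + z*Dx*Dy.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Linear index of a thread inside its block, x varying fastest.
__device__ int threadIndexInBlock()
{
    return threadIdx.x
         + threadIdx.y * blockDim.x
         + threadIdx.z * blockDim.x * blockDim.y;
}

// Hypothetical kernel: each thread writes its own flattened global ID.
// Assumes a 1D grid of 3D blocks.
__global__ void writeOwnId(int* out, int n)
{
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int ID = blockIdx.x * threadsPerBlock + threadIndexInBlock();
    if (ID < n) out[ID] = ID;
}

// Hypothetical host helper; assumes n > 0.
std::vector<int> flattenedIds(int n)
{
    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));

    dim3 block(8, 8, 16);  // 8 * 8 * 16 = 1024 threads per block
    int threadsPerBlock = block.x * block.y * block.z;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    writeOwnId<<<blocks, block>>>(d, n);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

If every element comes back equal to its own index, no position was skipped. With a block of dim3(1024, 1, 1) the same formula reduces to threadIdx.x, because threadIdx.y and threadIdx.z are always 0.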
If not, why do people use them in a similar fashion in image-processing operations, such as in this post:
https://stackoverflow.com/questions/11503406/cuda-addressing-a-matrix
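For image or matrix data the 2D indices map naturally onto columns and rows, which is roughly the pattern the linked post relies on. A sketch, assuming a row-major width x height matrix of floats and hypothetical names addOne and addOneOnDevice; a 32 x 32 block is again 1024 threads.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: adds 1 to every element of a row-major
// width x height matrix, using a 2D grid of 2D blocks.
__global__ void addOne(float* m, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x -> column
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y -> row
    if (col < width && row < height) {
        m[row * width + col] += 1.0f;                 // row-major flattening
    }
}

// Hypothetical host helper; assumes host.size() == width * height > 0.
std::vector<float> addOneOnDevice(const std::vector<float>& host, int width, int height)
{
    size_t bytes = host.size() * sizeof(float);
    float* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, host.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 32);  // 32 * 32 = 1024 threads per block
    dim3 grid((width + block.x - 1) / block.x,   // enough blocks to cover all columns
              (height + block.y - 1) / block.y); // and all rows
    addOne<<<grid, block>>>(d, width, height);

    std::vector<float> out(host.size());
    cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

Here the 2D indices are a convenience for the data layout rather than a requirement for reaching 1024 threads; the kernel still flattens (row, col) into one offset before touching memory.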
What about blockIdx.x and blockIdx.y? Are they in the same situation as threadIdx in this regard?
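blockIdx can be flattened the same way, with gridDim playing the role of blockDim. A sketch assuming 1D blocks of 1024 threads arranged in a 2D grid that is 4 blocks wide; blockIndexInGrid, writeOwnIdGrid and idsFrom2DGrid are hypothetical names, and the int arithmetic assumes the total thread count fits in 32 bits.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Linear index of the block within the grid, x varying fastest.
__device__ int blockIndexInGrid()
{
    return blockIdx.x
         + blockIdx.y * gridDim.x
         + blockIdx.z * gridDim.x * gridDim.y;
}

// Hypothetical kernel: combines the flattened block index with the
// flattened thread index to get one global ID per thread.
__global__ void writeOwnIdGrid(int* out, int n)
{
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;
    int ID = blockIndexInGrid() * threadsPerBlock + tid;
    if (ID < n) out[ID] = ID;  // spare blocks in the last grid row do nothing
}

// Hypothetical host helper; assumes n > 0.
std::vector<int> idsFrom2DGrid(int n)
{
    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));

    dim3 block(1024);                                  // 1D blocks
    int blocksNeeded = (n + 1023) / 1024;
    dim3 grid(4, (blocksNeeded + 3) / 4);              // 2D grid, 4 blocks wide
    writeOwnIdGrid<<<grid, block>>>(d, n);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

Splitting the blocks over a 2D grid like this has historically been used when the block count would exceed the grid's maximum x-dimension; on newer devices that limit is large enough that a 1D grid usually suffices.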