I am using Cudafy to do some calculations on an NVIDIA GPU (a Quadro K1100M, compute capability 3.0, if it matters).
My question is this: when I launch the kernel with
cudaGpu.Launch(new dim3(44,8,num), new dim3(8, 8)).MyKernel...
why is the z index I compute from the GThread instance always zero when I use this in my kernel?
int z = thread.blockIdx.z * thread.blockDim.z + thread.threadIdx.z;
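One way to see what that formula can produce is to evaluate it on the CPU. The sketch below is plain C# using top-level statements (so it assumes C# 9 / .NET 5 or later). It does not call Cudafy, and `GlobalZ` and `DistinctZ` are hypothetical helpers. Assuming Cudafy's two-argument dim3 leaves z at 1, as CUDA's dim3 does, a 2D block such as dim3(8, 8) has blockDim.z == 1 and threadIdx.z == 0. The formula then reduces to z = blockIdx.z, which should still span 0..num-1. Seeing only zeros would therefore mean blockIdx.z itself is always 0 inside the kernel.

```csharp
using System;
using System.Collections.Generic;

// Same arithmetic as the kernel line: z = blockIdx.z * blockDim.z + threadIdx.z
static int GlobalZ(int blockIdxZ, int blockDimZ, int threadIdxZ) =>
    blockIdxZ * blockDimZ + threadIdxZ;

// Hypothetical helper: collects every distinct z produced by a launch with
// gridDimZ blocks along z, each block being blockDimZ threads deep.
static SortedSet<int> DistinctZ(int gridDimZ, int blockDimZ)
{
    var seen = new SortedSet<int>();
    for (int bz = 0; bz < gridDimZ; bz++)
        for (int tz = 0; tz < blockDimZ; tz++)
            seen.Add(GlobalZ(bz, blockDimZ, tz));
    return seen;
}

int num = 5; // hypothetical grid depth

// 2D block (blockDim.z == 1, threadIdx.z == 0): z collapses to blockIdx.z.
Console.WriteLine(string.Join(",", DistinctZ(num, 1))); // 0,1,2,3,4

// Only if the grid's z extent were effectively 1 would z always be 0.
Console.WriteLine(string.Join(",", DistinctZ(1, 1)));   // 0
```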
Furthermore, if I instead launch with a 3D block, like this:
cudaGpu.Launch(new dim3(44,8,num), new dim3(8, 8, num)).MyKernel...
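then z does take different values, as it should, but num can't be very large because of the limit on the number of threads per block (each block holds 8*8*num threads, and compute capability 3.0 allows at most 1024, so num can be at most 16). Any suggestions on how to work around this?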
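The block-size limit can be made concrete, along with one possible workaround. On compute capability 3.0 a block may hold at most 1024 threads (and blockDim.z is capped at 64), so an 8 x 8 x num block allows num of at most 16. One common alternative is to keep the 2D block and fold z into the grid's y axis. This is offered as an assumption, not as anything Cudafy-specific. Launch a grid of (44, 8 * num) and, in the kernel, recover y = (thread.blockIdx.y % 8) * thread.blockDim.y + thread.threadIdx.y and z = thread.blockIdx.y / 8. gridDim.y is limited to 65535 on this architecture, which still leaves room for a much larger num. The C# below (top-level statements, C# 9 or later, no GPU needed) only checks that this mapping visits every (y, z) pair exactly once. `Unfold` and `CoverageCounts` are hypothetical names.

```csharp
using System;

// Sizes taken from the question's launch: grid y = 8, block = 8 x 8.
const int GridY = 8, BlockX = 8, BlockY = 8;

// Compute capability 3.0 limit: at most 1024 threads per block.
const int MaxThreadsPerBlock = 1024;
int maxNumAs3DBlock = MaxThreadsPerBlock / (BlockX * BlockY); // 16

// Hypothetical mapping for a folded launch with grid (44, GridY * num):
// recover the original y index and the z slice from blockIdx.y.
static (int y, int z) Unfold(int blockIdxY, int threadIdxY, int gridY, int blockY) =>
    ((blockIdxY % gridY) * blockY + threadIdxY, blockIdxY / gridY);

// Counts how often each (y, z) pair is produced; every count should be 1.
static int[,] CoverageCounts(int num, int gridY, int blockY)
{
    var counts = new int[gridY * blockY, num];
    for (int by = 0; by < gridY * num; by++)
        for (int ty = 0; ty < blockY; ty++)
        {
            var (y, z) = Unfold(by, ty, gridY, blockY);
            counts[y, z]++;
        }
    return counts;
}

int num = 100; // larger than the 16 that an 8 x 8 x num block allows
var counts = CoverageCounts(num, GridY, BlockY);
bool allOnce = true;
foreach (int c in counts) allOnce &= c == 1;

Console.WriteLine($"max num as 3D block: {maxNumAs3DBlock}"); // 16
Console.WriteLine($"every (y, z) visited once: {allOnce}");    // True
```

The x dimension is left untouched here, since folding only changes how blockIdx.y is interpreted.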
Edit
Another way to phrase it: can I get a useful z index from thread in my kernel when the block size is only 2D?