Why is z always zero in CUDA kernel

Question

I am using Cudafy to do some calculations on an NVIDIA GPU (Quadro K1100M, compute capability 3.0, if it matters).

My question is, when I use the following

cudaGpu.Launch(new dim3(44, 8, num), new dim3(8, 8)).MyKernel...

why are my z indexes from the GThread instance always zero when I use this in my kernel?

int z = thread.blockIdx.z * thread.blockDim.z + thread.threadIdx.z;

Furthermore, if I have to do something like

cudaGpu.Launch(new dim3(44, 8, num), new dim3(8, 8, num)).MyKernel...

z does give different indexes as it should, but num can't be very large because of the limit on the number of threads per block. Any suggestions on how to work around this?

Edit

Another way to phrase it: can I use the z index in my kernel (for anything useful) when the block size is only 2D?

0 might be the default value? You should always give "num" a value of at least 1. To deal with the limit on threads per block, just increase your number of blocks (so, more blocks of fewer threads). – Taro, Apr 25 '16 at 14:50
num was just to indicate it could be anything, but I am positive it is > 0. — smok, Apr 25 '16 at 16:42
I don't know much about CUDAfy, but it might be that its mapping between .NET and CUDA was designed before CUDA allowed a gridDim.z dimension and has not been updated to account for it. This requires verification, though. – Florent DUGUET, Apr 26 '16 at 08:19
After some research I only found this topic, in which a user points out that CUDAfy reports the wrong CC for his device: http://cudafy1.rssing.com/chan-12112480/all_p2.html . That might be the cause: it thinks it can't launch a 3-dimensional grid even though your device can. – Taro, Apr 26 '16 at 08:21

score 5 · Accepted Answer · edited Apr 26 '16 at 10:04

On all currently supported hardware, CUDA allows the use of both three dimensional grids and three dimensional blocks. On compute capability 1.x devices (which are no longer supported), grids were restricted to two dimensions.

However, CUDAfy currently launches kernels through a deprecated launch function that accepts only a two-dimensional grid. It silently passes gridDim.x and gridDim.y and does not take gridDim.z into account:

_cuda.Launch(function, gridSize.x, gridSize.y);

This call can be seen in the function DoLaunch() in CudaGPU.cs.

So while you can specify a three dimensional grid in CUDAfy, the third dimension is ignored during the kernel launch, which is why blockIdx.z is always zero. Thanks to Florent for pointing this out!
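One possible workaround, offered as a sketch and not as anything confirmed by CUDAfy's authors, is to fold the z extent into the y dimension of the grid: launch with new dim3(44, 8 * num) and split thread.blockIdx.y back into a y part and a z part inside the kernel. The snippet below is plain C# that only demonstrates the index arithmetic. FoldedGrid, GridY and Unfold are hypothetical names, and the 8 is the y extent of the grid from the question. Keep in mind that the grid's y dimension is limited to 65535 blocks, so 8 * num must not exceed that.

```csharp
using System;

static class FoldedGrid
{
    // Hypothetical constant: the logical number of blocks along y (8 in the question).
    public const int GridY = 8;

    // Split a folded blockIdx.y back into a logical y block index and a z index.
    public static void Unfold(int foldedBlockIdxY, out int blockY, out int blockZ)
    {
        blockY = foldedBlockIdxY % GridY; // position inside one z slice
        blockZ = foldedBlockIdxY / GridY; // which z slice this block belongs to
    }

    // Inside a [Cudafy] kernel the same arithmetic would read (untested sketch):
    //   int by = thread.blockIdx.y % GridY;
    //   int z  = thread.blockIdx.y / GridY;
    //   int y  = by * thread.blockDim.y + thread.threadIdx.y;

    static void Main()
    {
        int num = 3; // hypothetical z extent, for illustration only
        // The matching launch would use a grid of (44, GridY * num) and a block of (8, 8).
        for (int folded = 0; folded < GridY * num; folded++)
        {
            int by, bz;
            Unfold(folded, out by, out bz);
            Console.WriteLine("blockIdx.y={0} -> y block {1}, z {2}", folded, by, bz);
        }
    }
}
```

Because the block stays 8 x 8, the threads-per-block limit no longer constrains num; only the grid size limit does.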

edited Apr 26 '16 at 10:04

talonmies

answered Apr 26 '16 at 07:34

Taro

This isn't really correct. Three dimensional grids *are* supported on all hardware with CC>=2.0 (i.e. all hardware with support in CUDA 7) – talonmies Apr 26 '16 at 07:45
In the cheatsheet there are also examples that index a 3D grid of blocks, but I never had an NVIDIA GPU supporting a 3D grid of blocks. Nsight always reported that I could launch 65535 x 65535 x 1 blocks, with various numbers of threads depending on the GPU. – Taro Apr 26 '16 at 07:45
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities. Grids of 65535 x 65535 x 65535 are supported, or 2^31-1 for 1D grids for CC>=3 – talonmies Apr 26 '16 at 07:45
If you knew it, why not consider answering smok's question? Anyway, I'm going to edit mine to rectify it. – Taro Apr 26 '16 at 07:47
Because it isn't an answer to the question asked, that's why. The question is about CUDAfy.NET. I don't know what additional limitations CUDAfy imposes on execution parameters. It might well be that CUDAfy itself only supports 2D grids. But I don't know whether that is the case. And neither do you. Which is why this doesn't answer the question either. – talonmies Apr 26 '16 at 08:00
No I don't, but I can look for it and -maybe- find an answer. If you don't want to answer the question from smok, let the others try at least. – Taro Apr 26 '16 at 08:06
@Taro, I think your answer is the right one. Even though CUDA does *now* allow 3D grids, the CUDAfy API may not use them, and may instead use the old kernel launch API cuLaunchGrid http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC__DEPRECATED.html#group__CUDA__EXEC__DEPRECATED_1g39d9904389fa9594622f8b0ec25b4016 . – Florent DUGUET Apr 26 '16 at 09:00
@FlorentDUGUET Maybe, but to be sure we would need some documentation or existing threads about it, and I really found almost nothing on this topic. – Taro Apr 26 '16 at 09:13
@Taro, taking a peek at the CUDAfy code, it seems to be using a deprecated CUDA launch API that only allows 2D calls: https://cudafy.codeplex.com/SourceControl/latest#Cudafy/Cudafy.Host/CudaGPU.cs (DoLaunch). It makes use of gridSize.x and gridSize.y only. – Florent DUGUET Apr 26 '16 at 09:35
@FlorentDUGUET That's good to know! Yes, silently using only gridSize.x and y, not z... Too bad this is neither documented nor flagged with a warning. Thanks for pointing this out, anyway! – Taro Apr 26 '16 at 09:49
I added this to my answer, making it explicit that you were the one who found this and pointed it out. I think it will help smok and other users who may read this in the future. – Taro Apr 26 '16 at 09:54
I believe this answers my question perfectly. Hopefully Cudafy can correct this, since it's a really nice tool. – smok Apr 29 '16 at 08:27
Glad it helps. This is thanks to @FlorentDUGUET who pointed this out after sneaking into CudaFy sources ;) – Taro Apr 29 '16 at 08:44
