CUDA Kernel memory limit

Asked Sep 10 '17 at 19:58

Active Sep 10 '17 at 20:28


When I compile a CUDA kernel with the ptxas verbose option, I get the following output:

    ptxas info : Compiling entry function '_Z19IntersectRaysKernelPdS_S_PcPiS1_yyyyS_' for 'sm_61'
    ptxas info : Function properties for _Z19IntersectRaysKernelPdS_S_PcPiS1_yyyyS_
        48 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    ptxas info : Used 112 registers, 64 bytes cumulative stack size, 408 bytes cmem[0], 40 bytes cmem[2]

Can I infer the maximum kernel launch parameters (i.e., grid_dim and blk_dim) from this memory consumption? (I am using a GeForce GTX 1050 Ti.)

edited Sep 10 '17 at 20:28

talonmies

asked Sep 10 '17 at 19:58

Benny K


https://stackoverflow.com/q/9985912/681865 – talonmies Sep 10 '17 at 20:35
The 112 registers used are likely to be more of a limiting factor for block size than anything else you have shown. For grid size, I think you're unlikely to run into any limits. So as a general rule, **no**, you cannot infer maximal launch parameters (block size) based strictly on memory consumption. Your 112 registers would limit you to a [maximum block size](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications__technical-specifications-per-compute-capability) of about 585 threads on a cc6.1 device. – Robert Crovella Sep 10 '17 at 21:47
@RobertCrovella: Is it possible that "too many" local variables in my kernel are causing access violations due to high memory consumption? – Benny K Sep 11 '17 at 05:23
@BennyK: No, that is not possible. – talonmies Sep 11 '17 at 05:31
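
The "about 585 threads" figure in the comment above can be reproduced with simple arithmetic. The following is a minimal host-side sketch, not an authoritative occupancy calculation. It assumes the cc 6.1 limits listed in the CUDA C Programming Guide (64K 32-bit registers per thread block, 1024 threads per block, 255 registers per thread, warp size 32) and ignores register allocation granularity. The function and constant names are hypothetical, chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-block limits for compute capability 6.1 (from the
// CUDA C Programming Guide's technical specifications table).
const int kRegistersPerBlock  = 65536;  // max 32-bit registers per thread block
const int kMaxThreadsPerBlock = 1024;   // hardware cap on block size
const int kMaxRegsPerThread   = 255;    // max registers a single thread may use
const int kWarpSize           = 32;

// Upper bound on threads per block imposed by register usage alone:
// every thread in the block needs regsPerThread registers.
int registerLimitedThreads(int regsPerThread) {
    assert(regsPerThread > 0 && regsPerThread <= kMaxRegsPerThread);
    return kRegistersPerBlock / regsPerThread;
}

// The same bound rounded down to a whole number of warps and capped
// at the hardware maximum block size.
int registerLimitedBlockSize(int regsPerThread) {
    int threads    = registerLimitedThreads(regsPerThread);
    int wholeWarps = (threads / kWarpSize) * kWarpSize;
    return std::min(wholeWarps, kMaxThreadsPerBlock);
}
```

For the 112 registers reported by ptxas, `registerLimitedThreads(112)` gives 585, and rounding down to whole warps gives 576. For a compiled kernel, the runtime's own limit can be read from the `maxThreadsPerBlock` field that `cudaFuncGetAttributes` fills in, which is a better source than this estimate.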
