I am getting the following output for a CUDA kernel compiled with the ptxas verbose option:

ptxas info : Compiling entry function '_Z19IntersectRaysKernelPdS_S_PcPiS1_yyyyS_' for 'sm_61'
ptxas info : Function properties for _Z19IntersectRaysKernelPdS_S_PcPiS1_yyyyS_
    48 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 112 registers, 64 bytes cumulative stack size, 408 bytes cmem[0], 40 bytes cmem[2]

Can I infer the maximum kernel launch parameters (i.e., grid_dim and blk_dim) from this memory consumption? (I am using a GeForce GTX 1050 Ti.)

talonmies
Benny K
  • https://stackoverflow.com/q/9985912/681865 – talonmies Sep 10 '17 at 20:35
  • The 112 registers used is likely to be more of a limiting factor for block size than anything else you have shown. For grid size, I think you're unlikely to run into any limits. So as a general rule, **no** you cannot infer maximal launch parameters (block size) based strictly on memory consumption. Your 112 registers would limit you to a [maximum block size](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications__technical-specifications-per-compute-capability) of about 585 threads on a cc6.1 device. – Robert Crovella Sep 10 '17 at 21:47
  • @RobertCrovella: Is it possible that "too many" local variables in my kernel are causing access violations due to high memory consumption? – Benny K Sep 11 '17 at 05:23
  • @BennyK: No, that is not possible – talonmies Sep 11 '17 at 05:31
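
The "about 585 threads" figure in the comment above comes from dividing the per-block register budget by the kernel's register count. Below is a minimal sketch of that arithmetic. It assumes the compute capability 6.1 limits listed in the linked programming guide table (64 K 32-bit registers per block, warp size 32, at most 1024 threads per block). The function names are hypothetical, and rounding down to whole warps is a simplification of the hardware's real allocation granularity.

```python
# Assumed limits for a compute capability 6.1 device such as the GTX 1050 Ti,
# as listed in the CUDA programming guide's per-compute-capability table.
REGISTERS_PER_BLOCK = 65536    # 64 K 32-bit registers available to one block
WARP_SIZE = 32                 # threads are scheduled in warps of 32
MAX_THREADS_PER_BLOCK = 1024   # hard upper bound regardless of register use


def naive_thread_limit(regs_per_thread, regs_per_block=REGISTERS_PER_BLOCK):
    """Plain division: how many threads fit into the register budget."""
    return regs_per_block // regs_per_thread


def warp_rounded_thread_limit(regs_per_thread,
                              regs_per_block=REGISTERS_PER_BLOCK):
    """Round the naive limit down to whole warps and cap it at the
    architectural maximum block size (a simplifying assumption)."""
    whole_warps = naive_thread_limit(regs_per_thread, regs_per_block) // WARP_SIZE
    return min(whole_warps * WARP_SIZE, MAX_THREADS_PER_BLOCK)


if __name__ == "__main__":
    regs = 112  # "Used 112 registers" from the ptxas output in the question
    print("naive limit:       ", naive_thread_limit(regs))
    print("whole-warp limit:  ", warp_rounded_thread_limit(regs))
```

This is only an estimate from the ptxas numbers. On a real device, the per-kernel limit can be read at run time from the `maxThreadsPerBlock` field that `cudaFuncGetAttributes` fills in, which is the more reliable figure.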
