
At the moment I am trying to optimize some CUDA kernels.

If I compile with the option --ptxas-options=-v, I get information about register usage and so on.

In my case I always get some extra lines that make no sense to me:

ptxas : info : Compiling entry function '_Z20backprojLinTexInterpP7double3S0_S0_P7double2iiiiiS2_PdPf' for 'sm_20'
ptxas : info : Function properties for _Z20backprojLinTexInterpP7double3S0_S0_P7double2iiiiiS2_PdPf
8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas : info : Used 47 registers, 32 bytes smem, 112 bytes cmem[0], 56 bytes cmem[16]
ptxas : info : Function properties for __internal_trig_reduction_slowpathd
40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads

Lines 1 to 4 are clear to me, but what are the last two lines?

Google did not help here; I already tried.

Does anybody have an idea what these lines mean? I get them for every kernel compiled in my program.

raspiede
  • __internal_trig_reduction_slowpathd() is an internal subroutine in the CUDA math library. It is used to perform accurate argument reduction for double-precision trig functions (sin, cos, sincos, tan) when the argument is very large in magnitude. A Payne-Hanek style argument reduction is used for these large arguments. For sm_20 and up, this is a called subroutine to minimize code size in apps that invoke trig functions frequently. You can see the code by looking at the file math_functions_dbl_ptx3.h which is in the CUDA include file directory. – njuffa Jun 05 '13 at 17:28
  • @njuffa: That should probably be an answer, not a comment. – talonmies Jun 06 '13 at 01:23
  • @talonmies: Thanks for the endorsement, but I am still waiting to hear from the original poster whether what I wrote above actually addresses his question. My comment answered the question "What is __internal_trig_reduction_slowpathd?", which is not the literal question that was asked, but which I thought was the question that was implied. I prefer this "comment first" approach instead of getting dinged later with a "this should have been posted as a comment, not an answer" :-) – njuffa Jun 06 '13 at 02:22
  • Yes, that seems to be right. I use the sincos functions in my kernel. Interestingly, when compiling my program on Linux I don't get these lines. Also, the number of registers is much higher... – raspiede Jun 07 '13 at 06:50
  • These kernel statistics are only produced when -Xptxas -v (or an equivalent long form) is passed to nvcc. Check your build system to make sure the same flags are passed to nvcc on all platforms. The number of registers can depend on various compile flag settings, and on whether your target platform is 32-bit or 64-bit. Since pointers have the same size on host and device, code built for a 64-bit platform often requires additional 32-bit registers, because each pointer requires two registers for storage instead of one register on a 32-bit platform. The same applies to `long` on non-Windows systems. – njuffa Jun 10 '13 at 06:19
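
The pointer-size point in the last comment can be checked with a small sketch like the one below. It is only an illustration: `regs_needed`, `report_sizes` and `query_device_sizes` are made-up names, and the count is simply the type size divided by four bytes, which says nothing about how many registers ptxas ends up allocating for a real kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of 32-bit registers needed to hold one value of type T
// (size rounded up to a multiple of 4 bytes).
template <typename T>
__host__ __device__ int regs_needed()
{
    return (int)((sizeof(T) + 3) / 4);
}

// Hypothetical kernel: records the device-side counts for a pointer and a long.
__global__ void report_sizes(int* out)
{
    out[0] = regs_needed<void*>();
    out[1] = regs_needed<long>();
}

// Hypothetical host helper: copies the two device-side counts into out[0..1].
cudaError_t query_device_sizes(int out[2])
{
    int* d_out = NULL;
    cudaError_t err = cudaMalloc((void**)&d_out, 2 * sizeof(int));
    if (err != cudaSuccess) return err;

    report_sizes<<<1, 1>>>(d_out);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, d_out, 2 * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    return err;
}

int main()
{
    int dev[2] = {0, 0};
    if (query_device_sizes(dev) != cudaSuccess) {
        printf("CUDA error\n");
        return 1;
    }
    // Host and device values are expected to match on the same platform.
    printf("pointer: host %d, device %d register(s)\n", regs_needed<void*>(), dev[0]);
    printf("long:    host %d, device %d register(s)\n", regs_needed<long>(), dev[1]);
    return 0;
}
```

Building and running this on each platform in question shows directly whether pointers and `long` occupy one or two 32-bit registers there.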

1 Answer


__internal_trig_reduction_slowpathd() is an internal subroutine in the CUDA math library. It is used to perform accurate argument reduction for double-precision trig functions (sin, cos, sincos, tan) when the argument is very large in magnitude. A Payne-Hanek style argument reduction is used for these large arguments. For sm_20 and up, this is a called subroutine to minimize code size in apps that invoke trig functions frequently. You can see the code by looking at the file math_functions_dbl_ptx3.h which is in the CUDA include file directory.
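
To reproduce the extra "Function properties" lines, a minimal sketch like the following should be enough. The kernel name `sincos_kernel`, the helper `run_sincos` and the file name `trig.cu` are made up for illustration; only the double-precision `sincos` call and the `-Xptxas -v` flag come from the discussion above. Whether ptxas reports the slow path as a separate function depends on the toolkit version and the target architecture, so treat this as an experiment rather than a guaranteed result.

```cuda
// Build (file name is an assumption): nvcc -Xptxas -v trig.cu -o trig
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one double-precision sincos per element.
// This call pulls in the math library's argument reduction code.
__global__ void sincos_kernel(const double* x, double* s, double* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        sincos(x[i], &s[i], &c[i]);
    }
}

// Hypothetical host helper: evaluates sincos(x) on the device for one value.
cudaError_t run_sincos(double x, double* s, double* c)
{
    double* d_buf = NULL;  // device layout: [x, sin(x), cos(x)]
    cudaError_t err = cudaMalloc((void**)&d_buf, 3 * sizeof(double));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_buf, &x, sizeof(double), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        sincos_kernel<<<1, 1>>>(d_buf, d_buf + 1, d_buf + 2, 1);
        err = cudaGetLastError();
    }

    double out[2] = {0.0, 0.0};
    if (err == cudaSuccess)
        err = cudaMemcpy(out, d_buf + 1, 2 * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(d_buf);
    *s = out[0];
    *c = out[1];
    return err;
}

int main()
{
    double s = 0.0, c = 0.0;
    // A very large argument, the case the accurate reduction exists for.
    if (run_sincos(1.0e22, &s, &c) != cudaSuccess) {
        printf("CUDA error\n");
        return 1;
    }
    printf("sin = %.17g, cos = %.17g\n", s, c);
    return 0;
}
```

Removing the `sincos` call (or replacing it with plain arithmetic) and rebuilding with the same flags is a quick way to confirm that the extra lines come from the trig functions and not from the kernel itself.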

njuffa