
I know CUDA reasonably well, but I do not know PTX, so my questions are:

  • Does learning PTX help improve the performance of GPU (CUDA) code?
  • If yes, is there a way to write PTX code that can be combined with CUDA code to enhance performance?
Sullivan Risk

1 Answer


In my experience, PTX helps when debugging or inspecting a non-trivial problem, although I have only done this once. Also, remember that PTX is only the intermediate code generated by the compiler, not the actual machine code executed on the GPU.

If you really want to look at the machine code, which is assembled from the PTX, NVIDIA provides cuobjdump. PTX carries a lot of useful information and is well documented, so learning it would help. However, the general optimization strategies for CUDA include:

  • Minimize memory transactions, particularly data transfer between device/host
  • Coalesce global memory access
  • Increase device utilization via kernel configuration
  • Avoid warp divergence
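As an illustration of the coalescing point, here is a minimal sketch contrasting two copy kernels. The kernel names, the array size, and the stride of 32 are assumptions made up for this example, not something from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Coalesced: consecutive threads in a warp read consecutive floats,
// so the warp's loads fall into a small number of memory transactions.
__global__ void copy_coalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads read floats `stride` elements apart,
// so the warp's loads are spread over many more memory transactions.
__global__ void copy_strided(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    long long j = (long long)i * stride;
    if (j < n) out[i] = in[j];
}

int main()
{
    const int n = 1 << 20;   // illustrative size
    const int stride = 32;   // illustrative stride (hypothetical value)
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copy_coalesced<<<blocks, threads>>>(in, out, n);
    copy_strided<<<blocks, threads>>>(in, out, n, stride);

    // Report any launch or execution error from either kernel.
    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

Running both kernels under a profiler should show how the access pattern alone changes the memory traffic; the size of the difference depends on the GPU.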

For your second question: yes, you can embed PTX in CUDA code via inline PTX. I have never done this myself, though.
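For reference, inline PTX uses an `asm()` statement with GCC-style operand constraints. The following is only a minimal sketch: the helper `lane_id` and the kernel `record_lanes` are names invented for this example, and the check in `main` assumes a one-dimensional block and a warp size of 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reads the PTX special register %laneid (the thread's index within its warp).
// "=r" binds the output to a 32-bit register; "%%" escapes a literal '%'.
__device__ unsigned int lane_id()
{
    unsigned int id;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));
    return id;
}

// Each thread stores its lane id so the host can inspect it.
__global__ void record_lanes(unsigned int *out)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = lane_id();
}

int main()
{
    const int n = 64;  // two warps, assuming a warp size of 32
    unsigned int h_out[n];
    unsigned int *d_out = nullptr;

    cudaMalloc(&d_out, n * sizeof(unsigned int));
    record_lanes<<<1, n>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // With a 1D block, the lane id should equal threadIdx.x modulo 32.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != (unsigned int)(i % 32)) ++mismatches;

    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

To compare what you wrote with what the toolchain produces, `nvcc -ptx` emits the PTX for a source file and `cuobjdump -sass` disassembles the machine code from a compiled binary.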

user3813674
  • Thanks a lot for your answer. Actually, I have read papers that analyze PTX code and attribute the performance difference between code versions to the way the compiler rearranged the PTX instructions. I thought it might be a good idea to make such changes ourselves to improve performance. I do not know whether this is something people regularly do, or whether I should go directly to the machine code, which I think would be more difficult. – Sullivan Risk Mar 29 '16 at 21:08
  • @SullivanRisk Analyzing PTX code is almost never useful, since it is merely an intermediate representation that is translated to machine code by an optimizing compiler component `ptxas`. Analyzing machine code (SASS) on the other hand is useful in the same way that analyzing the assembly code produced by any tool chain is useful. Use of inline PTX (just as use of inline assembly language on CPU) can be useful to enhance the performance of computations that aren't easily expressed efficiently in a high-level language (e.g. [this answer](http://stackoverflow.com/a/6220499/780717)). – njuffa Mar 29 '16 at 23:05