
I still can't fully understand how CUDA's compute capability settings work when compiling source code.

Assuming the binaries are compiled with flags ranging from (code=sm_30, compute=30) to (code=sm_62, compute=62) (nvcc version 10.1), what happens when a Turing device (e.g., an RTX 2080 Ti) runs them?

Even though the binaries do not include code=sm_75, compute=75 for the Turing architecture, why do they run correctly on a Turing device?

Does the Turing device JIT-compile the compute=62 PTX (because compute=75 is not specified) and generate Turing SASS (code=sm_75) at runtime instead of using the sm_62 SASS?

sungjun cho
  • https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#just-in-time-compilation – talonmies Aug 11 '20 at 10:15
  • This isn't how the flags are specified: **code=sm_30, compute=30**. The flags look like this: `arch=compute_30,code=sm_30`. For that syntax, none of the flag arrangements you have given specify the generation of PTX, and a Turing device cannot run with the SASS code from any sm_30 through sm_62. So I think your question is lacking important details and accuracy/clarity in terms of the way the code is actually being compiled. PTX gets specified when you see `arch=compute_30,code=compute_30` (for example). Any PTX of a numerical arch lower than compute_75 can be JIT compiled to sm_75 SASS. – Robert Crovella Aug 11 '20 at 14:07
  • @RobertCrovella In summary, what I meant was: even though only `arch=compute_30,code=compute_30` is specified, how can the Turing device launch the application? (Since we do not specify `arch=compute_75,code=compute_75`, I expected the Turing device to be unable to launch the app.) – sungjun cho Aug 11 '20 at 14:22
  • `arch=compute_30,code=compute_30` tells `nvcc` to embed cc3.0 PTX in the binary. PTX can be forward JIT-compiled by the GPU driver (doesn't require the CUDA toolkit) **to any future architecture supported by that GPU driver**. If you have a GPU driver compatible with CUDA 10.1, that driver can support Turing. When you attempt to run the app, the driver looks at the binary package, and observes that no suitable SASS exists. It then discovers that a suitable PTX exists, and uses that PTX to create SASS code that runs on a Turing device. – Robert Crovella Aug 11 '20 at 14:26
  • This is the standard "forward compatibility" mechanism that has been part of CUDA "forever" and there are various questions here on the `cuda` tag discussing it, such as [this one](https://stackoverflow.com/questions/35656294/cuda-how-to-use-arch-and-code-and-sm-vs-compute/35657430#35657430) and the ones it links to. Your question is arguably a duplicate of that one. – Robert Crovella Aug 11 '20 at 14:28
  • @RobertCrovella Thank you. Now it makes sense. I was not sure whether the GPU driver would generate the target device's SASS from old PTX. In the case above, the GPU driver will generate Turing SASS from cc3.0 PTX, which might be less efficient because the PTX is old. – sungjun cho Aug 11 '20 at 14:48
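To make the selection rule from the comments concrete, here is a small Python model of it. It is a simplified sketch, assuming the compatibility rules from the CUDA documentation: SASS built for compute capability X.y runs only on devices of capability X.z with z ≥ y, and otherwise the driver JIT-compiles embedded PTX whose version is not newer than the device. The function names `parse_gencode` and `select_code` and the exact list of architectures are hypothetical; this is not how the driver is actually implemented.

```python
# Hypothetical, simplified model of how the CUDA driver picks code from a
# fat binary. Assumption: this mirrors the documented compatibility rules,
# not NVIDIA's real driver logic.

def parse_gencode(spec):
    """Parse one 'arch=compute_XY,code=...' spec into (kind, cc).

    kind is 'sm' for SASS or 'compute' for PTX; cc is an int such as 30.
    Simplification: only one code value per spec is handled here.
    """
    fields = dict(part.split("=", 1) for part in spec.split(","))
    kind, _, num = fields["code"].partition("_")
    if kind not in ("sm", "compute"):
        raise ValueError(f"unexpected code value: {fields['code']}")
    return kind, int(num)


def select_code(gencodes, device_cc):
    """Return what the modeled driver would use on a device of capability device_cc."""
    sass, ptx = [], []
    for spec in gencodes:
        kind, cc = parse_gencode(spec)
        (sass if kind == "sm" else ptx).append(cc)

    major = device_cc // 10
    # SASS for X.y only runs on X.z with z >= y (same major version).
    compatible_sass = [cc for cc in sass if cc // 10 == major and cc <= device_cc]
    if compatible_sass:
        return ("sass", max(compatible_sass))

    # Otherwise JIT-compile PTX that is not newer than the device
    # (assumption: the newest such PTX is chosen).
    compatible_ptx = [cc for cc in ptx if cc <= device_cc]
    if compatible_ptx:
        return ("jit", max(compatible_ptx))

    return ("no kernel image", None)


# Hypothetical SASS-only build for sm_30 through sm_62.
sass_only = [f"arch=compute_{cc},code=sm_{cc}" for cc in (30, 35, 50, 52, 60, 61, 62)]

print(select_code(sass_only, 75))  # ('no kernel image', None)
print(select_code(sass_only + ["arch=compute_30,code=compute_30"], 75))  # ('jit', 30)
print(select_code(sass_only, 61))  # ('sass', 61)
```

Under these assumptions, a SASS-only build for sm_30 through sm_62 contains nothing a Turing (7.5) device can run, which corresponds to the "no kernel image is available for execution on the device" launch failure, while adding `arch=compute_30,code=compute_30` lets the driver JIT-compile the cc3.0 PTX into sm_75 SASS.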
