Does 'code=sm_X' embed only binary (cubin) code, or also PTX code, or both?

Question

I am a little confused about the 'code=sm_X' option within the '-gencode' statement.

An example: What does the NVCC compiler option

-gencode arch=compute_13,code=sm_13

embed in the library?

Only the machine code (cubin code) for GPUs with CC 1.3, or also the PTX code for GPUs with CC 1.3?

The 'Maxwell compatibility guide' states: "Only the back-end target version(s) specified by the 'code=' clause will be retained in the resulting binary".

From that, I would infer that the given compiler option embeds only machine code for GPUs with CC 1.3 and no PTX code. This would mean that the library could not run on, for example, a Maxwell generation card, as there is no PTX code embedded within the library from which the machine code could be 'just-in-time' (JIT) compiled.

On the other hand, the GTC 2013 presentation 'Introduction to the CUDA Toolkit as an Application Build Tool' by NVIDIA states that '-gencode arch=compute_13,code=sm_13' is enough for all GPUs with CC >= 1.3, and that with this compiler option the machine code for GPUs with CC > 1.3 is JIT-compiled from the PTX code. So, in my opinion, the information given in the Maxwell compatibility guide conflicts with this GTC presentation.

score 5 · Accepted Answer · answered Oct 07 '14 at 14:56

nvcc has many formats by which the code generation options can be specified. A read of section 6 of the nvcc manual may be instructive.

When using this format:

nvcc -gencode arch=compute_13,code=sm_13 ...

only the SASS code for an sm_13 (cc 1.3) device will be retained. No PTX will be retained in the executable object, so the code can only run on a device capable of running cc 1.3 SASS.

Using the above command format, in order to embed a PTX version of the source code into the executable object, it's necessary to use a virtual architecture specification for the option provided to code=.... Since this particular format (using -gencode) does not allow specification of multiple targets in a single switch, we must pass the -gencode switch multiple times to nvcc, one for each target we desire to be embedded in the executable object.

So extending the above example, we could use the following:

nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_13,code=compute_13 ...

This would embed both cc1.3 SASS (by the first gencode switch) and cc1.3 PTX (by the second gencode switch) in the executable. Devices capable of running cc1.3 SASS code directly will use that. Other devices (of compute capability greater than cc 1.3) will do a JIT-compile step by the driver, to convert the cc1.3 PTX code to a SASS code with an architecture suitable for the device in question.
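When several targets are needed, the repeated switches can be generated rather than typed by hand. The following is only a minimal sketch: `gencode_flags` is a hypothetical helper name (not part of nvcc), and it assumes compute capabilities are passed as plain numbers such as 13 or 20. It emits one SASS target per capability and adds PTX for the highest one, following the pattern described above.

```sh
# Hypothetical helper: build nvcc -gencode switches for a list of
# compute capabilities given as plain numbers (e.g. 13 20).
# Emits one SASS target per capability, plus PTX for the highest one
# so that newer devices can still JIT-compile the code.
gencode_flags() {
    flags=""
    highest=0
    for cc in "$@"; do
        flags="$flags -gencode arch=compute_$cc,code=sm_$cc"
        if [ "$cc" -gt "$highest" ]; then
            highest=$cc
        fi
    done
    if [ "$highest" -gt 0 ]; then
        flags="$flags -gencode arch=compute_$highest,code=compute_$highest"
    fi
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Print the switches for cc 1.3 alone; they could be passed to nvcc as
#   nvcc $(gencode_flags 13) ...
gencode_flags 13
```

For a single capability of 13, this prints the same two switches as the command above.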

I agree that the GTC 2013 presentation (e.g. slide 37) seems to suggest that

nvcc -gencode arch=compute_13,code=sm_13 ...

is sufficient for all devices of compute capability 1.3 or higher. It is not, and this is easy to demonstrate. If you compile a code using the above format, and attempt to run it on a cc 2.0 device, it will fail with an "invalid device function" error associated with any kernel or kernels you have in your code.

Again, nvcc has a variety of command formats and "shortcuts" for specifying code generation. Some relatively simple ones, such as:

nvcc -arch=sm_13 ...

will embed both a PTX and SASS version of the code in the executable object, resulting in the kind of forward-compatibility suggested.
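To check what a particular build actually embeds, the cuobjdump tool shipped with the CUDA toolkit can list the embedded images. The sketch below is a hedged illustration: `embedded_code` and the `CUOBJDUMP` override variable are hypothetical names, and it assumes that `cuobjdump --list-elf` and `cuobjdump --list-ptx` print one line starting with "ELF file" or "PTX file" per embedded image, which may differ between toolkit versions.

```sh
# Hypothetical helper: report which code types cuobjdump finds in a binary.
# CUOBJDUMP may be set to point at a specific cuobjdump executable.
embedded_code() {
    tool="${CUOBJDUMP:-cuobjdump}"
    kinds=""
    # --list-elf lists embedded cubin (SASS) images, one "ELF file" line each.
    if "$tool" --list-elf "$1" 2>/dev/null | grep -q '^ELF file'; then
        kinds="SASS"
    fi
    # --list-ptx lists embedded PTX images, one "PTX file" line each.
    if "$tool" --list-ptx "$1" 2>/dev/null | grep -q '^PTX file'; then
        kinds="${kinds:+$kinds+}PTX"
    fi
    # Print "none" if neither kind was found (or cuobjdump is unavailable).
    printf '%s\n' "${kinds:-none}"
}
```

Running `embedded_code ./a.out` on an executable built with only `code=sm_13` would be expected to report SASS alone, while one built with `-arch=sm_13` would be expected to report both, if the behaviour described in this answer holds for your toolkit version.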

thx for the clear and concise answer, that resolves all of the question marks in my brain :-) — user2454869, Oct 07 '14 at 15:35
gencode only allows one virtual architecture per switch, but you can specify multiple targets like this: nvcc -gencode=arch=compute_13,code=compute_13,sm_13 — shoelzer, Jan 21 '21 at 14:55
