
I'm compiling a CUDA program with nvcc using the options -arch=compute_20 -code=sm_20, for a GeForce 310 GPU that has compute capability 1.2. The program seems to run normally, as shown below.

wangli@wangli-desktop:~/wangliC2050/1D-EncodeV6.1$ make
nvcc -O --ptxas-options=-v 1D-EncodeV6.1.cu -o 1D-EncodeV6.1 -I../../NVIDIA_GPU_Computing_SDK/C/common/inc -I../../NVIDIA_GPU_Computing_SDK/shared/inc  -arch=compute_20 -code=sm_20 
ptxas info    : Compiling entry function '_Z6EncodePhPjS0_S_S_' for 'sm_20'
ptxas info    : Function properties for _Z6EncodePhPjS0_S_S_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 52 bytes cmem[0]
wangli@wangli-desktop:~/wangliC2050/1D-EncodeV6.1$ ./1D-EncodeV6.1 
########################### Encoding start (loopCount=10)#######################
#p  n   size    averageTime(s)  averageThroughput(MB/s) errorRate(0~1)
#================= Encode on GPU v6.1 ===============
4   4   4   0.000294    0.051837    100.000000
#################### Encoding stop #########################

So, I wonder:

  1. Why can this program run on a GeForce 310 with the nvcc options -arch=compute_20 -code=sm_20, which do not match the card's compute capability of 1.2?
  2. What happens if the value of the -arch option differs from that of the -code option?

Thanks.

liwang

1 Answer


A CUDA executable typically contains two types of program data: SASS code, which is basically GPU machine code, and PTX, which is an intermediate code (although it's pretty close to machine code). As long as PTX code is present in the executable, then if the driver decides that a proper SASS binary is not available for the GPU that the code will actually run on, it will do a "JIT-compile" step at application launch to create the necessary binary code appropriate for the device in question, using the PTX code in the application package.

This is what is happening in your case.
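
As an aside, nvcc only embeds PTX in the executable when a virtual architecture appears among the -code targets. A hypothetical variant of the build command above (not the original one) that keeps both the sm_20 SASS and the compute_20 PTX would be:

    nvcc -O --ptxas-options=-v 1D-EncodeV6.1.cu -o 1D-EncodeV6.1 \
        -arch=compute_20 -code=compute_20,sm_20

Note that PTX can only be JIT-compiled for devices whose compute capability is at least that of the PTX's virtual architecture, so compute_20 PTX would still not help on a cc 1.2 card.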

If arch != code, then you're creating device code that architecturally conforms to the arch type but is compiled to use machine-level instructions associated with the code type. For example, if I compile for arch = 1.2 and code = 2.0, I cannot use double types (they will be demoted to float, because double is not supported in a 1.2 architecture), but the SASS machine code generated will be ready to execute on a cc 2.0 device and will not require a JIT-compile step for that kind of device.
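
As a concrete illustration of that demotion case (a made-up kernel, not from the question):

    // demote.cu (hypothetical): double arithmetic under a cc 1.2 virtual architecture
    __global__ void scale(double *v, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            v[i] *= 2.0;   // compiled as float arithmetic when arch=compute_12
    }

Compiling it with

    nvcc -arch=compute_12 -code=sm_20 -c demote.cu

produces a warning that double is not supported and is being demoted to float, yet the resulting SASS is sm_20 machine code that a cc 2.0 device can run without a JIT-compile step.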

The NVCC manual has more information, particularly the section on steering code generation.

Robert Crovella
  • Your answer and the further reading are so clear that I'm suddenly enlightened! Thanks for your warm heart; your help is the biggest encouragement for me. – liwang Mar 30 '13 at 07:49
  • Actually, I don't think I addressed this directly in the answer. A kernel compiled for `-arch=sm_20` should not run on a cc 1.2 device, it should throw an error on the kernel launch (you have to [properly trap this type of error](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api)). But if a kernel is compiled for one device, and it successfully runs on another device, it's due to the JIT-compile mechanism. – Robert Crovella Apr 03 '13 at 13:56
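
A minimal sketch of that kind of error trapping (the kernel and its launch configuration are placeholders, not the Encode kernel from the question):

    #include <cstdio>

    __global__ void kernel(int *data) { data[0] = 42; }

    int main()
    {
        int *d_data;
        cudaMalloc(&d_data, sizeof(int));
        kernel<<<1, 1>>>(d_data);
        // trap launch errors -- an arch mismatch shows up here as "invalid device function"
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess)
            fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        // trap errors that occur while the kernel executes
        err = cudaDeviceSynchronize();
        if (err != cudaSuccess)
            fprintf(stderr, "kernel execution failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 0;
    }

On a device the binary doesn't support, the launch check fires; the errorRate of 100.000000 in the question's output is consistent with a kernel that never actually ran.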