I keep getting an "invalid device function" error on my kernel launch. Googling turns up plenty of similar reports, but all of them seem to come down to a mismatch between the SASS/PTX code embedded in the binary and the GPU it runs on.
My understanding of how this works:
- SASS code can only be executed by a GPU with the exact same SM version [2]
- PTX code is forward-compatible, i.e. any newer GPU will be able to run the code (although the driver has to JIT-compile it) [2]
- I need to specify what I want to target by passing suitable -arch/-gencode options to nvcc: -gencode arch=compute_30,code=sm_30 will create SASS targeting SM 3.0, and -gencode arch=compute_60,code=compute_60 will create PTX code [1]
- To use CUDA with static and shared libraries, I need to compile position-independent code and enable separable compilation
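To make sure I am reasoning about this consistently, here is a minimal sketch of the selection rule described in the list above. It only models the rule and does not query the CUDA runtime, so it builds as plain host code. The `Image` struct and the `isUsable` helper are hypothetical names invented for illustration. One assumption goes beyond the list: SASS is treated as usable on a device with the same major version and an equal or higher minor version, which is how I read the binary-compatibility section of the CUDA programming guide. The PTX rule is the forward-compatibility rule from the second bullet.

```cuda
#include <cstdio>

// Hypothetical description of one image embedded in a fatbinary.
struct Image {
    bool isPtx;  // true: PTX (code=compute_XY), false: SASS (code=sm_XY)
    int major;   // X in compute_XY / sm_XY
    int minor;   // Y in compute_XY / sm_XY
};

// Returns true if a device with compute capability devMajor.devMinor
// can use the image under the rules assumed in the lead-in.
bool isUsable(const Image& img, int devMajor, int devMinor) {
    if (img.isPtx) {
        // PTX can be JIT-compiled for any equal or newer compute capability.
        return devMajor > img.major ||
               (devMajor == img.major && devMinor >= img.minor);
    }
    // SASS (assumption): same major version, equal or higher minor version.
    return devMajor == img.major && devMinor >= img.minor;
}

int main() {
    const Image sass30 = {false, 3, 0};  // -gencode arch=compute_30,code=sm_30
    const Image ptx60  = {true, 6, 0};   // -gencode arch=compute_60,code=compute_60

    // Check both example images against an SM 6.1 device.
    std::printf("sm_30 SASS usable on 6.1: %s\n",
                isUsable(sass30, 6, 1) ? "yes" : "no");
    std::printf("compute_60 PTX usable on 6.1: %s\n",
                isUsable(ptx60, 6, 1) ? "yes" : "no");
    return 0;
}
```

With the two example flags above, an SM 6.1 device would have to fall back to JIT-compiling the compute_60 PTX, because the sm_30 SASS is not usable on it under this rule.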
What I have done so far:
- Confirmed that I have SM 6.1 for my Titan Xp [5]
- Forced nvcc to generate compatible code [3]:

    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_30,code=compute_30")
- Confirmed that this gets compiled into my object file with cuobjdump:

    ./cuobjdump /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a
    member /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a:test.cu.o:

    Fatbin ptx code:
    ================
    arch = sm_61
    code version = [6,4]
    producer = <unknown>
    host = linux
    compile_size = 64bit
    compressed
    ptxasOptions = --compile-only

    Fatbin elf code:
    ================
    arch = sm_61
    code version = [1,7]
    producer = <unknown>
    host = linux
    compile_size = 64bit
    compressed

    Fatbin ptx code:
    ================
    arch = sm_30
    code version = [6,4]
    producer = <unknown>
    host = linux
    compile_size = 64bit
    compressed
    ptxasOptions = --compile-only

    Fatbin elf code:
    ================
    arch = sm_30
    code version = [1,7]
    producer = <unknown>
    host = linux
    compile_size = 64bit
    compressed

    member /mnt/linuxdata/campvis-nx/build/bin/libcuda-interop-cuda.a:mocs_compilation.cpp.o:
- Realized that only part of it (the SASS part?) is linked into my shared library (why?):

    ./cuobjdump /mnt/linuxdata/campvis-nx/build/bin/libcampvis-modules.so

    Fatbin elf code:
    ================
    arch = sm_61
    code version = [1,7]
    producer = <unknown>
    host = linux
    compile_size = 64bit

    Fatbin elf code:
    ================
    arch = sm_30
    code version = [1,7]
    producer = <unknown>
    host = linux
    compile_size = 64bit
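As a cross-check from inside the program rather than from cuobjdump, the runtime can be asked which device it sees and which image it would pick for a given kernel. The sketch below is a minimal, hedged version of that check. `probeKernel` is a hypothetical stand-in for the real kernel, and the file would have to be compiled and linked the same way as the failing library for the result to be meaningful. As far as I understand the runtime API documentation, `cudaFuncGetAttributes` itself can return the "invalid device function" error when no usable image for the kernel is found.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel that fails to launch.
__global__ void probeKernel() {}

int main() {
    // Which device is current, and what is its compute capability?
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) {
        std::printf("cudaGetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("device %d: %s, compute capability %d.%d\n",
                device, prop.name, prop.major, prop.minor);

    // Which image did the runtime select for this kernel?
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, probeKernel);
    if (err != cudaSuccess) {
        std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Both versions are encoded as 10 * major + minor, e.g. 61 for SM 6.1.
    std::printf("binaryVersion = %d, ptxVersion = %d\n",
                attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```

If the attribute query already fails, the problem sits in how the device code was embedded or linked, not in the launch configuration.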
I even tried compiling all SM versions from here into the same binary, still with the same result.
According to this example, embedding PTX seems to take more work than just enabling its compilation in CMake, so for now I would be happy with a working SASS version.
Did I misunderstand any of the information above?
Are there other possible reasons for an "invalid device function" error?
I can post the code if it helps, but I feel this is more of a build system problem.
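Until I post the real code, this is roughly the shape of a minimal checked launch that could be used to reproduce the problem. It is a sketch, not my actual code: `probeKernel` and the `CUDA_CHECK` macro are hypothetical names. The point is to separate launch-time errors, which `cudaGetLastError` returns right after the launch, from errors that only surface at `cudaDeviceSynchronize`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line information on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,  \
                         __LINE__, #call, cudaGetErrorString(e_));    \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: writes a known value to device memory.
__global__ void probeKernel(int* out) { *out = 42; }

int main() {
    int* dOut = nullptr;
    CUDA_CHECK(cudaMalloc(&dOut, sizeof(int)));

    probeKernel<<<1, 1>>>(dOut);
    CUDA_CHECK(cudaGetLastError());       // errors reported at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    int hOut = 0;
    CUDA_CHECK(cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dOut));

    std::printf("kernel wrote %d\n", hOut);
    return 0;
}
```

Building this file once as a standalone executable and once as part of the static/shared library setup would show whether the error follows the code or the build configuration.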