If I understand the workflow description in the NVRTC documentation correctly, here's how it works:
- Create an NVRTC program from the source text.
- Compile the NVRTC program to get PTX code.
- Device-link the PTX code using NVIDIA's Driver API (cuLinkCreate, cuLinkAddData, cuLinkComplete) to get the cubin.
However, beginning with CUDA 11.3, NVRTC has the following API call:
nvrtcResult nvrtcGetCUBIN ( nvrtcProgram prog, char* cubin );
So how can I get a cubin from the compilation step alone, without the device-link step?