Can nvlink inline device functions from separate compilation units?

Question

If the separate compilation units fed as input to nvlink contain CUDA kernels and device functions that call device functions marked as __forceinline__, will those calls be inlined? Assume they would be inlined if all the source code were put into a single file.

talonmies · Answer 1 · 2018-07-25T09:27:58.967

If the separate compilation units that are fed as input to nvlink contain cuda kernels and device functions that invoke device functions marked as __forceinline__, will these functions be inlined?

To the best of my knowledge, the CUDA device code linker can't do this. __forceinline__ is a compiler-level directive: inlining happens during compilation, and once the code has been compiled there is no way of marking it as inlineable in either PTX or SASS. If you try this, the CUDA device code compiler should emit a warning that an external inline function was used but not defined.

If you want functions to be compiled inline, you have to (unsurprisingly) use a compiler, not a linker.
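In practice, that usually means making the full definition of the function visible to the compiler in every compilation unit that calls it, typically by putting it in a shared header. The following is a minimal single-file sketch of that layout, not a definitive implementation. The names scale, scale_kernel, run_scale, and the header name scale.cuh are all hypothetical and chosen only for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// --- In a multi-file project this part would live in a shared header,
// --- e.g. a hypothetical "scale.cuh" included by every .cu file that calls it.
// The compiler needs the full definition (not just a declaration) in each
// compilation unit in order to inline the call.
__forceinline__ __device__ float scale(float x, float factor)
{
    return x * factor;
}

// --- This part would live in one of the .cu files that include the header.
__global__ void scale_kernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // The definition of scale() is visible here, so the compiler can inline it.
        data[i] = scale(data[i], factor);
    }
}

// Hypothetical host helper: copies the values to the device, multiplies each
// one by factor, and returns the result. Returns an empty vector if the
// device allocation fails.
std::vector<float> run_scale(std::vector<float> values, float factor)
{
    if (values.empty()) return values;

    const int n = static_cast<int>(values.size());
    const size_t bytes = values.size() * sizeof(float);

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return {};

    cudaMemcpy(d_data, values.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d_data, n, factor);

    // This blocking copy waits for the kernel on the default stream to finish.
    cudaMemcpy(values.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return values;
}
```

With this layout the other compilation units can still be built with relocatable device code (nvcc -rdc=true) and linked as before. Only the functions you want inlined need their definitions in the header.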
