Questions tagged [nvrtc]

NVIDIA's run-time compilation library for CUDA C++ source code, which produces PTX intermediate-language code.

The CUDA platform supports run-time compilation (similar to that of OpenCL): your application can load CUDA C++ source code from a file (or generate it dynamically) and compile it into PTX, an intermediate representation. The resulting PTX can then be JIT-compiled and linked into GPU-executable binary code using the CUDA driver API.

A more in-depth description and complete examples can be found in the NVIDIA documentation for NVRTC.

20 questions

2 answers

Including C standard headers in CUDA NVRTC code

I'm writing a CUDA kernel that is compiled at runtime using NVRTC (CUDA version 9.2 with NVRTC version 7.5), which needs the stdint.h header in order to have the int32_t etc. types. If I write the kernel source code without the include, it works…

asked May 28 '18 at 11:19

tmlen

2 answers

How do you include standard CUDA libraries to link with NVRTC code?

Specifically, my issue is that I have CUDA code that needs to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram), I have to send in the name of the…

c cuda gpu nvrtc

asked Oct 17 '16 at 13:13

Billy Smith

1 answer

Differences between NVCC and NVRTC on compilation to PTX

Summary: I'm porting a simple raytracing application based on the Scratchapixel version to a bunch of GPU libraries. I successfully ported it to CUDA using the runtime API and the driver API, but it throws a Segmentation fault (core dumped) when I try…

c++ cuda ptx nvrtc

asked Apr 01 '20 at 03:37

Dinei

1 answer

Why isn't NVRTC optimizing out my integer division and modulo operations?

I compiled a kernel in NVRTC: __global__ void kernel_A(/* args */) { unsigned short idx = threadIdx.x; unsigned char warp_id = idx / 32; unsigned char lane_id = idx % 32; /* ... */ } I know integer division and modulo are very…

optimization cuda compiler-flags nvrtc

asked Jun 01 '17 at 06:44

Kh40tiK

0 answers

What does --no-source-include do in NVRTC?

NVIDIA's run-time CUDA compilation library, NVRTC, supports the compilation flag --no-source-include. The documentation describes this flag as follows: The preprocessor by default adds the directory of each input sources to the include path. This…

c++ cuda jit compiler-options nvrtc

asked Aug 26 '22 at 20:38

einpoklum

1 answer

When should NVRTC compilation produce a CUBIN?

If I understand the workflow description in the NVRTC documentation correctly, here's how it works: Create an NVRTC program from the source text. Compile the NVRTC program to get PTX code. Device-link the PTX code using NVIDIA's Driver API…

cuda linker ptx nvrtc cubin

asked Oct 30 '21 at 21:17

einpoklum

2 answers

How can I associate my NVRTC program source with a file?

I'm using NVRTC to compile a kernel. The relevant API call is: nvrtcResult nvrtcCreateProgram ( nvrtcProgram* prog, const char* src, const char* name, int numHeaders, const char** headers, const char** includeNames ) As…

compilation cuda debug-information nvrtc nsight-compute

asked Dec 31 '20 at 09:48

einpoklum

2 answers

Can I obtain what's used as __nv_nvrtc_builtin_header.h?

I'm profiling a kernel compiled (with debug and lineinfo) using the nvrtc library. In the profiling results, many of the samples are listed as being within __nv_nvrtc_builtin_header.h. However - there is obviously no such file on disk, and naturally…

cuda jit nvrtc cuda-driver

asked Sep 01 '20 at 10:43

einpoklum

2 answers

How to properly use include stdio.h in an NVRTC-compiled program?

I have written an amazing kernel which will bring me fame and fortune - if I can only get it to compile with NVRTC: #include <stdio.h> __global__ void do_stuff() { } I would have hoped that system headers should be recognized by the (runtime)…

c cuda runtime-compilation nvrtc

asked Jul 12 '20 at 11:42

einpoklum

1 answer

How to properly pass arguments as structs to NVRTC?

let prog = """//Kernel code: extern "C" { #pragma pack(1) typedef struct { int length; float *pointer; } global_array_float; __global__ void kernel_main(global_array_float x){ printf("(on device)…

struct f# cuda nvrtc

asked Nov 04 '16 at 13:19

Marko Grdinić

1 answer

How to use the option CU_JIT_LTO with CUDA JIT linking?

I'm wondering if I can improve the link time optimization (LTO) during just-in-time (JIT) linking with the option CU_JIT_LTO. If so, how do I specify this option? I found the following code in an NVIDIA developer blog, but I don't understand why…

cuda jit lto nvrtc

asked Dec 26 '21 at 09:04

thomasbangels

1 answer

How can I get the CUDA driver module handle for functions and globals in the compiled program?

The CUDA Runtime API has the functions cudaGetSymbolAddress() and cudaGetSymbolSize() for working with device-side globals from host-side code, using their names (source-code identifiers) as handles. In the Driver API, we have cuModuleGetGlobal(),…

cuda global nvrtc cuda-driver

asked Dec 06 '21 at 21:53

einpoklum

1 answer

Runtime exception in optix 6.0.0 sample code

After downloading the OptiX 6.0.0 SDK and compiling the sample projects included, I encountered a runtime exception when running any of the sample projects (e.g. optixHello, optixWhitted), with error message: OptiX Error: 'NVRTC Compilation failed. …

c++ optix nvrtc

asked Jun 12 '20 at 00:39

IcyChlorine

1 answer

Using CUDA-gdb with NVRTC

I have an application which generates CUDA C++ source code, compiles it into PTX at runtime using NVRTC, and then creates CUDA modules from it using the CUDA driver API. If I debug this application using cuda-gdb, it displays the kernel (where an…

cuda gdb cuda-gdb nvrtc

asked Feb 13 '19 at 12:39

tmlen

1 answer

device function pointer as template parameter

I have a template struct for some reasons (beyond the scope of this question) : template struct func { __device__ inline retV invoke(T i) { funcptr(i); } }; which can be used this way: __device__…

c++ templates cuda nvrtc

asked Nov 09 '18 at 12:33

Regis Portalez
