Questions tagged [nvrtc]

NVIDIA's run-time compilation library for CUDA source code, which produces PTX intermediate-language code

The CUDA platform supports run-time compilation (similar to that of OpenCL): your application binary can load program source code from a file (or generate it dynamically) and compile it into the PTX intermediate format. The resulting PTX can then be linked into GPU-executable binary code using the CUDA driver API.

A more in-depth description and complete examples can be found in the NVIDIA documentation for NVRTC.

20 questions
6 votes, 2 answers

Including C standard headers in CUDA NVRTC code

I'm writing a CUDA kernel that is compiled at runtime using NVRTC (CUDA version 9.2 with NVRTC version 7.5), which needs the stdint.h header, in order to have the int32_t etc. types. If I write the kernel source code without the include, it works…
tmlen
6 votes, 2 answers

How do you include standard CUDA libraries to link with NVRTC code?

Specifically, my issue is that I have CUDA code that needs to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram), I have to send in the name of the…
Billy Smith
3 votes, 1 answer

Differences between NVCC and NVRTC on compilation to PTX

Summary: I'm porting a simple raytracing application based on the Scratchapixel version to a bunch of GPU libraries. I successfully ported it to CUDA using the runtime API and the driver API, but it throws a Segmentation fault (core dumped) when I try…
Dinei
3 votes, 1 answer

Why isn't NVRTC optimizing out my integer division and modulo operations?

I compiled a kernel in NVRTC: __global__ void kernel_A(/* args */) { unsigned short idx = threadIdx.x; unsigned char warp_id = idx / 32; unsigned char lane_id = idx % 32; /* ... */ } I know integer division and modulo are very…
Kh40tiK
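Regarding the division and modulo question above: for unsigned operands, division by 32 is a right shift by 5 and modulo 32 is a bitwise AND with 31, which is the rewrite one would expect a compiler to make. For signed operands the forms differ on negative values (in C++, `-1 / 32` is `0` and `-1 % 32` is `-1`, whereas `-1 & 31` is `31`), so a compiler can only substitute them if it can show the value is non-negative. Note that the `unsigned short` in the excerpt is promoted to `int` before the arithmetic. A small host-side C++ check of the unsigned equivalence (function names are hypothetical):

```cpp
#include <cassert>

// For unsigned values, x / 32 == x >> 5 and x % 32 == x & 31, since 32 == 2^5.
inline unsigned warpId(unsigned idx) { return idx >> 5; }
inline unsigned laneId(unsigned idx) { return idx & 31u; }

// Exhaustively compare the shift/mask forms against / and % for every
// value an unsigned short thread index could hold.
void checkShiftMaskEquivalence() {
    for (unsigned idx = 0; idx <= 0xFFFFu; ++idx) {
        assert(warpId(idx) == idx / 32);
        assert(laneId(idx) == idx % 32);
    }
}
```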
2 votes, 0 answers

What does --no-source-include do in NVRTC?

NVIDIA's run-time CUDA compilation library, NVRTC, supports the compilation flag --no-source-include. The documentation describes this flag as follows: The preprocessor by default adds the directory of each input sources to the include path. This…
einpoklum
1 vote, 1 answer

When should NVRTC compilation produce a CUBIN?

If I understand the workflow description in the NVRTC documentation correctly, here's how it works: Create an NVRTC program from the source text. Compile the NVRTC program to get PTX code. Device-link the PTX code using NVIDIA's Driver API…
einpoklum
1 vote, 2 answers

How can I associate my NVRTC program source with a file?

I'm using NVRTC to compile a kernel. The relevant API call is: nvrtcResult nvrtcCreateProgram ( nvrtcProgram* prog, const char* src, const char* name, int numHeaders, const char** headers, const char** includeNames ) As…
einpoklum
1 vote, 2 answers

Can I obtain what's used as __nv_nvrtc_builtin_header.h?

I'm profiling a kernel compiled (with debug and lineinfo) using the nvrtc library. In the profiling results, many of the samples are listed as being within __nv_nvrtc_builtin_header.h. However - there is obviously no such file on disk, and naturally…
einpoklum
1 vote, 2 answers

How to properly use include stdio.h in an NVRTC-compiled program?

I have written an amazing kernel which will bring me fame and fortune - if I can only get it to compile with NVRTC: #include <stdio.h> __global__ void do_stuff() { } I would have hoped that system headers should be recognized by the (runtime)…
einpoklum
1 vote, 1 answer

How to properly pass arguments as structs to NVRTC?

let prog = """//Kernel code: extern "C" { #pragma pack(1) typedef struct { int length; float *pointer; } global_array_float; __global__ void kernel_main(global_array_float x){ printf("(on device)…
Marko Grdinić
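Regarding the struct-passing question above: with the driver API, each kernel argument is copied from the host memory that its `kernelParams` entry points to, so the host-side definition of a by-value struct must have the same layout as the device-side one. The `#pragma pack(1)` in the excerpt removes the padding that the default layout would insert before the pointer. A minimal C++ sketch, assuming a 64-bit target with 4-byte `int` (typical, but an assumption here), makes the two layouts explicit; the struct names are hypothetical.

```cpp
#include <cstddef>

// Default layout: the 8-byte pointer is aligned to 8, so 4 bytes of padding
// follow `length`.
struct ArrayDefault {
    int length;
    float* pointer;
};

// Packed layout, as in the excerpt: no padding between the members.
#pragma pack(push, 1)
struct ArrayPacked {
    int length;
    float* pointer;
};
#pragma pack(pop)

// These checks encode the assumptions of this sketch (64-bit pointers, 4-byte int).
static_assert(sizeof(void*) == 8 && sizeof(int) == 4, "sketch assumes LP64-style sizes");
static_assert(offsetof(ArrayDefault, pointer) == 8 && sizeof(ArrayDefault) == 16,
              "default layout pads before the pointer");
static_assert(offsetof(ArrayPacked, pointer) == 4 && sizeof(ArrayPacked) == 12,
              "packed layout has no padding");
```

Whichever layout is chosen, the host code that fills in the argument bytes has to use the same one as the kernel source compiled by NVRTC.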
0 votes, 1 answer

How to use the option CU_JIT_LTO with CUDA JIT linking?

I'm wondering if I can improve the link time optimization (LTO) during just-in-time (JIT) linking with the option CU_JIT_LTO. If so, how do I specify this option? I found the following code in an NVIDIA developer blog, but I don't understand why…
0 votes, 1 answer

How can I get the CUDA driver module handle for functions and globals in the compiled program?

The CUDA Runtime API has the functions cudaGetSymbolAddress() and cudaGetSymbolSize() for working with device-side globals from host-side code, using their names (source-code identifiers) as handles. In the Driver API, we have cuModuleGetGlobal(),…
einpoklum
0 votes, 1 answer

Runtime exception in OptiX 6.0.0 sample code

After downloading the OptiX 6.0.0 SDK and compiling the sample projects included, I encountered a runtime exception when running any of the sample projects (e.g. optixHello, optixWhitted), with error message: OptiX Error: 'NVRTC Compilation failed. …
0 votes, 1 answer

Using CUDA-gdb with NVRTC

I have an application which generates CUDA C++ source code, compiles it into PTX at runtime using NVRTC, and then creates CUDA modules from it using the CUDA driver API. If I debug this application using cuda-gdb, it displays the kernel (where an…
tmlen
0 votes, 1 answer

device function pointer as template parameter

I have a template struct for some reasons (beyond the scope of this question) : template struct func { __device__ inline retV invoke(T i) { funcptr(i); } }; which can be used this way: __device__…
Regis Portalez