Questions tagged [cuda-driver]

A lower-level C-language API for managing computational work in the CUDA platform on NVIDIA GPU hardware.

This tag refers to the CUDA Driver API. It is a lower-level alternative to the much more common CUDA Runtime API. Both are part of the CUDA platform and offer different levels of abstraction when programming general-purpose GPU applications.

The Driver API's programming style resembles that of OpenCL in many respects. Unlike the Runtime API, it does not require the nvcc compiler, and it allows kernels to be compiled at run time by means of the NVRTC library.

Members of the CUDA Driver API are prefixed with cu, while members of the Runtime API are prefixed with cuda. For example: cudaGetErrorName (Runtime API) vs. cuGetErrorName (Driver API).
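
As a rough illustration of the points above, here is a minimal sketch of calling two cu-prefixed Driver API entry points (cuInit and cuDriverGetVersion) from plain C, built with an ordinary C compiler rather than nvcc. Several things here are assumptions rather than part of the API: the library name libcuda.so.1 (a Linux convention), loading it with dlopen instead of including cuda.h, treating CUresult as int with 0 meaning success, and the helper names decode_cuda_version and query_driver_version, which are hypothetical.

```c
/* Sketch only: load the CUDA driver library at run time and query its
 * version. Assumes a POSIX system, the library name libcuda.so.1, and that
 * CUresult can be treated as int with 0 meaning CUDA_SUCCESS. */
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical local typedefs mirroring cuInit and cuDriverGetVersion. */
typedef int (*cu_init_fn)(unsigned int flags);
typedef int (*cu_driver_get_version_fn)(int *version);

/* Split an encoded CUDA version (1000 * major + 10 * minor) into parts. */
static void decode_cuda_version(int encoded, int *major, int *minor)
{
    *major = encoded / 1000;
    *minor = (encoded % 1000) / 10;
}

/* Returns the encoded CUDA version reported by the driver, or -1 if the
 * driver library cannot be loaded or one of the calls fails. */
static int query_driver_version(void)
{
    void *lib = dlopen("libcuda.so.1", RTLD_NOW);
    if (lib == NULL) {
        return -1; /* no driver library on this machine */
    }

    cu_init_fn init = (cu_init_fn)dlsym(lib, "cuInit");
    cu_driver_get_version_fn get_version =
        (cu_driver_get_version_fn)dlsym(lib, "cuDriverGetVersion");

    int version = -1;
    if (init == NULL || get_version == NULL ||
        init(0) != 0 || get_version(&version) != 0) {
        version = -1; /* symbol missing or a Driver API call failed */
    }

    dlclose(lib);
    return version;
}
```

A caller would pass the result of query_driver_version() to decode_cuda_version() after checking for -1. Older C libraries may need -ldl at link time. Real projects would normally include cuda.h and link against the driver library directly; the dlopen route is used here only so that the sketch builds on machines without the CUDA toolkit.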

See NVIDIA's documentation on the difference between the driver and runtime APIs.

Questions about the CUDA Driver API can be asked here on Stack Overflow, but if you have bugs to report, you should discuss them on the CUDA forums or report them via the registered developer portal. You may want to cross-link to any discussion here on SO.

46 questions
29 votes, 1 answer

Which Compute Capability is supported by which CUDA versions?

What are the compute capabilities supported by each of: CUDA 5.5? CUDA 6.0? CUDA 6.5?
Max
5 votes, 1 answer

What does cudaSetDevice() do to a CUDA device's context stack?

Suppose I have an active CUDA context associated with device i, and I now call cudaSetDevice(i). What happens? Nothing? Primary context replaces the top of the stack? Primary context is pushed onto the stack? It actually seems to be…
einpoklum
5 votes, 5 answers

ImportError: libcuda.so.1: cannot open shared object file

When I run my code with TensorFlow directly, everything is normal. However, when I run it in a screen window, I get the following error. ImportError: libcuda.so.1: cannot open shared object file: No such file or directory I have tried the…
dwqy11
5 votes, 1 answer

How can I reset the CUDA error to success with Driver API after a trap instruction?

I have a kernel that might call asm("trap;"). But when that happens, the CUDA error code is set to launch failure, and I cannot reset it. In the CUDA Runtime API, we can use cudaGetLastError to get the last error and, in the meantime, reset…
Xiang Zhang
4 votes, 1 answer

Are CUDA_VERSION and CUDART_VERSION necessarily the same?

The CUDA Driver API defines CUDA_VERSION (in cuda.h), and the CUDA Runtime API defines CUDART_VERSION (in cuda_runtime_api.h). However, CUDART_VERSION is not defined as CUDA_VERSION but directly as a number. Are they always supposed to have the…
einpoklum
2 votes, 1 answer

cuMemAlloc'ing memory in one CUDA context, and freeing it in another - why does this succeed?

I create 2 CUDA contexts, "ctx1" and "ctx2", set the current context to "ctx1", allocate 8 bytes of memory, and switch the current context to "ctx2". Then I free the memory allocated in ctx1. Why does this return CUDA_SUCCESS? And when I destroy ctx1 and then free…
DplusT
2 votes, 1 answer

How do I obtain the _actual_ CUDA driver version?

How do I programmatically get the actual CUDA driver version (e.g. 470.57.02, not 11.4 like the corresponding CUDA version nor 11040)? We know that it's not cudaDriverGetVersion()...
einpoklum
2 votes, 1 answer

Does the CUDA JIT compiler perform device link-time optimization?

Before device link-time optimization (DLTO) was introduced in CUDA 11.2, it was relatively easy to ensure forward compatibility without worrying too much about differences in performance. You would typically just create a fatbinary containing PTX…
Peet Whittaker
2 votes, 1 answer

How can I determine whether a CUDA context is the primary one - cheaply?

You can (?) determine whether a CUDA context is the primary one by calling cuDevicePrimaryCtxRetain() and comparing the returned pointer to the context you have. But - what if nobody's created the primary context yet? Is there a cheaper way to…
einpoklum
2 votes, 1 answer

Is it possible to run cuMemset on a CUarray?

I have a CUarray that I got from my OpenGL context via cuGraphicsSubResourceGetMappedArray(). Is there a possibility to use it with cuMemset*()?
morph
1 vote, 0 answers

Back up installed packages and later restore and install them on another Ubuntu system without an internet connection

I want to back up all installed package files, restore them on a new server, and install all the packages without internet access. I tried the apt-clone method; it backs up and restores the .ZIP file, but it still asks for internet access. I…
1 vote, 0 answers

cuGetExportTable explanation

I am searching for information about cuGetExportTable or cudaGetExportTable. In particular, I want to know what functions are included in the table. A previous post (cudaGetExportTable a total hack) mentions that I concluded that the export table…
MANOS
1 vote, 0 answers

Does a CUDA stream "become active" after execution of a scheduled host function concludes?

The CUDA documentation for scheduling the launch of a host function (cuLaunchHostFunc) says: Completion of the function does not cause a stream to become active except as described above. I couldn't quite figure out what's "described above". As…
einpoklum
1 vote, 1 answer

What's the replacement for cuModuleGetSurfRef and cuModuleGetTexRef?

CUDA 12 indicates that these two functions: CUresult cuModuleGetSurfRef (CUsurfref* pSurfRef, CUmodule hmod, const char* name); CUresult cuModuleGetTexRef (CUtexref* pTexRef, CUmodule hmod, const char* name); which obtain a reference to surface or…
einpoklum
1 vote, 1 answer

CUDA H.265 decoder initialization fault

I'm trying to decode an H.265 frame with nvidia_video_codec_sdk; the video size is 192x168, but cuvidCreateDecoder fails with CUDA_ERROR_INVALID_VALUE. int NvDecoder::HandleVideoSequence(CUVIDEOFORMAT* pVideoFormat) { int nDecodeSurface =…
375