Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

CUDA is Nvidia's parallel computing platform and programming model for GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs. Before posting CUDA questions, please read "How to get useful answers to your CUDA questions" below.

CUDA has an online documentation repository, updated with each release, including references for APIs and libraries; user guides for applications; and a detailed CUDA C/C++ Programming Guide.

The CUDA platform enables application development using several languages and associated APIs, including:

There also exist third-party bindings for using CUDA in other languages and programming environments, such as Managed CUDA for .NET languages (including C#).

You should ask questions about CUDA here on Stack Overflow, but if you have bugs to report you should discuss them on the CUDA forums or report them via the registered developer portal. You may want to cross-link to any discussion here on SO.

The CUDA execution model is not multithreading in the usual sense, so please do not tag CUDA questions with unless your question involves thread safety of the CUDA APIs, or the use of both normal CPU multithreading and CUDA together.

How to get useful answers to your CUDA questions

Here are a number of suggestions to users new to CUDA. Follow these suggestions before asking your question and you are much more likely to get a satisfactory answer!

  • Always check the result codes returned by CUDA API functions to ensure you are getting cudaSuccess. If you are not, and you don't know why, include the information about the error in your question. This includes checking for errors caused by the most recent kernel launch, which may not be available before you've called cudaDeviceSynchronize() or cudaStreamSynchronize(). More on checking for errors in CUDA in this question.
  • If you are getting unspecified launch failure it is possible that your code is causing a segmentation fault, meaning the code is accessing memory that is not allocated for the code to use. Try to verify that the indexing is correct and check if the CUDA Compute Sanitizer (or legacy cuda-memcheck on older GPUs until CUDA 12) is reporting any errors. Note that both tools encompass more than the default Memcheck. Other tools (Racecheck, Initcheck, Synccheck) must be selected explicitly.
  • The debugger for CUDA, , is also very useful when you are not really sure what you are doing. You can monitor resources by warp, thread, block, SM and grid level. You can follow your program's execution. If a segmentation fault occurs in your program, can help you find where the crash occurred and see what the context is. If you prefer a GUI for debugging, there are IDE plugins/editions for/of Visual Studio (Windows), Visual Studio Code (Windows/Mac/Linux, but GPU for debugging must be on a Linux system) and Eclipse (Linux).
  • If you are finding that you are getting syntax errors on CUDA keywords when compiling device code, make sure you are compiling using nvcc (or clang with CUDA support enabled) and that your source file has the expected .cu extension. If you find that CUDA device functions or feature namespaces you expect to work are not found (atomic functions, warp voting functions, half-precision arithmetic, cooperative groups, etc.), ensure that you are explicitly passing compilation arguments which enable architecture settings which support those features.

Books

14278 questions
889
votes
31 answers

How to get the CUDA version?

Is there any quick command or script to check for the version of CUDA installed? I found the manual of 4.0 under the installation directory but I'm not sure whether it is of the actual installed version or not.
Hailiang Zhang
  • 17,604
  • 23
  • 71
  • 117
518
votes
19 answers

Nvidia NVML Driver/library version mismatch

When I run nvidia-smi, I get the following message: Failed to initialize NVML: Driver/library version mismatch An hour ago I received the same message and uninstalled my CUDA library and I was able to run nvidia-smi, getting the following…
etal
  • 12,914
  • 4
  • 13
  • 16
321
votes
7 answers

Which TensorFlow and CUDA version combinations are compatible?

I have noticed that some newer TensorFlow versions are incompatible with older CUDA and cuDNN versions. Does an overview of the compatible versions or even a list of officially tested combinations exist? I can't find it in the TensorFlow…
whiletrue
  • 10,500
  • 6
  • 27
  • 47
299
votes
5 answers

What is the canonical way to check for errors using the CUDA runtime API?

Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should checked for errors. The API documentation contains functions like cudaGetLastError,…
talonmies
  • 70,661
  • 34
  • 192
  • 269
281
votes
6 answers

Different CUDA versions shown by nvcc and NVIDIA-smi

I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both cuda9.2 and cuda10 installed on my ubuntu 16.04. Now I set the PATH to point to cuda9.2. So when I run $ which…
yuqli
  • 4,461
  • 8
  • 26
  • 46
248
votes
13 answers

How to verify CuDNN installation?

I have searched many places but ALL I get is HOW to install it, not how to verify that it is installed. I can verify my NVIDIA driver is installed, and that CUDA is installed, but I don't know how to verify CuDNN is installed. Help will be much…
alfredox
  • 4,082
  • 6
  • 21
  • 29
243
votes
17 answers

A top-like utility for monitoring CUDA activity on a GPU

I'm trying to monitor a process that uses CUDA and MPI, is there any way I could do this, something like the command "top" but that monitors the GPU too?
natorro
  • 2,793
  • 3
  • 19
  • 16
236
votes
10 answers

Using GPU from a docker container?

I'm searching for a way to use the GPU from inside a docker container. The container will execute arbitrary code so i don't want to use the privileged mode. Any tips? From previous research i understood that run -v and/or LXC cgroup was the way to…
Regan
  • 8,231
  • 5
  • 23
  • 23
176
votes
2 answers

How do CUDA blocks/warps/threads map onto CUDA cores?

I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/thread. I am studying the architecture from a didactic point of view (university project), so reaching peak performance is not my concern. First of…
Daedalus
  • 1,761
  • 3
  • 11
  • 3
174
votes
2 answers

Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation)

How are threads organized to be executed by a GPU?
cibercitizen1
  • 20,944
  • 16
  • 72
  • 95
163
votes
5 answers

Using Java with Nvidia GPUs (CUDA)

I'm working on a business project that is done in Java, and it needs huge computation power to compute business markets. Simple math, but with huge amount of data. We ordered some CUDA GPUs to try it with and since Java is not supported by CUDA, I'm…
Hans
  • 1,846
  • 3
  • 14
  • 19
149
votes
9 answers

Difference between global and device functions

Can anyone describe the differences between __global__ and __device__ ? When should I use __device__, and when to use __global__?.
Mehdi Saman Booy
  • 2,760
  • 5
  • 26
  • 32
148
votes
6 answers

How do I select which GPU to run a job on?

In a multi-GPU computer, how do I designate which GPU a CUDA job should run on? As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples then ran several instances of the nbody simulation, but they all ran on one GPU…
Steven C. Howell
  • 16,902
  • 15
  • 72
  • 97
146
votes
21 answers

CUDA incompatible with my gcc version

I have troubles compiling some of the examples shipped with CUDA SDK. I have installed the developers driver (version 270.41.19) and the CUDA toolkit, then finally the SDK (both the 4.0.17 version). Initially it didn't compile at all giving: error…
fbielejec
  • 3,492
  • 4
  • 27
  • 35
137
votes
3 answers

How do I choose grid and block dimensions for CUDA kernels?

This is a question about how to determine the CUDA grid, block and thread sizes. This is an additional question to the one posted here. Following this link, the answer from talonmies contains a code snippet (see below). I don't understand the…
user1292251
  • 1,655
  • 3
  • 16
  • 16
1
2 3
99 100