Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

Before posting CUDA questions, please read "How to get useful answers to your CUDA questions" below.

CUDA has an online documentation repository, updated with each release, including references for APIs and libraries; user guides for applications; and a detailed CUDA C/C++ Programming Guide.

The CUDA platform enables application development using several languages and associated APIs.

There also exist third-party bindings for using CUDA in other languages and programming environments, such as Managed CUDA for .NET languages (including C#).

You should ask questions about CUDA here on Stack Overflow, but if you have bugs to report you should discuss them on the CUDA forums or report them via the registered developer portal. You may want to cross-link to any discussion here on SO.

The CUDA execution model is not multithreading in the usual sense, so please do not tag CUDA questions with multithreading unless your question involves thread safety of the CUDA APIs, or the use of both normal CPU multithreading and CUDA together.

How to get useful answers to your CUDA questions

Here are a number of suggestions to users new to CUDA. Follow these suggestions before asking your question and you are much more likely to get a satisfactory answer!

Always check the result codes returned by CUDA API functions to ensure you are getting cudaSuccess. If you are not, and you don't know why, include the information about the error in your question. This includes checking for errors caused by the most recent kernel launch, which may not be available before you've called cudaDeviceSynchronize() or cudaStreamSynchronize(). More on checking for errors in CUDA in this question.
If you are getting an unspecified launch failure, your code may be causing a segmentation fault, meaning it is accessing memory that has not been allocated for its use. Verify that the indexing is correct and check whether the CUDA Compute Sanitizer (or the legacy cuda-memcheck, available for older GPUs in toolkits before CUDA 12) reports any errors. Note that both tools encompass more than the default Memcheck tool; the others (Racecheck, Initcheck, Synccheck) must be selected explicitly.
The debugger for CUDA, cuda-gdb, is also very useful when you are not sure what your code is doing. You can inspect state at the warp, thread, block, SM, and grid level, and you can follow your program's execution. If a segmentation fault occurs in your program, cuda-gdb can help you find where the crash occurred and see what the context is. If you prefer a GUI for debugging, there are IDE integrations for Visual Studio (Windows), Visual Studio Code (Windows/Mac/Linux, although the GPU being debugged must be in a Linux system), and Eclipse (Linux).
If you are getting syntax errors on CUDA keywords when compiling device code, make sure you are compiling with nvcc (or clang with CUDA support enabled) and that your source file has the expected .cu extension. If CUDA device functions or feature namespaces you expect to work are not found (atomic functions, warp voting functions, half-precision arithmetic, cooperative groups, etc.), make sure you are explicitly passing compiler arguments that select an architecture supporting those features.
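The error-checking advice above can be sketched as a small, self-contained program. This is a minimal illustration, not a canonical implementation: the CUDA_CHECK macro and the scale kernel are hypothetical names introduced here, and only the cuda* calls themselves come from the CUDA runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA API): prints the error
// name, location, and description, then exits, whenever a runtime call
// returns anything other than cudaSuccess.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n",         \
                         cudaGetErrorName(err_), __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                      \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Illustrative kernel. The bounds check keeps threads past the end of the
// array from touching unallocated memory, a common cause of launch failures.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;

    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    // A kernel launch returns no status itself, so check in two steps.
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch (e.g. invalid configuration)
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel was executing

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

Assuming the file is saved as check.cu (a hypothetical name), it could be built with something like nvcc -arch=sm_70 check.cu -o check, where sm_70 is only an example and should match your GPU. The resulting binary can then be run under compute-sanitizer ./check for the default Memcheck tool, or compute-sanitizer --tool racecheck ./check to select Racecheck explicitly.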


14278 questions

889

votes

31 answers

How to get the CUDA version?

Is there any quick command or script to check for the version of CUDA installed? I found the manual of 4.0 under the installation directory but I'm not sure whether it is of the actual installed version or not.

cuda

asked Mar 15 '12 at 20:30

Hailiang Zhang


518

votes

19 answers

Nvidia NVML Driver/library version mismatch

When I run nvidia-smi, I get the following message: Failed to initialize NVML: Driver/library version mismatch An hour ago I received the same message and uninstalled my CUDA library and I was able to run nvidia-smi, getting the following…

cuda driver gpu nvidia

asked Mar 25 '17 at 22:47

etal


321

votes

7 answers

Which TensorFlow and CUDA version combinations are compatible?

I have noticed that some newer TensorFlow versions are incompatible with older CUDA and cuDNN versions. Does an overview of the compatible versions or even a list of officially tested combinations exist? I can't find it in the TensorFlow…

tensorflow cuda version compatibility cudnn

asked May 31 '18 at 10:48

whiletrue


299

votes

5 answers

What is the canonical way to check for errors using the CUDA runtime API?

Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API documentation contains functions like cudaGetLastError,…

cuda error-checking

asked Dec 26 '12 at 09:35

talonmies


281

votes

6 answers

Different CUDA versions shown by nvcc and NVIDIA-smi

I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both cuda9.2 and cuda10 installed on my ubuntu 16.04. Now I set the PATH to point to cuda9.2. So when I run $ which…

cuda

asked Nov 22 '18 at 00:44

yuqli


248

votes

13 answers

How to verify CuDNN installation?

I have searched many places but ALL I get is HOW to install it, not how to verify that it is installed. I can verify my NVIDIA driver is installed, and that CUDA is installed, but I don't know how to verify CuDNN is installed. Help will be much…

cuda computer-vision caffe conv-neural-network cudnn

asked Jul 09 '15 at 18:58

alfredox


243

votes

17 answers

A top-like utility for monitoring CUDA activity on a GPU

I'm trying to monitor a process that uses CUDA and MPI, is there any way I could do this, something like the command "top" but that monitors the GPU too?

cuda process-monitoring resource-monitor

asked Nov 22 '11 at 08:19

natorro


236

votes

10 answers

Using GPU from a docker container?

I'm searching for a way to use the GPU from inside a docker container. The container will execute arbitrary code so i don't want to use the privileged mode. Any tips? From previous research i understood that run -v and/or LXC cgroup was the way to…

cuda docker

asked Aug 07 '14 at 14:41

Regan


176

votes

2 answers

How do CUDA blocks/warps/threads map onto CUDA cores?

I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/thread. I am studying the architecture from a didactic point of view (university project), so reaching peak performance is not my concern. First of…

cuda gpgpu nvidia warp-scheduler

asked May 05 '12 at 09:58

Daedalus


174

votes

2 answers

Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation)

How are threads organized to be executed by a GPU?

cuda nvidia

asked Mar 06 '10 at 11:08

cibercitizen1


163

votes

5 answers

Using Java with Nvidia GPUs (CUDA)

I'm working on a business project that is done in Java, and it needs huge computation power to compute business markets. Simple math, but with huge amount of data. We ordered some CUDA GPUs to try it with and since Java is not supported by CUDA, I'm…

java cuda gpu multi-gpu

asked Apr 04 '14 at 15:27

Hans


149

votes

9 answers

Difference between global and device functions

Can anyone describe the differences between __global__ and __device__ ? When should I use __device__, and when to use __global__?.

cuda

asked Sep 11 '12 at 16:15

Mehdi Saman Booy


148

votes

6 answers

How do I select which GPU to run a job on?

In a multi-GPU computer, how do I designate which GPU a CUDA job should run on? As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples then ran several instances of the nbody simulation, but they all ran on one GPU…

cuda nvidia

asked Sep 22 '16 at 21:23

Steven C. Howell


146

votes

21 answers

CUDA incompatible with my gcc version

I have troubles compiling some of the examples shipped with CUDA SDK. I have installed the developers driver (version 270.41.19) and the CUDA toolkit, then finally the SDK (both the 4.0.17 version). Initially it didn't compile at all giving: error…

gcc cuda debian

asked Jul 08 '11 at 09:25

fbielejec


137

votes

3 answers

How do I choose grid and block dimensions for CUDA kernels?

This is a question about how to determine the CUDA grid, block and thread sizes. This is an additional question to the one posted here. Following this link, the answer from talonmies contains a code snippet (see below). I don't understand the…

performance optimization cuda gpu nvidia

asked Apr 03 '12 at 01:14

user1292251

