Questions tagged [compute-capability]

CUDA compute capability is a numerical indicator of a GPU's type and architectural generation. It consists of a major and a minor number, such as 3.5. The compute capability is used by the compiler (e.g. nvcc) to build code for a specific architecture. Questions with this tag should pertain to compute capability itself, or to errors arising from a compute capability mismatch between code and device.

18 questions
29
votes
1 answer

Which Compute Capability is supported by which CUDA versions?

What are the compute capabilities supported by each of CUDA 5.5, CUDA 6.0, and CUDA 6.5?
Max
  • 441
  • 2
  • 7
  • 14
15
votes
3 answers

What utility/binary can I call to determine an NVIDIA GPU's Compute Capability?

Suppose I have a system with a single GPU installed, and suppose I've also installed a recent version of CUDA. I want to determine what's the compute capability of my GPU. If I could compile code, that would be easy: #include int main() { …
einpoklum
  • 118,144
  • 57
  • 340
  • 684
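A minimal sketch of the "compile code" route the excerpt alludes to: a tiny program that queries device 0 through the CUDA runtime API and prints its compute capability (cudaGetDeviceProperties and its fields are the documented API; the output format here is my choice):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        // prop.major / prop.minor hold the compute capability,
        // e.g. 8 and 6 for a CC 8.6 device.
        cudaError_t err = cudaGetDeviceProperties(&prop, 0);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("%d.%d\n", prop.major, prop.minor);
        return 0;
    }

As for a ready-made binary: recent nvidia-smi builds accept --query-gpu=compute_cap --format=csv, though availability depends on the driver version.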
6
votes
1 answer

CUDA-capability and CUDA version: compatible?

I have one machine with a 1.1 compute-capability CUDA GPU. I want to reinstall CUDA, and I think I'll go with 5.0. Is there any such thing as compatibility between compute capability and the CUDA version? Will I run into trouble using CUDA 5.0 with a C-C…
a3mlord
  • 1,060
  • 6
  • 16
4
votes
4 answers

How can I get CMake to automatically detect the value for CUDA_ARCHITECTURES?

Newer versions of CMake (3.18 and later) are "aware" of the choice of CUDA architectures that compilation of CUDA code targets. Targets have a CUDA_ARCHITECTURES property which, when set, generates the appropriate -gencode…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
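For context: CMake 3.24 and later accept CUDA_ARCHITECTURES=native, which probes the GPUs present at configure time. On older CMake versions, one common workaround is to compile and run a small detector whose output is assigned to CMAKE_CUDA_ARCHITECTURES; a hypothetical detector (all naming mine) might look like:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Prints the compute capabilities of all visible GPUs in
    // CMake's list style, e.g. "52;86".
    int main() {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
            return 1;
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
                return 1;
            printf("%s%d%d", i ? ";" : "", prop.major, prop.minor);
        }
        printf("\n");
        return 0;
    }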
1
vote
1 answer

CMake idiom regarding minimum microarchitecture checking

Suppose I have a CUDA project and I'm writing its CMakeLists.txt. In my project, I have several .cu source files with kernels, each of which has a minimum NVIDIA microarchitecture version it supports. In my CMakeLists.txt, I would like to be able…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
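A compile-time complement to whatever CMake-level idiom is chosen (a sketch, not the accepted answer): each .cu file can enforce its own minimum via __CUDA_ARCH__, which is defined only while device code is being compiled:

    // Hypothetical guard at the top of a .cu file whose kernels
    // need at least compute capability 7.0 (Volta).
    #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 700)
    #error "Kernels in this file require compute capability >= 7.0"
    #endif

This turns a too-low -gencode choice into a hard build error rather than a runtime surprise.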
1
vote
1 answer

Understanding Warp Scheduler Utilization in CUDA: Maximum Concurrent Warps vs Resident Warps

In CUDA compute capability 8.6, each Streaming Multiprocessor (SM) has four warp schedulers. Each warp scheduler can schedule up to 16 warps concurrently, meaning that theoretically up to 64 warps could be running concurrently. However, in reality,…
Tokubara
  • 392
  • 3
  • 13
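The per-SM limits behind this question can be read from the device at runtime; a small sketch (on a CC 8.6 part, maxThreadsPerMultiProcessor / warpSize comes out to 48 resident warps, not the 64 that the 4 x 16 scheduler arithmetic suggests):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
            return 1;
        // Maximum warps that can be resident on one SM at a time.
        int residentWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
        printf("CC %d.%d: %d SMs, %d resident warps per SM\n",
               prop.major, prop.minor, prop.multiProcessorCount, residentWarps);
        return 0;
    }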
1
vote
1 answer

Pre 8.x equivalent of __reduce_max_sync() in CUDA

cuda-memcheck has detected a race condition in the code that does the following: condition = /*different in each thread*/; __shared__ int owner[nWarps]; /* ... owner[i] is initialized to blockDim.x+1 */ if(condition) { owner[threadIdx.x/32] =…
Serge Rogatch
  • 13,865
  • 7
  • 86
  • 158
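On devices below CC 8.0, where __reduce_max_sync() does not exist, the usual substitute is a butterfly shuffle reduction; a minimal sketch (the function name is mine), equivalent in effect to __reduce_max_sync(0xffffffff, val) when all 32 lanes participate:

    // Full-warp max reduction for pre-CC-8.0 devices.
    __device__ unsigned warpReduceMax(unsigned val) {
        // Butterfly exchange: after log2(32) = 5 rounds,
        // every lane holds the warp-wide maximum.
        for (int offset = 16; offset > 0; offset >>= 1)
            val = max(val, __shfl_xor_sync(0xffffffffu, val, offset));
        return val;
    }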
1
vote
0 answers

Setting the constraint in slurm job script for GPU compute capability

I am trying to set a constraint so that my job would only run on GPUs with compute capability higher (or equal) to 7. Here is my script named torch_gpu_sanity_venv385-11.slurm: #!/bin/bash #SBATCH --partition=gpu-L --gres=gpu:1 --constraint="cc7.0"…
Mona Jalal
  • 34,860
  • 64
  • 239
  • 408
1
vote
1 answer

Cache behaviour in Compute Capability 7.5

These are my assumptions: There are two types of loads, cached and uncached. In the first one, the traffic goes through L1 and L2, while in the second one, the traffic goes only through L2. The default behaviour in Compute Capability 6.x and 7.x…
rm95
  • 167
  • 6
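For reference, the two load flavours in the excerpt correspond to the cache-hinted load intrinsics documented in the CUDA C++ Programming Guide; a small illustration, assuming a device that supports them:

    __global__ void loads(const float* __restrict__ in, float* out) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float a = __ldca(in + i);  // ld.global.ca: cache at all levels (L1 and L2)
        float b = __ldcg(in + i);  // ld.global.cg: cache at L2 only, bypassing L1
        out[i] = a + b;
    }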
1
vote
1 answer

CUDA Compute Capability Backwards Compatibility

I am currently working with CUDA code compiled for compute capability 5.2. My machine happens to have a compute capability 5.2 GPU (GeForce GTX 970). However, my question is: will the code compiled for compute capability 5.2 still run on a machine…
Robert May
  • 11
  • 4
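One hedged way to probe the "will it still run" question at runtime: launch a trivial kernel and check the error code, since a binary that contains neither matching SASS nor JIT-able PTX for the device fails with cudaErrorNoKernelImageForDevice (the kernel and messages here are illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void probe() {}

    int main() {
        probe<<<1, 1>>>();
        if (cudaGetLastError() == cudaErrorNoKernelImageForDevice) {
            printf("No usable kernel image for this GPU.\n");
            return 1;
        }
        cudaDeviceSynchronize();
        printf("Kernel image is compatible with this GPU.\n");
        return 0;
    }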
1
vote
2 answers

Why is nvlink warning me about lack of sm_20 (compute capability 2.0) object code?

I'm working with CUDA 6.5 on a machine with a GTX Titan card (compute capability 3.5). I'm building my code with just -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_35,code=sm_35 - and when I link my binary, nvlink says: nvlink warning :…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
0
votes
2 answers

CUDA atomicAdd_block is undefined

According to CUDA Programming Guide, "Atomic functions are only atomic with respect to other operations performed by threads of a particular set ... Block-wide atomics: atomic for all CUDA threads in the current program executing in the same thread…
user2348209
  • 136
  • 11
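Context for the error: the _block- and _system-scoped atomics exist only on compute capability 6.0 and higher, so they are undeclared unless compilation targets sm_60 or newer (e.g. nvcc -arch=sm_60). A guarded sketch:

    __global__ void blockSum(int* counters) {
    #if __CUDA_ARCH__ >= 600
        // Atomic only with respect to threads in the same block.
        atomicAdd_block(&counters[blockIdx.x], 1);
    #else
        // Fallback on older architectures: device-wide atomic.
        atomicAdd(&counters[blockIdx.x], 1);
    #endif
    }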
0
votes
1 answer

CUDA device properties and compute capability when compiling

Let's assume I have code which lets the user pass the threads_per_block to call the kernel. Then I want to check if the input is valid (e.g. <=512 for compute capability (CC) < 2.0 and <=1024 for CC >= 2.0). Now I wonder what would happen if I compile…
tim
  • 9,896
  • 20
  • 81
  • 137
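A sketch of the runtime half of such a check: rather than hard-coding the 512/1024 table, the limit can be read from the device itself, independently of what the binary was compiled for (the validator name is mine):

    #include <cuda_runtime.h>

    // Hypothetical validator for a user-supplied block size.
    bool validBlockSize(int threadsPerBlock, int device = 0) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
            return false;
        // 512 on CC < 2.0, 1024 on CC >= 2.0 -- but querying
        // avoids baking that table into the code.
        return threadsPerBlock > 0 && threadsPerBlock <= prop.maxThreadsPerBlock;
    }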
0
votes
2 answers

Compile CUDA code with cmake and 3.5 compute capability

I need to compile CUDA code that uses dynamic parallelism with cmake. The code is: #include __global__ void childKernel() { printf("Hello "); } __global__ void parentKernel() { childKernel<<<1,1>>>(); …
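A runnable reconstruction of the excerpt (the stripped #include is assumed to be <cstdio>, and main is mine); dynamic parallelism needs CC 3.5+, relocatable device code, and the device runtime, e.g. nvcc -arch=sm_35 -rdc=true app.cu -lcudadevrt:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void childKernel() {
        printf("Hello ");
    }

    __global__ void parentKernel() {
        // Device-side launch: this is what requires -rdc=true
        // and compute capability 3.5 or higher.
        childKernel<<<1, 1>>>();
    }

    int main() {
        parentKernel<<<1, 1>>>();
        cudaDeviceSynchronize();  // flush device-side printf output
        return 0;
    }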
0
votes
2 answers

Maximum number of concurrent kernels & virtual code architecture

So I found this Wikipedia resource, Maximum number of resident grids per device (Concurrent Kernel Execution), and for each compute capability it lists a number of concurrent kernels, which I assume to be the maximum number of concurrent…
user2255757
  • 756
  • 1
  • 6
  • 24