Questions tagged [compute-capability]

CUDA compute capability is a numerical indicator of a GPU's type and architectural generation. It consists of a major and a minor number, such as 3.5. The compute capability is used by the compiler (e.g. nvcc) to build code for a specific architecture. Questions with this tag should pertain to compute capability itself, or to errors arising from a compute capability mismatch between code and device.

18 questions
29
votes
1 answer

Which Compute Capability is supported by which CUDA versions?

What are the compute capabilities supported by each of CUDA 5.5, CUDA 6.0, and CUDA 6.5?
Max
  • 441
  • 2
  • 7
  • 14
15
votes
3 answers

What utility/binary can I call to determine an NVIDIA GPU's Compute Capability?

Suppose I have a system with a single GPU installed, and suppose I've also installed a recent version of CUDA. I want to determine what's the compute capability of my GPU. If I could compile code, that would be easy: #include int main() { …
einpoklum
  • 118,144
  • 57
  • 340
  • 684
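A minimal sketch of the "compile code" route the excerpt alludes to: a tiny program that queries device 0 through the CUDA runtime API and prints its compute capability (cudaGetDeviceProperties and its fields are the documented API; the output format here is my choice):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        // prop.major / prop.minor hold the compute capability,
        // e.g. 8 and 6 for a CC 8.6 device.
        cudaError_t err = cudaGetDeviceProperties(&prop, 0);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("%d.%d\n", prop.major, prop.minor);
        return 0;
    }

As for a ready-made binary: recent nvidia-smi builds accept --query-gpu=compute_cap --format=csv, though availability depends on the driver version.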
6
votes
1 answer

CUDA-capability and CUDA version: compatible?

I have one machine with a 1.1 compute-capability CUDA GPU. I want to reinstall CUDA, and I think I'll go with 5.0. Is there any such thing as compatibility between compute capability and the CUDA version? Will I run into trouble using CUDA 5.0 with a C-C…
a3mlord
  • 1,060
  • 6
  • 16
4
votes
4 answers

How can I get CMake to automatically detect the value for CUDA_ARCHITECTURES?

Newer versions of CMake (3.18 and later) are "aware" of the choice of CUDA architectures that compilation of CUDA code targets. Targets have a CUDA_ARCHITECTURES property which, when set, generates the appropriate -gencode…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
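For context: CMake 3.24 and later accept CUDA_ARCHITECTURES=native, which probes the GPUs present at configure time. On older CMake versions, one common workaround is to compile and run a small detector whose output is assigned to CMAKE_CUDA_ARCHITECTURES; a hypothetical detector (all naming mine) might look like:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Prints the compute capabilities of all visible GPUs in
    // CMake's list style, e.g. "52;86".
    int main() {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
            return 1;
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
                return 1;
            printf("%s%d%d", i ? ";" : "", prop.major, prop.minor);
        }
        printf("\n");
        return 0;
    }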
1
vote
1 answer

CMake idiom regarding minimum microarchitecture checking

Suppose I have a CUDA project and I'm writing its CMakeLists.txt. In my project, I have several .cu source files with kernels, each of which has a minimum NVIDIA microarchitecture version it supports. In my CMakeLists.txt, I would like to be able…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
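A compile-time complement to whatever CMake-level idiom is chosen (a sketch, not the accepted answer): each .cu file can enforce its own minimum via __CUDA_ARCH__, which is defined only while device code is being compiled:

    // Hypothetical guard at the top of a .cu file whose kernels
    // need at least compute capability 7.0 (Volta).
    #if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 700)
    #error "Kernels in this file require compute capability >= 7.0"
    #endif

This turns a too-low -gencode choice into a hard build error rather than a runtime surprise.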
1
vote
1 answer

Understanding Warp Scheduler Utilization in CUDA: Maximum Concurrent Warps vs Resident Warps

In CUDA compute capability 8.6, each Streaming Multiprocessor (SM) has four warp schedulers. Each warp scheduler can schedule up to 16 warps concurrently, meaning that theoretically up to 64 warps could be running concurrently. However, in reality,…
Tokubara
  • 392
  • 3
  • 13
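The per-SM limits behind this question can be read from the device at runtime; a small sketch (on a CC 8.6 part, maxThreadsPerMultiProcessor / warpSize comes out to 48 resident warps, not the 64 that the 4 x 16 scheduler arithmetic suggests):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
            return 1;
        // Maximum warps that can be resident on one SM at a time.
        int residentWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
        printf("CC %d.%d: %d SMs, %d resident warps per SM\n",
               prop.major, prop.minor, prop.multiProcessorCount, residentWarps);
        return 0;
    }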
1
vote
1 answer

Pre 8.x equivalent of __reduce_max_sync() in CUDA

cuda-memcheck has detected a race condition in the code that does the following: condition = /*different in each thread*/; __shared__ int owner[nWarps]; /* ... owner[i] is initialized to blockDim.x+1 */ if(condition) { owner[threadIdx.x/32] =…
Serge Rogatch
  • 13,865
  • 7
  • 86
  • 158
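On devices below CC 8.0, where __reduce_max_sync() does not exist, the usual substitute is a butterfly shuffle reduction; a minimal sketch (the function name is mine), equivalent in effect to __reduce_max_sync(0xffffffff, val) when all 32 lanes participate:

    // Full-warp max reduction for pre-CC-8.0 devices.
    __device__ unsigned warpReduceMax(unsigned val) {
        // Butterfly exchange: after log2(32) = 5 rounds,
        // every lane holds the warp-wide maximum.
        for (int offset = 16; offset > 0; offset >>= 1)
            val = max(val, __shfl_xor_sync(0xffffffffu, val, offset));
        return val;
    }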
1
vote
0 answers

Setting the constraint in slurm job script for GPU compute capability

I am trying to set a constraint so that my job would only run on GPUs with compute capability higher (or equal) to 7. Here is my script named torch_gpu_sanity_venv385-11.slurm: #!/bin/bash #SBATCH --partition=gpu-L --gres=gpu:1 --constraint="cc7.0"…
Mona Jalal
  • 34,860
  • 64
  • 239
  • 408
1
vote
1 answer

Cache behaviour in Compute Capability 7.5

These are my assumptions: There are two types of loads, cached and uncached. In the first one, the traffic goes through L1 and L2, while in the second one, the traffic goes only through L2. The default behaviour in Compute Capability 6.x and 7.x…
rm95
  • 167
  • 6
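For reference, the two load flavours in the excerpt correspond to the cache-hinted load intrinsics documented in the CUDA C++ Programming Guide; a small illustration, assuming a device that supports them:

    __global__ void loads(const float* __restrict__ in, float* out) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float a = __ldca(in + i);  // ld.global.ca: cache at all levels (L1 and L2)
        float b = __ldcg(in + i);  // ld.global.cg: cache at L2 only, bypassing L1
        out[i] = a + b;
    }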
1
vote
1 answer

CUDA Compute Capability Backwards Compatibility

I am currently working with CUDA code compiled for compute capability 5.2. My machine happens to have a compute capability 5.2 GPU (GeForce GTX 970). However, my question is: will the code compiled for compute capability 5.2 still run on a machine…
Robert May
  • 11
  • 4
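One hedged way to probe the "will it still run" question at runtime: launch a trivial kernel and check the error code, since a binary that contains neither matching SASS nor JIT-able PTX for the device fails with cudaErrorNoKernelImageForDevice (the kernel and messages here are illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void probe() {}

    int main() {
        probe<<<1, 1>>>();
        if (cudaGetLastError() == cudaErrorNoKernelImageForDevice) {
            printf("No usable kernel image for this GPU.\n");
            return 1;
        }
        cudaDeviceSynchronize();
        printf("Kernel image is compatible with this GPU.\n");
        return 0;
    }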
1
vote
2 answers

Why is nvlink warning me about lack of sm_20 (compute capability 2.0) object code?

I'm working with CUDA 6.5 on a machine with a GTX Titan card (compute capability 3.5). I'm building my code with just -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_35,code=sm_35 - and when I link my binary, nvlink says: nvlink warning :…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
0
votes
2 answers

CUDA atomicAdd_block is undefined

According to CUDA Programming Guide, "Atomic functions are only atomic with respect to other operations performed by threads of a particular set ... Block-wide atomics: atomic for all CUDA threads in the current program executing in the same thread…
user2348209
  • 136
  • 11
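Context for the error: the _block- and _system-scoped atomics exist only on compute capability 6.0 and higher, so they are undeclared unless compilation targets sm_60 or newer (e.g. nvcc -arch=sm_60). A guarded sketch:

    __global__ void blockSum(int* counters) {
    #if __CUDA_ARCH__ >= 600
        // Atomic only with respect to threads in the same block.
        atomicAdd_block(&counters[blockIdx.x], 1);
    #else
        // Fallback on older architectures: device-wide atomic.
        atomicAdd(&counters[blockIdx.x], 1);
    #endif
    }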
0
votes
1 answer

CUDA device properties and compute capability when compiling

Let's assume I have code which lets the user pass the threads_per_block to call the kernel. Then I want to check if the input is valid (e.g. <=512 for compute capability (CC) < 2.0 and <=1024 for CC >= 2.0). Now I wonder what would happen if I compile…
tim
  • 9,896
  • 20
  • 81
  • 137
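A sketch of the runtime half of such a check: rather than hard-coding the 512/1024 table, the limit can be read from the device itself, independently of what the binary was compiled for (the validator name is mine):

    #include <cuda_runtime.h>

    // Hypothetical validator for a user-supplied block size.
    bool validBlockSize(int threadsPerBlock, int device = 0) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
            return false;
        // 512 on CC < 2.0, 1024 on CC >= 2.0 -- but querying
        // avoids baking that table into the code.
        return threadsPerBlock > 0 && threadsPerBlock <= prop.maxThreadsPerBlock;
    }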
0
votes
2 answers

Compile CUDA code with cmake and 3.5 compute capability

I need to compile CUDA code that uses dynamic parallelism with cmake. The code is: #include __global__ void childKernel() { printf("Hello "); } __global__ void parentKernel() { childKernel<<<1,1>>>(); …
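A runnable reconstruction of the excerpt (the stripped #include is assumed to be <cstdio>, and main is mine); dynamic parallelism needs CC 3.5+, relocatable device code, and the device runtime, e.g. nvcc -arch=sm_35 -rdc=true app.cu -lcudadevrt:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void childKernel() {
        printf("Hello ");
    }

    __global__ void parentKernel() {
        // Device-side launch: this is what requires -rdc=true
        // and compute capability 3.5 or higher.
        childKernel<<<1, 1>>>();
    }

    int main() {
        parentKernel<<<1, 1>>>();
        cudaDeviceSynchronize();  // flush device-side printf output
        return 0;
    }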
0
votes
2 answers

Maximum number of concurrent kernels & virtual code architecture

So I found this Wikipedia resource, Maximum number of resident grids per device (Concurrent Kernel Execution), and for each compute capability it lists a number of concurrent kernels, which I assume to be the maximum number of concurrent…
user2255757
  • 756
  • 1
  • 6
  • 24