Questions tagged [gpu-atomics]

Modern GPUs support atomic operations in several memory spaces. These differ from atomic operations on CPUs both in how they are implemented and in how they affect execution flow.

On modern GPUs, atomic operations in global device memory may require synchronization among thousands of logical threads (or hundreds of warps/wavefronts). A GPU may also support atomic operations on an individual processing core's memory (shared memory in CUDA parlance, local memory in OpenCL parlance). These behave differently from global memory atomics, both in performance and in their effect on execution flow.
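
To make the global-versus-shared distinction concrete, here is a minimal CUDA sketch of a histogram computed two ways: once with every thread issuing atomics straight to global memory, and once with each block accumulating into its own shared-memory copy before merging. The kernel names, the bin count `kNumBins`, and the launch configuration are assumptions chosen for illustration; they do not come from any of the questions listed here, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kNumBins = 16;  // hypothetical bin count, for illustration only

// Variant 1: every thread in the grid updates the same few global addresses,
// so contention spans all blocks.
__global__ void histogramGlobal(const unsigned char* data, int n,
                                unsigned int* bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&bins[data[i] % kNumBins], 1u);
    }
}

// Variant 2: each block accumulates into a private shared-memory histogram,
// then merges it into global memory with one atomic per bin per block.
__global__ void histogramShared(const unsigned char* data, int n,
                                unsigned int* bins)
{
    __shared__ unsigned int blockBins[kNumBins];

    // Zero the block-local bins cooperatively.
    for (int b = threadIdx.x; b < kNumBins; b += blockDim.x) {
        blockBins[b] = 0;
    }
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&blockBins[data[i] % kNumBins], 1u);  // shared-memory atomic
    }
    __syncthreads();

    // Merge the block-local result into the global histogram.
    for (int b = threadIdx.x; b < kNumBins; b += blockDim.x) {
        atomicAdd(&bins[b], blockBins[b]);  // global-memory atomic
    }
}

// Sum all bins on the host (the arrays are in managed memory).
static unsigned int sumBins(const unsigned int* bins)
{
    unsigned int total = 0;
    for (int b = 0; b < kNumBins; ++b) total += bins[b];
    return total;
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    unsigned char* data = nullptr;
    unsigned int* bins = nullptr;
    cudaMallocManaged(&data, n * sizeof(unsigned char));
    cudaMallocManaged(&bins, kNumBins * sizeof(unsigned int));
    for (int i = 0; i < n; ++i) data[i] = static_cast<unsigned char>(i % 256);

    for (int b = 0; b < kNumBins; ++b) bins[b] = 0;
    histogramGlobal<<<blocks, threads>>>(data, n, bins);
    cudaDeviceSynchronize();
    unsigned int globalTotal = sumBins(bins);

    for (int b = 0; b < kNumBins; ++b) bins[b] = 0;
    histogramShared<<<blocks, threads>>>(data, n, bins);
    cudaDeviceSynchronize();
    unsigned int sharedTotal = sumBins(bins);

    // Every element increments exactly one bin, so both totals should equal n.
    printf("n = %d, global total = %u, shared total = %u\n",
           n, globalTotal, sharedTotal);

    cudaFree(data);
    cudaFree(bins);
    return 0;
}
```

Both variants produce the same counts; what differs is where the contention happens. Which one is faster depends on the hardware generation, the number of bins, and how the input is distributed, so treat this as a starting point for measurement rather than a recommendation.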

Reading on GPU atomics:

The OpenCL reference guide's section on atomic operations.
Intel's guide to using atomic operations with OpenCL.
The section on atomic operations in NVIDIA's CUDA Programming Guide.

34 questions


2 answers

What are all the atomic operations in CUDA?

I was wondering if there is a complete list of atomic operations usable in CUDA kernels. I couldn't find anything like that on the internet.

cuda gpu-atomics

asked Aug 02 '12 at 07:47

soroosh.strife

1,181
4
19
45


1 answer

Atomic Operations in CUDA? Which header file to include?

To use atomic operations in CUDA, is it necessary to include some CUDA header file? The CUDA programming guide seems to be tight-lipped on this. The code glmax.cu given below is giving me the following compilation error. gaurish108 MyPractice:…

cuda gpu-atomics

asked Nov 03 '11 at 21:45

smilingbuddha

14,334
33
112
189


2 answers

How can I implement a custom atomic function involving several variables?

I'd like to implement this atomic function in CUDA: __device__ float lowest; // global var __device__ int lowIdx; // global var float realNum; // thread reg var int index; // thread reg var if(realNum < lowest) { lowest= realNum; //…

cuda atomic gpu-atomics ptxas

asked Jul 01 '13 at 18:46

Doug

2,783
6
33
37


1 answer

How to use atomic operations on an SSBO in a compute shader

Example code Here is a bare-bones compute shader to illustrate my question layout(local_size_x = 64) in; // Persistent LIFO structure with a count of elements layout(std430, binding = 0) restrict buffer SMyBuffer { int count; float…

opengl glsl gpu-atomics

asked Dec 05 '17 at 20:17

bernie

9,820
5
62
92


1 answer

Question about modifying a flag array in CUDA

I am doing research on GPU programming and have a question about modifying a global array from within a thread. __device__ float data[10] = {0,0,0,0,0,0,0,0,0,1}; __global__ void gradually_set_global_data() { while (1) { if (data[threadIdx.x +…

concurrency cuda gpu-atomics

asked Apr 08 '20 at 09:11

hustwjq


1 answer

CUDA atomic operations and concurrent kernel launch

I am currently developing a GPU-based program that uses multiple kernels launched concurrently via multiple streams. In my application, multiple kernels need to access a queue/stack, and I plan to use atomic operations. But I do not know…

concurrency cuda gpu-atomics

asked Dec 23 '13 at 08:08

user3128889


1 answer

Speeding up CUDA atomics calculation for many bins/few bins

I am trying to optimize my histogram calculations in CUDA. It gives me an excellent speedup over the corresponding OpenMP CPU calculation. However, I suspect (in keeping with intuition) that most of the pixels fall into a few buckets. For argument's…

optimization cuda histogram binning gpu-atomics

asked Sep 17 '16 at 05:49

kakrafoon


2 answers

How to have atomic load in CUDA

My question is how I can perform an atomic load in CUDA. Atomic exchange can emulate an atomic store. Can an atomic load be emulated inexpensively in a similar manner? I can use an atomic add with 0 to load the content atomically, but I think it is expensive…

cuda gpu-atomics

asked Sep 01 '15 at 21:19

kirill


3 answers

error : identifier "atomicAdd" is undefined under visual studio 2010 & cuda 4.2 with Fermi GPU

I was trying to compile some CUDA code under Visual Studio 2010 with CUDA 4.2 (I created this CUDA project using Parallel Nsight 2.2), but I encountered an atomic problem, "error : identifier "atomicAdd" is undefined", which I still can't solve…

visual-studio-2010 cuda atomic gpu-atomics

asked Jul 18 '12 at 00:28

G_fans


7 answers

CUDA: reduction or atomic operations?

I'm writing a CUDA kernel which involves calculating the maximum value on a given matrix and I'm evaluating possibilities. The best way I could find is: Forcing every thread to store a value in the shared memory and using a reduction algorithm after…

algorithm matrix cuda reduction gpu-atomics

asked May 07 '11 at 21:01

Marco A.

43,032
26
132
246


3 answers

Which is faster for CUDA shared-mem atomics - warp locality or anti-locality?

Suppose many warps in a (CUDA kernel grid) block are repeatedly updating a fair-sized number of shared memory locations. In which of these cases will such work be completed faster? The case of intra-warp access locality, e.g. the total number of…

cuda gpu-shared-memory gpu-atomics

asked Aug 01 '18 at 21:28

einpoklum

118,144
57
340
684


1 answer

OpenGL atomic counters vs atomics in a SSBO

I came across this article that states there are no differences in performance between atomic counter buffers and an atomic variable in an…

opengl gpu-atomics

asked Jan 19 '17 at 08:06

iam

1,623
1
14
28


1 answer

Is a combination of atomic CAS for 64 and 32 bit ok?

My global array contains struct {float,float}. The first thing I do to it is a 64-bit CAS on one of the structs. Depending on the return value, I (may) want to modify the second float. Now I have the option of using either a 32-bit CAS or a 64-bit one. I…

cuda compare-and-swap gpu-atomics

asked Sep 19 '22 at 08:53

John


2 answers

Atomic addition to floating point values in OpenCL for NVIDIA GPUs?

The OpenCL 3.0 specification does not seem to have intrinsics/builtins for atomic addition to floating-point values, only for integral values (and that seems to have been the case in OpenCL 1.x and 2.x as well). CUDA, however, has offered…

floating-point opencl nvidia gpgpu gpu-atomics

asked Apr 28 '22 at 13:56

einpoklum

118,144
57
340
684


1 answer

Is there a proper CUDA atomicLoad function?

I've run into the issue that the CUDA atomic API does not have an atomicLoad function. After searching on Stack Overflow, I found the following implementation of CUDA atomicLoad. But it looks like this function fails to work in the following…

cuda gpu-atomics

asked Feb 05 '22 at 10:13

Denis Kotov
