9

I have a piece of CUDA code in which threads perform atomic operations on shared memory. Since the result of an atomic operation is visible to the other threads of the block immediately anyway, I thought it might be good to instruct the compiler to treat the shared memory as volatile.
So I changed

__global__ void CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for ( uint i = tid; i < nElem; i += blockDim.x*gridDim.x){
        atomicAdd( smem_data+threadIdx.x, 6);
    }
}

to

__global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    volatile __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for ( uint i = tid; i < nElem; i += blockDim.x*gridDim.x){
        atomicAdd( smem_data+threadIdx.x, 6);
    }
}

With the above change, the following compile-time error occurs:

error: no instance of overloaded function "atomicAdd" matches the argument list
        argument types are: (volatile int *, int)

Why isn't a volatile address supported as an argument for atomic operations? Is it because the compiler already treats the shared memory as volatile as soon as it identifies that there are going to be atomic operations on it?

Farzad
  • Have you considered overloading this function or defining a wrapper working on `volatile`s? – Vitality Apr 13 '14 at 19:00
  • @JackOLantern Good point. I can use atomics with `volatile` shared memory like `atomicAdd( (int*)(smem_data+threadIdx.x), 6);`. I observe no difference in performance compared to the version without the `volatile` keyword on the shared memory. – Farzad Apr 13 '14 at 19:30
  • This question still remains: does *NVCC* treat shared memory used for atomics as `volatile`? – Farzad Apr 13 '14 at 19:30
  • @RobertCrovella has provided you with the correct answer. There is no need indeed to define a wrapper or an overloaded function (as in my comment above), since a simple cast would be enough. – Vitality Apr 13 '14 at 20:37
  • @Farzad I believe you're correct that `atomicAdd` should have `volatile` overloads. It's simply an oversight. IIRC I submitted a bug report a while back to correct this. You can coerce it to work with a `const_cast`. – Jared Hoberock Apr 14 '14 at 04:44

2 Answers

8

The definition of the `volatile` qualifier is given in the programming guide. It instructs the compiler to always generate an actual read or write for each such access, and never to "optimize" the access into a register or otherwise elide it.

Since atomic operations are guaranteed to act on actual memory locations (either shared or global), combining the two is unnecessary. Therefore, versions of the atomic functions prototyped with a `volatile` qualifier are not provided.

If you have a memory location that is already declared as `volatile`, simply cast the pointer to the corresponding non-volatile type when you pass it to the atomic function. The behavior will be as expected.
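A minimal sketch of that cast, based on the kernel from the question and the cast shown in the comments above (`BLOCK_SIZE` is assumed to be defined elsewhere):

__global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    volatile __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for ( uint i = tid; i < nElem; i += blockDim.x*gridDim.x){
        // Cast away the volatile qualifier; the atomic still acts
        // directly on the shared-memory location.
        atomicAdd( (int*)(smem_data + threadIdx.x), 6);
    }
}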

Therefore, with this proviso, atomic operations can operate on locations declared as `volatile`.

The simple fact that you have accessed a particular location using atomics somewhere in your code does not mean that the compiler will treat every access elsewhere as implicitly volatile. If you need volatile behavior elsewhere, declare it explicitly.
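For example (an illustrative kernel of my own, not from the original answer; the names and the `BLOCK_SIZE` value are made up):

#define BLOCK_SIZE 256

__global__ void explicitVolatileElsewhere(int* out)
{
    __shared__ int smem[BLOCK_SIZE];
    smem[threadIdx.x] = 0;
    __syncthreads();

    // The atomic always operates on the actual shared-memory location.
    atomicAdd(&smem[threadIdx.x], 6);

    // An ordinary access, however, may be kept in a register. If you want
    // volatile behavior for plain loads and stores, request it explicitly:
    volatile int* vsmem = smem;
    out[(blockIdx.x * blockDim.x) + threadIdx.x] = vsmem[threadIdx.x];
}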

Robert Crovella
-1

The previous poster has correctly identified the problem: There is no atomicAdd function defined that takes a volatile parameter.

As to why this is the case: my guess is that the library developers simply omitted that interface. Imagine all the combinations of volatile, const, and the possible parameter types, and the number of potential interfaces starts to explode.
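As the comments under the question suggest, such an interface is straightforward to supply yourself. A minimal sketch of a user-defined overload (my own wrapper, not part of the CUDA headers) that casts the qualifier away and forwards to the built-in function:

// User-supplied overload: accepts a volatile pointer and forwards to the
// built-in atomicAdd(int*, int) after removing the volatile qualifier.
__device__ __forceinline__ int atomicAdd(volatile int* address, int val)
{
    return atomicAdd(const_cast<int*>(address), val);
}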

Why isn't a volatile address supported as an argument for atomic operations?

Atomic operations are not part of C/C++. In your case, they are being implemented in a library that is probably implemented in assembly language.

Is it because the compiler already treats the shared memory as volatile as soon as it identifies that there are going to be atomic operations on it?

No, this is the way the library writer has defined the function interface.

user3344003
  • C11 does include the `_Atomic` keyword (and some functions). C++11 includes the `std::atomic` class library in `#include <atomic>`. C++11 functions like `atomic_load( const volatile std::atomic<T>* )` actually come in both volatile and non-volatile flavours (https://en.cppreference.com/w/cpp/atomic/atomic_load). So `volatile std::atomic<int> foo;` Just Works, the same as `std::atomic<int> bar;`. You don't need both const and non-const overloads; implicit conversion is allowed there, unlike with volatile. So CUDA "only" would have needed twice as many functions to allow volatile. – Peter Cordes May 11 '23 at 18:48
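For illustration (plain C++11 host code of my own, not from the comment; the names are made up), the volatile overloads mentioned here mean that code like the following compiles without any cast:

#include <atomic>

volatile std::atomic<int> counter{0};   // volatile-qualified atomic object

void bump()
{
    // std::atomic's member functions have volatile-qualified overloads,
    // so no cast is needed here.
    counter.fetch_add(6, std::memory_order_relaxed);
}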