
According to the CUDA Programming Guide, "Atomic functions are only atomic with respect to other operations performed by threads of a particular set ... Block-wide atomics: atomic for all CUDA threads in the current program executing in the same thread block as the current thread. These are suffixed with _block, e.g., atomicAdd_block"

However, my code fails to compile when I use atomicAdd_block, although it compiles fine with atomicAdd. Is there a header I should include or a library I should link against?

einpoklum
user2348209
  • What CUDA version are you using? What's your GPU's Compute Capability? – einpoklum Nov 02 '21 at 22:28
  • Thanks, do you know when the new function was introduced? I am using 11.4 – user2348209 Nov 02 '21 at 22:32
  • Actually, I changed my comment; it might also be that your GPU does not support them. – einpoklum Nov 02 '21 at 22:33
  • My device is Tesla V100, so the compute capability is 7.0 – user2348209 Nov 02 '21 at 22:34
  • You need to specifically compile for compute capability 6.0 or higher. So try adding `-arch=sm_70` on your `nvcc` compile command line, to match your V100 GPU. – Robert Crovella Nov 02 '21 at 22:36
  • From [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions): "Devices with compute capability less than 6.0 only support device-wide atomic operations" so to enable that feature you have to specifically compile for a cc6.0 device or higher. – Robert Crovella Nov 02 '21 at 22:39

2 Answers


atomicAdd() has been supported for a long time, by early versions of CUDA and on older micro-architectures. However, atomicAdd_system() and atomicAdd_block() were introduced, IIANM, with the Pascal micro-architecture, in 2016. The minimum Compute Capability that supports them is 6.0. If you're targeting CC 5.2 or earlier, or if your CUDA version is several years old, then they might not be available to you.

This is likely what is happening here: even with the current version of CUDA, nvcc defaults to Compute Capability 5.2 if no other target is specified with -gencode or -arch (e.g. if you run nvcc -o out my_file.cu).
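To illustrate the point above, here is a minimal sketch of a kernel that uses atomicAdd_block() when the compilation target allows it and falls back to atomicAdd() otherwise. The kernel name count_threads_per_block and the host helper all_blocks_counted are hypothetical names made up for this example, and the `__CUDA_ARCH__` guard is just one possible way to keep the file compiling for older targets.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each block counts its own threads in shared memory.
// Only threads of the same block touch the counter, so block scope suffices.
__global__ void count_threads_per_block(int *per_block_counts)
{
    __shared__ int counter;
    if (threadIdx.x == 0) { counter = 0; }
    __syncthreads();

#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
    atomicAdd_block(&counter, 1);  // block-scoped atomic, needs a CC >= 6.0 target
#else
    atomicAdd(&counter, 1);        // device-wide atomic, fallback for older targets
#endif
    __syncthreads();

    if (threadIdx.x == 0) { per_block_counts[blockIdx.x] = counter; }
}

// Hypothetical host helper: launches the kernel and returns true if every
// block reported exactly n_threads increments.
bool all_blocks_counted(int n_blocks, int n_threads)
{
    int *d_counts = nullptr;
    if (cudaMalloc(&d_counts, n_blocks * sizeof(int)) != cudaSuccess) { return false; }

    count_threads_per_block<<<n_blocks, n_threads>>>(d_counts);

    // cudaMemcpy waits for the kernel to finish and reports earlier errors.
    std::vector<int> h_counts(n_blocks);
    cudaError_t err = cudaMemcpy(h_counts.data(), d_counts,
                                 n_blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counts);
    if (err != cudaSuccess) { return false; }

    for (int c : h_counts) {
        if (c != n_threads) { return false; }
    }
    return true;
}
```

Calling all_blocks_counted(4, 128) from a main() should return true on a working device. Building with nvcc -arch=sm_70 -o out my_file.cu selects the atomicAdd_block() branch, while a build for a target below 6.0 takes the fallback branch instead of failing to compile.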

einpoklum

As Robert said, the solution is to add -arch=sm_70 to the nvcc compile command. For those who use CMake, the equivalent is to add set(CMAKE_CUDA_ARCHITECTURES 70) to CMakeLists.txt.
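If you are not sure which value matches your GPU, one option is to query the compute capability at runtime. The following is a rough sketch that assumes device 0 is the GPU you care about; arch_flag_for and print_suggested_arch are hypothetical helper names made up for this example.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: formats a compute capability as an nvcc -arch value,
// e.g. (7, 0) becomes "sm_70".
std::string arch_flag_for(int major, int minor)
{
    return "sm_" + std::to_string(major) + std::to_string(minor);
}

// Hypothetical helper: prints the suggested -arch value for device 0.
// Returns false if the device properties cannot be queried.
bool print_suggested_arch()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) { return false; }

    std::printf("%s: -arch=%s\n", prop.name,
                arch_flag_for(prop.major, prop.minor).c_str());
    return true;
}
```

The same major and minor digits, without the sm_ prefix, are what CMAKE_CUDA_ARCHITECTURES expects.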

user2348209
  • It is inappropriate to hard-code features of the *system you're building on* into a `CMakeLists.txt` file, which represents requirements of the *project being built*. – einpoklum Nov 02 '21 at 22:55
  • Thank you for your comment, do you have any solution to make it more portable? – user2348209 Nov 02 '21 at 22:57
  • A proper solution is not yet available in CMake. But, for now, read [this question and answer](https://stackoverflow.com/q/68223398/1593077) of mine. – einpoklum Nov 02 '21 at 22:59
  • ... and you've reminded me to file [this CMake issue](https://gitlab.kitware.com/cmake/cmake/-/issues/22839) :-) – einpoklum Nov 02 '21 at 23:15