It appears that OpenCL 3.0 has added support for the long-awaited atomic operations on floating-point numbers. However, after spending hours searching, I still can't find a single example showing how to use these functions.

I've already been using a common hack to implement a float32 atomic_add, but I wanted to try OpenCL 3's built-in support, so I defined a macro that calls `atomic_fetch_add`, as shown below:

```c
#if __OPENCL_C_VERSION__ >= CL_VERSION_3_0
  #pragma OPENCL EXTENSION cl_ext_float_atomics : enable
  #define atomicadd(a,b) atomic_fetch_add((volatile atomic_float *)(a),(b))
#else
  inline float atomicadd(volatile __global float* address, const float value) {
    float old = value, orig;
    while ((old = atomic_xchg(address, (orig = atomic_xchg(address, 0.0f)) + old)) != 0.0f);
    return orig;
  }
#endif
```

but I am getting tons of errors:

```
<kernel>:320:26: warning: unknown OpenCL extension 'cl_ext_float_atomics' - ignoring
#pragma OPENCL EXTENSION cl_ext_float_atomics : enable
                         ^
<kernel>:773:17: error: no matching function for call to 'atomic_fetch_add'
                atomicadd(& field[*idx1d + tshift * gcfg->dimlen.z], -p[0].w);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<kernel>:321:24: note: expanded from macro 'atomicadd'
#define atomicadd(a,b) atomic_fetch_add((volatile atomic_float *)(a),(b)) 
                       ^~~~~~~~~~~~~~~~
cl_kernel.h:4571:1: note: candidate function not viable: no known conversion from 'volatile atomic_float *' to 'volatile atomic_int *__attribute__((address_space(16776963)))' for 1st argument
DECL_ATOMIC_FETCH_MOD(atomic_int, int, int)
^
cl_kernel.h:4563:3: note: expanded from macro 'DECL_ATOMIC_FETCH_MOD'
  DECL_ATOMIC_FETCH_MOD_OP(add, A, C, M) \
  ^
...
```

Here, `field[]` is a float buffer in global memory. My computer has 2x GTX 2080 GPUs with driver 515.x, and `clinfo` reports that both devices support OpenCL 3.0.

What is the right way to call `atomic_fetch_add` with a float type?
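One thing worth checking in the guard above: inside `#if`, any identifier that is not a defined macro evaluates to 0. If the compiler does not predefine `CL_VERSION_3_0` (an assumption; nothing in this thread shows whether NVIDIA's compiler does), then `__OPENCL_C_VERSION__ >= CL_VERSION_3_0` becomes `120 >= 0` on an OpenCL C 1.2 compiler and the 3.0 branch is selected, which would be consistent with the pragma warning in the log. The sketch below demonstrates this rule in plain C; the `SIM_` names are hypothetical stand-ins, since user code should not define the double-underscore names itself.

```c
/* Hypothetical stand-in for __OPENCL_C_VERSION__ on an OpenCL C 1.2 compiler.
   SIM_CL_VERSION_3_0 is deliberately left undefined, mimicking a compiler
   that does not predefine CL_VERSION_3_0. */
#define SIM_OPENCL_C_VERSION 120

/* Mirrors the question's guard: the undefined SIM_CL_VERSION_3_0 is replaced
   by 0, so this evaluates 120 >= 0 and picks the "3.0" branch. */
int naive_guard_selects_3_0(void) {
#if SIM_OPENCL_C_VERSION >= SIM_CL_VERSION_3_0
    return 1;
#else
    return 0;
#endif
}

/* Checks that the version macro exists before comparing against it. */
int guarded_selects_3_0(void) {
#if defined(SIM_CL_VERSION_3_0) && SIM_OPENCL_C_VERSION >= SIM_CL_VERSION_3_0
    return 1;
#else
    return 0;
#endif
}
```

In real OpenCL C, adding `defined(CL_VERSION_3_0)` avoids this trap, but OpenCL C 3.0 alone still doesn't guarantee float atomics, because `cl_ext_float_atomics` is optional. My understanding is that the extension specification also defines feature macros such as `__opencl_c_ext_fp32_global_atomic_add` for exactly this test; verify the name against the spec before relying on it.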

FangQ
  • It seems that Nvidia GPUs still only support the OpenCL C 1.2 language standard, as can be queried with `cl_device.getInfo()`. – ProjectPhysX Sep 24 '22 at 19:01
  • From [this link](https://developer.nvidia.com/blog/nvidia-is-now-opencl-3-0-conformant/), it appears that NVIDIA driver 465 or newer added support for OpenCL 3.0. This is confirmed by the `clinfo` output: `Number of platforms: 2 Platform Name: NVIDIA CUDA, Platform Vendor: NVIDIA Corporation; Platform Version: OpenCL 3.0 CUDA 11.6.134` – FangQ Sep 24 '22 at 23:32
  • `cl_ext_float_atomics` is an optional extension, and a vendor can claim conformance without providing that feature. – Björn Lindqvist Sep 27 '22 at 12:46
  • Does this answer your question? [Atomic addition to floating point values in OpenCL for NVIDIA GPUs?](https://stackoverflow.com/questions/72044986/atomic-addition-to-floating-point-values-in-opencl-for-nvidia-gpus) – Björn Lindqvist Sep 27 '22 at 12:47
  • @BjörnLindqvist, my code already has a dedicated [CUDA version](https://github.com/fangq/mcx), so the OpenCL version is really designed for general use (CPUs, multi-vendor GPUs). Using PTX assembly won't solve the problem in a portable way. – FangQ Sep 28 '22 at 15:26
  • After further reading, I realized that @ProjectPhysX's answer was correct. As also described in [this post](https://stackoverflow.com/a/67372358/4271392), while `CL_DEVICE_VERSION` reports OpenCL 3.0, `CL_DEVICE_OPENCL_C_VERSION` shows that the compiler only supports OpenCL C 1.2 on NVIDIA devices. My `clinfo` shows `Device OpenCL C Version: OpenCL C 1.2` on all my NVIDIA GPUs, including a recently acquired 3090. Shame on NVIDIA for not updating their OpenCL driver in order to preserve their CUDA monopoly. – FangQ Sep 28 '22 at 19:59

1 Answer

Making my initial comment the answer here:

Nvidia GPUs still only support the OpenCL C 1.2 language standard, as can be queried with `cl_device.getInfo<CL_DEVICE_OPENCL_C_VERSION>()`. The platform version is reported as 3.0, but the features are still unchanged from 1.2; in particular, the recent `cl_ext_float_atomics` extension is not yet supported.
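To act on this at run time, a host program can read two strings with `clGetDeviceInfo`: `CL_DEVICE_OPENCL_C_VERSION`, which the OpenCL spec formats as `OpenCL C <major>.<minor> <vendor-specific information>`, and `CL_DEVICE_EXTENSIONS`, a space-separated list. The sketch below only parses those strings (the `clGetDeviceInfo` calls are left out so it compiles without an OpenCL SDK); the function names and the "3.0 plus extension" policy are hypothetical choices mirroring the question's guard.

```c
#include <stdio.h>
#include <string.h>

/* Parses "OpenCL C <major>.<minor> ..." (CL_DEVICE_OPENCL_C_VERSION format).
   Returns 1 on success, 0 if the string does not match that pattern. */
int parse_opencl_c_version(const char *s, int *major, int *minor) {
    return sscanf(s, "OpenCL C %d.%d", major, minor) == 2;
}

/* Returns 1 if `name` appears as a whole space-separated token in `list`
   (the CL_DEVICE_EXTENSIONS format), 0 otherwise. */
int has_extension(const char *list, const char *name) {
    size_t n = strlen(name);
    if (n == 0) return 0;
    for (const char *p = strstr(list, name); p != NULL; p = strstr(p + 1, name)) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[n] == ' ') || (p[n] == '\0');
        if (starts && ends) return 1;
    }
    return 0;
}

/* Hypothetical policy: take the built-in float atomic_fetch_add path only if
   the device compiles OpenCL C 3.0+ AND advertises cl_ext_float_atomics. */
int can_use_builtin_float_atomics(const char *c_version, const char *extensions) {
    int major = 0, minor = 0;
    if (!parse_opencl_c_version(c_version, &major, &minor)) return 0;
    return major >= 3 && has_extension(extensions, "cl_ext_float_atomics");
}
```

Note that a device version string such as `OpenCL 3.0 CUDA 11.6.134` does not match the `OpenCL C` pattern, which is exactly the platform/device vs. language-version distinction discussed in the comments.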

In theory, you could switch in code between the usual atomics_add_f workaround and the inline PTX version, based on whether the device vendor is reported as "Nvidia" or whether some common nv_... extensions are available.
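The answer doesn't show its `atomics_add_f` code. A common way to write such a workaround (an assumption here) is a compare-and-swap loop over the float's bit pattern; in OpenCL C that is usually `atomic_cmpxchg` on a `volatile __global uint*` together with `as_uint`/`as_float`. The sketch below is a host-side C11 analogue using `<stdatomic.h>`, so the loop logic can be compiled and checked without an OpenCL device; `atomic_add_float_cas` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's IEEE-754 bits as uint32_t and back (like as_uint / as_float). */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Atomically performs *target += value, where *target holds a float's bits.
   Returns the value stored in *target just before this thread's update. */
float atomic_add_float_cas(_Atomic uint32_t *target, float value) {
    uint32_t expected = atomic_load(target);
    uint32_t desired;
    do {
        desired = float_bits(bits_float(expected) + value);
        /* On failure, `expected` is refreshed with the current contents and we retry. */
    } while (!atomic_compare_exchange_weak(target, &expected, desired));
    return bits_float(expected);
}
```

Unlike the exchange-based hack in the question, each attempt here is a single compare-and-swap and the stored value is never temporarily zeroed; whether that is faster on a particular GPU is something to measure rather than assume.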

However, this is still not the elegant, universally compatible solution that `cl_ext_float_atomics` promises. It's a highly desired feature, and I hope vendors implement it soon.

ProjectPhysX
  • Answer accepted. I want to add that while the `atomic_xchg` approach works, it is very inefficient (at least on NVIDIA GPUs). When I profiled my similarly implemented [OpenCL](https://github.com/fangq/mcxcl) and [CUDA](https://github.com/fangq/mcx) codes, the atomic write portion took nearly 50% of the run time in the OpenCL implementation, compared to only 10% in CUDA (making OpenCL about 2x slower). So supporting CUDA-like float `atomic_add` via `cl_ext_float_atomics` is not just desirable for ease of implementation; it would also have a big impact on speed. – FangQ Sep 29 '22 at 13:06
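For reference, here is the question's `atomic_xchg` fallback transcribed to host-side C11 (same bit-reinterpretation trick; function and helper names are hypothetical). Reading it shows where some of the cost can come from: every attempt performs two atomic exchanges on the same word, and if another thread deposited a value while the slot held `0.0f`, the loop picks that value up and runs again. This is an explanation of the code's structure, not a measurement.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's IEEE-754 bits as uint32_t and back. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* C11 transcription of:
   while ((old = atomic_xchg(address, (orig = atomic_xchg(address, 0.0f)) + old)) != 0.0f); */
float atomic_add_float_xchg(_Atomic uint32_t *address, float value) {
    float old = value, orig;
    do {
        /* Take the current sum out, leaving 0.0f behind. */
        orig = bits_float(atomic_exchange(address, float_bits(0.0f)));
        /* Store (taken-out sum + pending amount). If the slot no longer held 0.0f,
           another thread deposited something meanwhile; it becomes the new pending amount. */
        old = bits_float(atomic_exchange(address, float_bits(orig + old)));
    } while (old != 0.0f);
    return orig;
}
```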