The OpenCL 3.0 specification does not seem to have intrinsics/builtins for atomic addition to floating-point values, only for integral values (and that seems to have been the case in OpenCL 1.x and 2.x as well). CUDA, however, has offered floating-point atomics for a while now:

    float  atomicAdd(float*  address, float  val); // since Fermi
    double atomicAdd(double* address, double val); // since Pascal
    __half atomicAdd(__half* address, __half val); // ?

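Naturally, any straightforward atomic operation can be simulated with a compare-and-exchange loop, and OpenCL does provide compare-and-exchange (`atomic_cmpxchg` in OpenCL 1.x, `atomic_compare_exchange_strong`/`atomic_compare_exchange_weak` since 2.0). But my questions are: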
- Does NVIDIA expose floating-point atomics in OpenCL somehow, e.g. via a vendor extension, through pragmas, or implicitly?
- Is there a mechanism more efficient than simulation with compare-and-exchange that I could use as a substitute for floating-point atomics, either on NVIDIA GPUs specifically or in general?
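
For reference, the compare-and-exchange simulation I have in mind looks roughly like the sketch below. It is written as host-side C11 rather than OpenCL C so that it compiles with an ordinary C compiler; the names `float_to_bits`, `bits_to_float` and `atomic_add_float_cas` are made up for illustration and are not part of any API. In a kernel, I assume the same loop would be built on `atomic_cmpxchg` applied to the `uint` bit pattern of the float.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit unsigned integer, and back.
   memcpy is used to avoid strict-aliasing problems. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical helper: atomically adds val to the float whose bit pattern
   is stored in *cell, and returns the value held before the addition. */
static float atomic_add_float_cas(_Atomic uint32_t *cell, float val)
{
    uint32_t expected = atomic_load(cell);
    uint32_t desired;
    do {
        /* Compute the new value from the last observed contents. */
        desired = float_to_bits(bits_to_float(expected) + val);
        /* If another thread changed *cell in the meantime, the exchange
           fails, expected is refreshed with the current contents, and
           the sum is recomputed. */
    } while (!atomic_compare_exchange_weak(cell, &expected, desired));
    return bits_to_float(expected);
}
```

Under contention every failed exchange costs another read, add and retry, which is why I am hoping for something closer to a single hardware instruction.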