1

When I write CUDA code, I use atomic operations to force a global synchronization at the last step.

I now have to implement the same task in OpenCL. Is there an operation in OpenCL similar to CUDA's atomic operations that I can use? My device is an FPGA board.

talonmies
mojer
  • See [here](https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/atomicFunctions.html) – Robert Crovella Aug 28 '16 at 23:24
  • Hi Robert, thanks a lot. But is there any way to implement an atomic operation on the float datatype? The functions in your link only support int. – mojer Aug 29 '16 at 01:52
  • Why would you need a `float` value if your goal is to implement synchronization? You can implement global synchronization using `int` values. – Robert Crovella Aug 29 '16 at 01:59
  • My goal is to implement synchronized writes to a global memory location that contains float values, just like what atomicAdd() does in CUDA (serialized access to the memory location by different work-items). – mojer Aug 29 '16 at 04:50

3 Answers

3

barrier() may be similar to what you are looking for, but it can only force a "join" on threads in the same workgroup.

See this post. You may be able to use CLK_GLOBAL_MEM_FENCE to get the results you are looking for.

Stack Overflow: Barriers in OpenCL

Community
1

There is no kernel-level global synchronization in OpenCL or CUDA, since entire workgroups may finish before others are even started. Only workgroup-level synchronization is available inside a kernel. For global synchronization you must use multiple kernels.

Dithermaster
0

According to your comment, it seems like you want atomic operations on float values.

Please check out this link: atomic operation and floats in opencl

The idea is to use the built-in atom_cmpxchg operation to try to swap the old value of a floating-point variable with a new value, which could be the result of adding another value to it, or of a multiplication, division, subtraction, etc.

The swap only succeeds if the value in memory is still the old value that was read (that's where the cmp comes into play). Otherwise, another thread has changed it in the meantime, and the while loop re-reads the value and tries again.

Note that this atomic operation could be quite slow if many threads are performing it on a single value, because every failed swap forces a retry.
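
The retry loop described above can be sketched in portable C11, which is easy to compile and test on a host machine. This is only an illustration of the compare-and-swap pattern, not the code from the linked post: the helper names `float_bits`, `bits_float` and `atomic_add_float` are made up for this example. In an OpenCL kernel, the corresponding built-in would be `atom_cmpxchg` (or `atomic_cmpxchg` from OpenCL 1.1 onwards) on a `volatile __global unsigned int *`, with `as_uint()` / `as_float()` doing the bit reinterpretation.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer and back.
 * (Hypothetical helpers; OpenCL C offers as_uint()/as_float() for this.) */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical helper: atomically add `operand` to the float whose raw bits
 * are stored in *target. Returns the value that was stored before the add. */
static float atomic_add_float(_Atomic uint32_t *target, float operand)
{
    uint32_t expected = atomic_load(target);  /* the "old value" we read */
    uint32_t desired;
    do {
        /* Compute the new value from the old one. Any other operation
         * (multiply, subtract, ...) could be substituted here. */
        desired = float_bits(bits_float(expected) + operand);
        /* The swap succeeds only if *target still holds `expected`.
         * On failure, `expected` is refreshed with the current contents,
         * so the next iteration recomputes the sum from the new value. */
    } while (!atomic_compare_exchange_weak(target, &expected, desired));
    return bits_float(expected);
}
```

A caller would initialize the cell with `atomic_init(&cell, float_bits(0.0f))` and then call `atomic_add_float(&cell, x)` from each thread. The loop is lock-free, but under heavy contention most attempts fail and retry, which is the slowdown mentioned above.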

lnyng