
I have a large array of floats, possibly millions of cells, and an algorithm that operates on this data until it reaches a state where no more work can be done. If any one of these floats is greater than zero, a boolean should be set to true and passed to the host, which means that the kernel should be scheduled for execution again. There is one work item per cell doing the calculations. So far I have considered a two-stage |= reduction over the whole array, which seems to be the proper way to do things. Another, really slow, way would be to use atomic operations.

As I only want to set a specific value if a work item does some work and leave it alone otherwise, can I pass a global boolean which can be modified by every work item in every work-group without the use of atomics and still achieve the intended effect? Suppose this boolean is initialized to false and can only ever be set to true by the work items: can I ever get a wrong result? Is this a bad idea, and if so, why?

Steinin

1 Answer


Interesting question.

As I only want to set a specific value if a work item does some work and leave it alone otherwise, can I pass a global boolean which can be modified by every work item in every work-group without the use of atomics and still achieve the intended effect?

I think this proposal will work, and it's probably the most efficient solution. Two notes, though:

  • Remember that your kernel should contain something like if (condition) shouldContinue = 1 and not shouldContinue = condition. Even though the latter performs better, you must never store 0 to that memory, because you have no control over the order in which the work items write to it.

  • Because you are deliberately overwriting the memory, I wouldn't go with bool: I want to make sure the system won't have to load an entire word before the store. In fact, I'd go with a type large enough to let the compiler use a non-temporal store here, assuming the hardware and the compiler support that. For example, use a type that takes up a whole cache line, such as int16 (OpenCL's 16-component int vector, 64 bytes), and set it to some nonzero value.
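The flag protocol from the first note can be sketched in plain C. This is only a sequential model, not OpenCL host code: work_item stands in for the kernel body, the for loop stands in for a kernel launch, and update_cell is a hypothetical placeholder for the real per-cell algorithm. It does not demonstrate concurrent behavior; it only shows that the host resets the flag before each launch and that work items only ever store 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cell update standing in for the real algorithm:
 * lowers a positive cell by 1.0f, clamping at zero.
 * Returns 1 if the cell did some work, 0 otherwise. */
static int update_cell(float *cell)
{
    if (*cell > 0.0f) {
        *cell -= 1.0f;
        if (*cell < 0.0f)
            *cell = 0.0f;
        return 1;
    }
    return 0;
}

/* Models the body of one work item. The flag is written only when work
 * was done, and the only value ever stored is 1, so the order in which
 * work items write cannot turn a 1 back into a 0. */
static void work_item(float *cells, size_t gid, int *should_continue)
{
    if (update_cell(&cells[gid]))
        *should_continue = 1;
}

/* Models the host loop: clear the flag, "launch" one work item per cell,
 * and launch again while any work item reported work.
 * Returns the number of launches performed. */
int run_until_done(float *cells, size_t n)
{
    int launches = 0;
    int should_continue;

    assert(cells != NULL);
    do {
        should_continue = 0;              /* only the host ever stores 0 */
        for (size_t gid = 0; gid < n; ++gid)
            work_item(cells, gid, &should_continue);
        ++launches;
    } while (should_continue);
    return launches;
}
```

Calling run_until_done on an array holding 0.0f, 2.5f and 1.0f performs four launches under these assumptions: three that still find a positive cell, and a final one in which no work item touches the flag.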

Oak