I previously ran into the problem that I wanted to blend color values in an image unit by doing something like:

vec4 texelCol = imageLoad(myImage, myTexel);
imageStore(myImage, myTexel, texelCol+newCol);

In a scenario where multiple fragments can have the same value for 'myTexel', this apparently isn't possible, because there is no atomicity between the imageLoad and imageStore commands, and other shader invocations could change the texel color in between.

Now someone told me that people are working around this problem by creating semaphores using the atomic commands on uint textures: the shader would wait in a while loop before accessing the texel and, as soon as it is free, atomically write into the integer texture to block other fragment shader invocations, process the color texel, and when finished atomically free the integer texel again.

But I can't get my head around how this could really work, or what such code would look like.

Is it really possible to do this? Can a GLSL fragment shader be set to wait in a while loop? If it's possible, can someone give an example?

Nicol Bolas
Mat
  • From [the extension specification](http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt) it looks like you'll need to arrange suitable memory barriers, either with `MemoryBarrierEXT()` or `memoryBarrier()` in the shaders themselves. – Flexo May 18 '12 at 12:48
  • @awoodland: Memory barriers cannot allow other shaders running at the same stage to read the memory. – Nicol Bolas May 18 '12 at 13:02
  • See my answer to http://stackoverflow.com/a/16802075/1388799 for the currently-working solution to the GLSL semaphore problem on Nvidia Kepler cards. – Walt Donovan May 28 '13 at 22:00

1 Answer

Basically, you're just implementing a spinlock. Only instead of one lock variable, you have an entire texture's worth of locks.
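
For illustration only, here is a minimal sketch of what such a per-texel spinlock might look like in a GLSL 4.30 fragment shader. The names `lockImage` and `newCol`, the binding points, the image formats, and the one-texel-per-pixel mapping are all assumptions, not taken from the question. It relies on the built-in functions `imageAtomicCompSwap`, `imageAtomicExchange`, and `memoryBarrier`. OpenGL does not guarantee that this loop terminates, for the reasons explained in the rest of this answer.

```glsl
#version 430 core

// Hypothetical bindings and names; none of these come from the original post.
layout(binding = 0, r32ui)   coherent uniform uimage2D lockImage; // 0 = free, 1 = held
layout(binding = 1, rgba32f) coherent uniform image2D  myImage;   // image being blended into

uniform vec4 newCol; // color to add (assumed to arrive as a uniform here)

void main()
{
    // Assumes one lock texel and one color texel per screen pixel.
    ivec2 myTexel = ivec2(gl_FragCoord.xy);
    bool done = false;

    // Retry until this invocation has held the lock exactly once.
    while (!done)
    {
        // Atomically replace 0 with 1; the return value is the previous content,
        // so a result of 0 means this invocation acquired the lock.
        if (imageAtomicCompSwap(lockImage, myTexel, 0u, 1u) == 0u)
        {
            // Critical section: the read-modify-write from the question.
            vec4 texelCol = imageLoad(myImage, myTexel);
            imageStore(myImage, myTexel, texelCol + newCol);

            // Order the store before the unlock as seen by other invocations.
            memoryBarrier();

            // Release the lock.
            imageAtomicExchange(lockImage, myTexel, 0u);
            done = true;
        }
    }
}
```

The critical section and the release sit inside the same branch as the acquire, rather than after a bare `while` that spins on the lock, so the invocation holding the lock does not have to wait for other invocations to leave the loop before it can unlock. Whether that is enough on a given GPU and driver is something you would have to test.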

Logically, what you're doing makes sense. But as far as OpenGL is concerned, this won't actually work.

See, the OpenGL shader execution model states that invocations execute in an order which is largely undefined relative to one another. But spinlocks only work if there is a guarantee of forward progress among the various threads. Basically, spinlocks require that the thread which is spinning not be able to starve the execution system from executing the thread that it is waiting on.

OpenGL provides no such guarantee. Which means that it is entirely possible for one thread to lock a pixel, then stop executing (for whatever reason), while another thread comes along and blocks on that pixel. The blocked thread never stops executing, and the thread that owns the lock never restarts execution.

How might this happen in a real system? Well, let's say you have a fragment shader invocation group executing on some fragments from a triangle. They all lock their pixels. But then they diverge in execution due to a conditional branch within the locking region. Divergence of execution can mean that some of those invocations get transferred to a different execution unit. If there are none available at the moment, then they effectively pause until one becomes available.

Now, let's say that some other fragment shader invocation group comes along and gets assigned an execution unit before the divergent group. If that group tries to spinlock on pixels from the divergent group, it is essentially starving the divergent group of execution time, waiting on an event that will never happen.

Now obviously, in real GPUs there is more than one execution unit, but you can imagine that with lots of invocation groups out there, it is entirely possible for such a scenario to occasionally jam up the works.

Nicol Bolas
  • I don't understand why the integer texture should be the size of the screen. That would mean one lock value per fragment. But shouldn't there be one lock value per texel of the image to be written? – Mat May 18 '12 at 14:09
  • @Mat: Are there more texels of the image than pixels of the screen? It would help to know these things in advance. – Nicol Bolas May 18 '12 at 14:11
  • Sometimes more, usually less; it depends on the case. So the fragment shader will stay in the while loop until the condition is actually met? Doesn't GLSL somehow guarantee that shaders return, and therefore kill a shader invocation if it spins too long? – Mat May 18 '12 at 14:14
  • @Mat: GLSL does not do anything of the kind. However, the driver might. This can't cause an infinite loop (so Windows won't kill your application), but the driver or hardware may automatically terminate a shader that goes on too long. Or it may not. – Nicol Bolas May 18 '12 at 14:23
  • So, if the driver is allowed to terminate the shader if it loops too long, the result is actually non-deterministic? – Mat May 18 '12 at 14:25
  • @Mat: "is allowed"? What does that mean in this context? GLSL doesn't "allow" it, but that doesn't change the fact that the actual driver or hardware could do *anything*. The result of your shader is as deterministic as your driver or hardware allows. Just as the result of a program you execute is only deterministic if your OS doesn't decide to terminate it if it does unruly things. – Nicol Bolas May 18 '12 at 15:04
  • Then, is this an unruly thing to do? I just need a good argument for why this is actually a bad thing to do :) – Mat May 19 '12 at 14:16
  • I never said it was a bad thing. Ultimately, the only thing you can do is try it and see what happens. – Nicol Bolas May 19 '12 at 14:24
  • It is an old topic, but don't you risk a deadlock with this approach? – Antoine Morrier Aug 11 '17 at 06:35