I'm trying to implement a GLSL spinlock so that I can do single-pass depth peeling. I'm having trouble because examples of using a locking texture are scarce. I have to admit that I don't really know what I'm doing here, so I'm probably giving more context than necessary, just to be safe.

I wrote a fragment program which should do effectively nothing:

#version 420 core

//The lock texture holds either 0 or 1.
//0 means that the texture is available.
//1 means that the texture is locked.  
layout(r32ui) coherent uniform uimage2D img2D_0; //locking texture

layout(RGBA32F) coherent uniform image2D img2D_1; //data texture (currently unused)

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    //The loop's exchange function swaps out the old value with 1.

    //If the locking texture was 0, 0 will be returned, terminating the loop;
    //the locking texture will now contain 1, indicating that the locking
    //texture is now locked.

    //Conversely, if the locking texture contains 1, then the exchange function
    //will write a 1 (so the texture is still locked), and return 1, indicating
    //that the texture is locked and unavailable.
    while (imageAtomicExchange(img2D_0,coord,1u)==1u);

    //The locking texture is locked.  More code would go here

    //This unlocks the texture.
    imageAtomicExchange(img2D_0,coord,0u);
}

The locking texture is created like so:

//data is an array initialized to all 0.
glTexImage2D(GL_TEXTURE_2D,0,GL_R32UI,size_x,size_y,0,GL_RED_INTEGER,GL_UNSIGNED_INT,data);

To execute the algorithm, I take an FBO with an RGBA32F color attachment and enable it. I bind the above shader, then bind the locking texture to img2D_0 and the color attachment to img2D_1, using this code:

glBindImageTextureEXT(
    /* 0, 1, respectively */,
    texture_id, 0,GL_FALSE,0, GL_READ_WRITE,
    /* GL_R32UI, GL_RGBA32F, respectively */
);

The object is then rendered with a VBO, and some secondary passes show the contents of the data.

The problem is that this fragment program crashes the video driver (because it never terminates). My question is: why? The texture is initialized to 0, and I'm pretty sure my logic for the exchange functions is valid. Is my setup and methodology basically correct?

geometrian
  • Convince yourself that it's been initialized correctly by sampling the texture, storing the values to another texture, and dumping them on the host. – Brian Cain Aug 04 '12 at 15:20
  • That's a good test to do. Removing the broken code and storing vec4(0.25,0.0,0.0,0.0) if the lock is 0, vec4(0.0,0.50,0.0,0.0) if the lock is 1, vec4(0.0,0.0,0.75,0.0) otherwise, and dumping the results to the data texture shows only (0.25,0.0,0.0,0.0)--i.e., that (at the beginning at least) all lock values are 0. – geometrian Aug 04 '12 at 15:30
  • GPU ShaderAnalyzer gives: http://pastebin.com/GwhvR78Q – geometrian Aug 04 '12 at 16:39
  • See http://stackoverflow.com/a/16802075/1388799 for the solution to this problem. – Walt Donovan May 30 '13 at 00:17

1 Answer

One issue is that if two threads in the same warp contend for the same lock location, the warp will deadlock. One thread acquires the lock and the other keeps looping; the warp then continues executing the looping thread, which prevents the thread that holds the lock from ever making progress and releasing it.

Edit: Based on your revised pastebin, I would suggest something like:

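bool done = false;
while (!done) {
    // Try to take the lock; a return value of 0 means this thread acquired it.
    if ((done = (imageAtomicExchange(img2D_0,coord,1u)==0u))) {
        // guarded operations
        // ...
        // Release the lock. imageStore on a uimage2D takes a uvec4.
        imageStore(img2D_0, coord, uvec4(0u));
    }
}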

This avoids the warp-level deadlock because the threads that drop out of the loop are exactly those that have already completed their guarded modification and released the lock. Even if only one thread can acquire its lock, that thread will make progress.
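To make the argument concrete, here is a toy CPU model in C of the two loop shapes. It is only a sketch under stated assumptions, not a description of how any particular GPU schedules threads: it assumes that lanes in a warp run in lockstep, that lanes which have left a loop are masked off until every lane has left it, and that atomics within one step are serialized in lane order. The names `run_naive`, `run_restructured`, `lock_word`, `guarded_counter`, `MAX_LANES` and `MAX_STEPS` are made up for this illustration.

```c
#include <stdbool.h>

#define MAX_LANES 32    /* hypothetical upper bound on warp width */
#define MAX_STEPS 1000  /* report a hang after this many loop passes */

static unsigned lock_word;   /* stands in for one texel of the lock image */
static int guarded_counter;  /* stands in for the data guarded by the lock */

/* Model of imageAtomicExchange: store the new value, return the old one. */
static unsigned exchange(unsigned *p, unsigned v) {
    unsigned old = *p;
    *p = v;
    return old;
}

/* Shape from the question: spin until acquired, then do the guarded work
   and unlock AFTER the loop. Returns the number of loop passes needed,
   or -1 if the warp is still spinning after MAX_STEPS passes. */
int run_naive(int lanes) {
    bool spinning[MAX_LANES];
    if (lanes > MAX_LANES) lanes = MAX_LANES;
    lock_word = 0u;
    guarded_counter = 0;
    for (int i = 0; i < lanes; i++) spinning[i] = true;

    for (int step = 1; step <= MAX_STEPS; step++) {
        bool any_spinning = false;
        for (int i = 0; i < lanes; i++) {
            if (!spinning[i]) continue;          /* masked off: waits at loop exit */
            if (exchange(&lock_word, 1u) == 0u) spinning[i] = false;
            else any_spinning = true;
        }
        if (!any_spinning) {
            /* Assumed reconvergence point: only now does post-loop code run. */
            for (int i = 0; i < lanes; i++) {
                guarded_counter++;               /* guarded operation */
                lock_word = 0u;                  /* unlock */
            }
            return step;
        }
    }
    return -1;
}

/* Shape from the answer: the guarded work and the unlock happen inside the
   same loop pass in which the lock was acquired. */
int run_restructured(int lanes) {
    bool done[MAX_LANES];
    bool won[MAX_LANES];
    if (lanes > MAX_LANES) lanes = MAX_LANES;
    lock_word = 0u;
    guarded_counter = 0;
    for (int i = 0; i < lanes; i++) done[i] = false;

    for (int step = 1; step <= MAX_STEPS; step++) {
        /* Phase 1: every unfinished lane executes the exchange in lockstep. */
        for (int i = 0; i < lanes; i++)
            won[i] = !done[i] && exchange(&lock_word, 1u) == 0u;
        /* Phase 2: the lane that won runs the if-body and releases the lock. */
        bool all_done = true;
        for (int i = 0; i < lanes; i++) {
            if (won[i]) {
                guarded_counter++;
                lock_word = 0u;
                done[i] = true;
            }
            if (!done[i]) all_done = false;
        }
        if (all_done) return step;
    }
    return -1;
}
```

In this model, `run_naive(1)` finishes in one pass, while `run_naive(2)` returns -1: the second lane spins forever because the first lane never reaches the unlock. `run_restructured(4)` returns 4, since one lane completes its guarded work per pass.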

Chris Dodd
  • How would that be possible? All of the threads within a warp should come from the same primitive, yes? – Nicol Bolas Aug 04 '12 at 22:31
  • Drawing only one random triangle, the problem does *not* occur. Drawing two random triangles, the problem eventually does--I suspect when the triangles overlap. – geometrian Aug 05 '12 at 01:35
  • I rewrote the control flow to take this into account, which makes the problem go away. If no one has a better explanation, I'll accept this answer. The algorithm now mostly works: http://pastebin.com/WL2SMyrJ. I added memoryBarrier calls, which minimize artifacts, but a few artifacts remain. I put them after writes, although I seem to recall reading somewhere that they couldn't be in control flow, which I assume was meant to include loops--could that be the issue? – geometrian Aug 05 '12 at 01:51
  • @NicolBolas: I'm pretty sure adjacent primitives from the same draw command can end up in the same warp. Particularly when you have small triangles (only a couple of pixels), not combining the fragments into a warp would be quite inefficient. – Chris Dodd Aug 05 '12 at 16:52