
I read a terrifying post recently where someone claimed that a switch statement in GLSL uses no conditional branching, and in fact causes every possible outcome to be run every time the switch is entered. This is worrying for me because I'm currently working on a deferred rendering engine which uses a few nested switch statements.

Does anyone know if there's any truth to this, and can it be backed up with any evidence?

Thanks!

Das Louis

1 Answer


I read a terrifying post recently where someone claimed that a switch statement in GLSL uses no conditional branching, and in fact causes every possible outcome to be run every time the switch is entered.

Neither of these is necessarily true. At least, not on today's hardware.

What happens is very dependent on the compiler and the underlying hardware architecture. So there is no one answer. But it is also very dependent on one other thing: what the condition actually is.

See, the reason why a compiler would execute both sides of a condition has to do with how GPUs work. GPUs gain their performance by grouping threads together and executing them in lock-step, with each thread in a group executing the exact same sequence of instructions. If threads within a group take different sides of a conditional branch, this is impossible. So to do a true branch, you have to break up a group depending on which individual threads execute which branch.

So instead, if the two branches are fairly short, it'll execute them both and discard the particular values from the not-taken branch. The particular discarding of values doesn't require breaking thread groups, due to specialized opcodes and such.
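As a rough illustration of that idea, here is a minimal sketch of a short branch as written next to a hand-flattened equivalent. The names `vUV`, `shadeBranch` and `shadeFlattened` are hypothetical, and whether a given compiler actually performs this transformation is up to that compiler; the second function only shows the "compute both, keep one" shape in source form.

```glsl
#version 330 core

in vec2 vUV;          // hypothetical per-fragment input
out vec4 fragColor;

// The branch as you would write it.
vec4 shadeBranch(bool lit) {
    if (lit) {
        return vec4(1.0, 0.9, 0.8, 1.0);
    } else {
        return vec4(0.1, 0.1, 0.2, 1.0);
    }
}

// Roughly what a compiler may turn a short branch into:
// evaluate both results, then select one per thread.
vec4 shadeFlattened(bool lit) {
    vec4 whenLit   = vec4(1.0, 0.9, 0.8, 1.0);
    vec4 whenUnlit = vec4(0.1, 0.1, 0.2, 1.0);
    return lit ? whenLit : whenUnlit;
}

void main() {
    // This condition can differ between neighbouring fragments,
    // so it is not dynamically uniform.
    bool lit = vUV.x > 0.5;
    fragColor = shadeFlattened(lit);
}
```

Both functions return the same value for any input; the difference is only in how much work may be done to get there.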

However, if the condition is based on an expression which is dynamically uniform (i.e., an expression that evaluates to the same value for every invocation within a draw call), then there is a good chance the compiler will not execute both sides, and will emit a true branch instead.

The reason being that, because the condition is dynamically uniform, all threads in the group will execute the same code. So there is no need to break a group of threads up to do a proper condition.

So if you have a switch statement that is based on a uniform variable, or expressions only involving uniform and compile-time constant variables, then there is no reason to expect it to execute multiple branches simultaneously.
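To make the distinction concrete, here is a minimal sketch contrasting the two kinds of switch in one fragment shader. The names `uLightModel`, `uMaterialId` and `vUV` are assumptions made up for the example, and the comments describe what a compiler is likely to do, not what it is guaranteed to do.

```glsl
#version 330 core

uniform int uLightModel;         // hypothetical: same value for every invocation in a draw call
uniform isampler2D uMaterialId;  // hypothetical: per-pixel IDs written by an earlier pass

in vec2 vUV;                     // hypothetical per-fragment input
out vec4 fragColor;

void main() {
    vec4 color;

    // Dynamically uniform condition: every invocation takes the same case,
    // so a thread group never needs to be split here.
    switch (uLightModel) {
        case 0:  color = vec4(1.0, 0.0, 0.0, 1.0); break;
        case 1:  color = vec4(0.0, 1.0, 0.0, 1.0); break;
        default: color = vec4(0.0, 0.0, 1.0, 1.0); break;
    }

    // Not dynamically uniform: the value is fetched per fragment, so
    // neighbouring invocations may want different cases. The compiler may
    // flatten this, or the hardware may have to handle divergence.
    int id = texture(uMaterialId, vUV).r;
    switch (id) {
        case 0:  color *= 0.5; break;
        case 1:  color *= 2.0; break;
        default: break;
    }

    fragColor = color;
}
```

The second switch is the same shape as switching on values read from a buffer written in a previous pass, which is the case described in the question.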

It should also be noted that even if the expression is not dynamically uniform, the compiler will not always execute both branches. If the branches are too long or too different from one another, it can choose to break up thread groups. This can lower performance, but potentially not as much as executing both branches. It's really up to the compiler to work out how to do it.

Nicol Bolas
  • The way I understand it, modern hardware goes beyond avoiding the execution of both branches for dynamically uniform conditions. As long as all threads in a thread group (or warp, or wavefront, or whatever term vendors use) evaluate the condition to the same value, only one branch is executed. It will only start executing both branches if the condition diverges within a warp/wavefront. This happens dynamically, so it doesn't have to know ahead of time if the condition will diverge or not. – Reto Koradi Dec 10 '15 at 18:18
  • @RetoKoradi: Is that a function of the hardware or a function of the compiler in tandem with the hardware? That is, if the compiler can statically determine that a particular condition will never diverge within a wavefront, will the compiler be able to tell the hardware not to bother thinking about diverging? If so, then it still pays to use non-dynamically uniform branches only when absolutely necessary. – Nicol Bolas Dec 10 '15 at 18:27
  • Must be a hardware feature. I believe there has been a lot of progress in handling branches efficiently. Aside from more complex use cases for graphics, I imagine that compute, where you often have more complex control flow, played a major role in that. It's plausible that execution could be optimized if you know ahead of time what branch will be used. You could even build a shader where one branch is eliminated. But I don't know if any of that is beneficial, and if it is used. – Reto Koradi Dec 11 '15 at 04:48
  • @RetoKoradi Thanks, both, this is very helpful, and very interesting. In my case, I was using a nested switch statement in a fragment shader for a fullscreen quad, switching on values from a buffer written in previous passes. I found that on newer NVIDIA cards (on which I was developing) it was super fast, but on ATI/AMD cards (on which I had to test) it was painfully slow: 18 seconds to draw about 50 images of 24x24 pixels! I modified the code to use subroutines instead and this seems to have greatly improved things. Cheers! – Das Louis Dec 11 '15 at 15:18
  • @DasLouis: Which AMD hardware? Was it GCN-based? Also, if you could switch to subroutines, that means you could have used switch statements based on uniform expressions all along. So what happens if you do that rather than using values fetched from buffers? – Nicol Bolas Dec 11 '15 at 15:20