
For quite some time, I've been avoiding branching in my shader code. Instead of writing

```glsl
float invert_value(in float value)
{
    if (value == 0.0)
        return 0.0;
    else
        return 1.0 / value;
}
```

I write 'clever' code like this:

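```glsl
float invert_value_ifless(in float value)
{
    float sign_value = sign(value);
    float sign_value_squared = sign_value * sign_value;  // 0.0 if value == 0.0, else 1.0
    // Group (sign_value_squared - 1.0) first: it is exactly 0.0 or -1.0, so the
    // denominator is exactly value (or -1.0 when value == 0.0) and no precision is lost.
    return sign_value_squared / (value + (sign_value_squared - 1.0));
}
```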

This returns the same result as the first function (for `value == 0.0` it gives `-0.0`, which compares equal to `0.0`) and has no branches, so it should be faster.

Or is it? Am I fighting with ghosts here?

How can I profile shaders for speed? I'm most interested in recent mobile platforms (Android), but any advice on graphics profiling in general would be welcome!
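A caveat on the branchless version: `value + sign_value_squared - 1.0`, written without inner parentheses, groups as `(value + sign_value_squared) - 1.0`. For a nonzero input that computes `value + 1.0` first, which rounds away the low bits of `value`; in single precision an input around `1e-10` is lost entirely, so the denominator becomes `0.0` and the result is infinity rather than `1e10`. Grouping it as `value + (sign_value_squared - 1.0)` avoids this, because `sign_value_squared - 1.0` is exactly `0.0` or `-1.0`. The sketch below is a C port (so it can be checked on a CPU) that puts both groupings next to the branching version. `glsl_sign` is a hypothetical helper mimicking GLSL's `sign()`, and a GPU compiler is not guaranteed to evaluate these expressions exactly as a C compiler does.

```c
/* C port of the two GLSL functions from the question, for checking on a CPU. */

/* Hypothetical helper mimicking GLSL sign(): returns -1.0, 0.0 or 1.0. */
float glsl_sign(float x)
{
    if (x > 0.0f)
        return 1.0f;
    if (x < 0.0f)
        return -1.0f;
    return 0.0f;
}

/* Branching version from the question. */
float invert_value(float value)
{
    if (value == 0.0f)
        return 0.0f;
    return 1.0f / value;
}

/* Branchless version with the original grouping: (value + s2) - 1.0 first
   rounds value to the precision available around value + 1.0. */
float invert_value_ifless_orig(float value)
{
    float sign_value = glsl_sign(value);
    float sign_value_squared = sign_value * sign_value;
    return sign_value_squared / (value + sign_value_squared - 1.0f);
}

/* Branchless version with explicit grouping: (s2 - 1.0) is exactly 0.0
   (value != 0) or -1.0 (value == 0), so the denominator is exact.
   For value == 0 this returns 0.0 / -1.0 == -0.0, which compares equal to 0.0. */
float invert_value_ifless(float value)
{
    float sign_value = glsl_sign(value);
    float sign_value_squared = sign_value * sign_value;
    return sign_value_squared / (value + (sign_value_squared - 1.0f));
}
```

For example, `invert_value_ifless(0.1f)` and `invert_value(0.1f)` compare equal, while `invert_value_ifless_orig(0.1f)` differs in the last bits when evaluated strictly in single precision, as a GPU does.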

Leszek
  • "*Am I fighting with ghosts here?*" [Yes, you are.](http://stackoverflow.com/q/37827216/734069) – Nicol Bolas Jun 28 '16 at 14:36
  • Excellent answer there, Nicol. – Leszek Jun 28 '16 at 16:11
  • Is this fighting with ghosts as well? Rather than `if (a > 0) b = (1 - a) / (2 - a); else b = (1 + a) / (2 + a);`, write `signA = sign(a); b = (1 - signA * a) / (2 - signA * a);`? – Leszek Jun 28 '16 at 16:14
  • Unless you have genuine, accurate profiling data *in your hands*, then those are just premature optimizations that make the code harder to read and understand. – Nicol Bolas Jun 28 '16 at 17:04
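Checking the sign-based rewrite from the comments: for `a > 0` it reduces to `(1 - a) / (2 - a)`, for `a < 0` to `(1 + a) / (2 + a)`, and for `a == 0` both forms give `0.5`, so it is equivalent. Because `sign(a) * a` is simply `abs(a)`, the same expression can also be written with GLSL's built-in `abs()`, which arguably reads better than either version. The sketch below checks this in C, since GLSL can't run on the CPU; `glsl_sign` and the `blend_*` names are hypothetical stand-ins, and `fabsf` plays the role of GLSL's `abs()`. Whether any of the three is faster on a given GPU is still a question for the profiler.

```c
#include <math.h> /* fabsf */

/* Hypothetical helper mimicking GLSL sign(): returns -1.0, 0.0 or 1.0. */
float glsl_sign(float x)
{
    if (x > 0.0f)
        return 1.0f;
    if (x < 0.0f)
        return -1.0f;
    return 0.0f;
}

/* Branching form from the comment. */
float blend_branch(float a)
{
    if (a > 0.0f)
        return (1.0f - a) / (2.0f - a);
    return (1.0f + a) / (2.0f + a);
}

/* sign() form from the comment: sign(a) * a equals |a| for every a. */
float blend_sign(float a)
{
    float sign_a = glsl_sign(a);
    return (1.0f - sign_a * a) / (2.0f - sign_a * a);
}

/* Same expression written with abs(); in GLSL this would be
   float m = abs(a); b = (1.0 - m) / (2.0 - m); */
float blend_abs(float a)
{
    float m = fabsf(a);
    return (1.0f - m) / (2.0f - m);
}
```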

2 Answers


It often still is, for the reason you probably originally believed: a GPU is often implemented as a very wide SIMD processor, so performing the same operations for every pixel allows many pixels to be processed at once, whereas picking different operations per pixel makes that much harder. That's why functions like `step` survive in GLSL. A good GLSL compiler can usually eliminate compile-time conditionals, and may be able to turn your branching code into non-branching code by other means, but GLSL compilers generally aren't as good as offline compilers for ordinary languages, because they run inside the driver and have their own compile-time budget to worry about.
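To make the SIMD argument concrete, here is a deliberately simplified cost model in C. It is an assumption for illustration, not how any particular GPU schedules work, and the function name `if_else_cost` and the per-body costs are hypothetical. The lanes of a group share one instruction stream, so under this model an if/else costs only the taken side when every lane agrees, but both sides when the lanes diverge. Paying for both sides is roughly what a branchless rewrite does every time, so under this model the rewrite mainly helps when the condition often differs between adjacent pixels or both sides are cheap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy cost model (an assumption, not any real GPU's scheduler): the lanes of
   one SIMD group share a single instruction stream. For an if/else, the group
   issues the "then" body if at least one lane takes it and the "else" body if
   at least one lane takes it; lanes that did not take a body are masked off
   but still wait while it runs. Returns the cost paid by the whole group. */
int if_else_cost(const bool *cond, size_t lanes, int then_cost, int else_cost)
{
    bool any_then = false;
    bool any_else = false;

    for (size_t i = 0; i < lanes; ++i) {
        if (cond[i])
            any_then = true;
        else
            any_else = true;
    }

    return (any_then ? then_cost : 0) + (any_else ? else_cost : 0);
}
```

For example, with four lanes, `then_cost = 10` and `else_cost = 3`, a group where every lane takes the `then` side costs 10 under this model, while a group with a single divergent lane costs 13.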

I'm an iOS developer professionally, so I can talk at length about the wonders of the Xcode frame profiler, and will do so for the sake of a complete answer, but I'm sorry that I can't offer much about Android.

In Xcode there's a frame capture button. Press it and the full OpenGL command stream is captured for a single frame. From there you can inspect all state and buffers as they were before and after each OpenGL command, and the time each call took is reported. Better still, your GLSL code itself is profiled down to the line level: the µs spent on each line of code is reported. And, really putting it over the top, you can rewrite your GLSL code live right there and rerun the captured frame to see what happens to your costs. That also makes it a fast-feedback environment for writing GLSL in general, though that's not really what the tool is for.

Tommy

All the major GPU manufacturers on Android have their own GPU profiling tools that do roughly the same as Xcode's frame capture; ARM, Qualcomm and PowerVR all do.

Things like this have to be measured. Unfortunately, because many Android users don't update their devices for various reasons, the quality of the drivers out there in the wild varies.

Columbo
  • Yes, I am trying to use the Adreno Profiler ATM, but for some reason it 'cannot find adreno profiler enabled app' when I connect it to my phone.... I'll try some more :) – Leszek Jun 28 '16 at 23:11
  • Yes, my experience with the GPU profiling tools has generally been an exercise in frustration. If you're looking for a shortcut, I know PowerVR's shader editor gives cycle-count estimates for several of their GPU architectures without you having to run the shader on a device (I expect the other vendors have similar functionality, but I don't know for sure). That might be enough to get an idea. – Columbo Jun 29 '16 at 05:42