
I have the following fragment shader written in both GLSL and HLSL (shown here in HLSL; the two implementations are almost identical):

sampler2D input : register(s0);
float3 lowerBounds : register(c0);
float3 higherBounds : register(c1);

float4 main(float2 uv : TEXCOORD) : COLOR
{
    float4 color = tex2D(input, uv);

    float y = clamp(0.299 * color.r + 0.587 * color.g + 0.1140 * color.b, 0.0, 1.0);
    float u = clamp(-0.169 * color.r - 0.331 * color.g + 0.5000 * color.b, 0.0, 1.0);
    float v = clamp(0.500 * color.r - 0.419 * color.g - 0.0813 * color.b, 0.0, 1.0);

    if (((y >= lowerBounds.x && y <= higherBounds.x) && (u >= lowerBounds.y && u <= higherBounds.y)) && (v >= lowerBounds.z && v <= higherBounds.z))
    {
        color = 0;
    }

    return color;
}

As you can see, the shader simply checks whether a color falls within the range between two YUV colors, and if it does, the fragment is filtered out (set to transparent black).
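To make the per-pixel test concrete, here is a minimal CPU-side sketch in Python of what the first shader computes. It assumes colors are (r, g, b, a) tuples with channels in [0, 1]; the function names are hypothetical, and this is an emulation for illustration, not shader code.

```python
# CPU-side emulation of the first (branching) shader.
# Colors are (r, g, b, a) tuples with channels in [0, 1].

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def rgb_to_yuv_original(r, g, b):
    # Same coefficients and clamp ranges as the question's first shader.
    y = clamp(0.299 * r + 0.587 * g + 0.1140 * b, 0.0, 1.0)
    u = clamp(-0.169 * r - 0.331 * g + 0.5000 * b, 0.0, 1.0)
    v = clamp(0.500 * r - 0.419 * g - 0.0813 * b, 0.0, 1.0)
    return (y, u, v)

def filter_branching(color, lower, higher):
    yuv = rgb_to_yuv_original(*color[:3])
    # Filter only if every component lies inside [lower, higher].
    if all(lo <= c <= hi for c, lo, hi in zip(yuv, lower, higher)):
        return (0.0, 0.0, 0.0, 0.0)  # equivalent of `color = 0;`
    return tuple(color)

print(rgb_to_yuv_original(0.0, 1.0, 0.0))  # (0.587, 0.0, 0.0)
```

One thing the emulation makes visible: because U and V are clamped to [0, 1], negative chroma collapses to 0, so pure green (0, 1, 0) ends up with U = V = 0, essentially the same chroma as a neutral gray. The U and V values produced by these coefficients actually span roughly [-0.5, 0.5], which matches the clamp range used in the edited shader.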

I understand that conditional statements can be really bad for performance, so I'm wondering whether the above is an example of a "bad" conditional and/or whether it can be optimized to avoid the if statement.

Edit: The final optimized code looks like this:

sampler2D input : register(s0);
float3 lowerBounds : register(c0);
float3 higherBounds : register(c1);

float4 main(float2 uv : TEXCOORD) : COLOR
{
    float4 color = tex2D(input, uv);

    float y = clamp(0.299 * color.r + 0.587 * color.g + 0.1140 * color.b + 0.0627, 0.0, 1.0);
    float u = clamp(-0.169 * color.r - 0.331 * color.g + 0.5000 * color.b, -0.5, 0.5);
    float v = clamp(0.500 * color.r - 0.419 * color.g - 0.0813 * color.b, -0.5, 0.5);

    float3 yuv = { y, u, v };

    // Calculate and apply mask from background range
    float3 mask = step(lowerBounds, yuv) * step(yuv, higherBounds);
    color *= 1.0 - (mask.x * mask.y * mask.z);

    return color;
}
monoceres
  • if statements are not automatically bad... A shader executes in batches, so if only one pixel meets the condition, every other pixel in the batch has to wait. But it's often better to wait for one simple assignment like your "color = 0" than to do complex multiplications or step function calls... – kaiser Jun 18 '19 at 23:15
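The step-based mask in the edit can be checked against an if-based version on the CPU. The Python sketch below is a hypothetical emulation: `step` mimics the HLSL/GLSL built-in (0.0 when x < edge, 1.0 otherwise), the coefficients and clamp ranges are copied from the edited shader, and the bounds are made-up example values. It only checks that the two versions produce the same colors; it says nothing about which one runs faster on a GPU, which is the point the comment above raises.

```python
# CPU-side emulation of the edited shader: step()-based mask vs. an if statement.

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def step(edge, x):
    # Mimics the HLSL/GLSL built-in: 0.0 if x < edge, else 1.0.
    return 0.0 if x < edge else 1.0

def rgb_to_yuv_edited(r, g, b):
    # Coefficients, luma offset and clamp ranges copied from the edited shader.
    y = clamp(0.299 * r + 0.587 * g + 0.1140 * b + 0.0627, 0.0, 1.0)
    u = clamp(-0.169 * r - 0.331 * g + 0.5000 * b, -0.5, 0.5)
    v = clamp(0.500 * r - 0.419 * g - 0.0813 * b, -0.5, 0.5)
    return (y, u, v)

def filter_masked(color, lower, higher):
    yuv = rgb_to_yuv_edited(*color[:3])
    # 1.0 per component where lower <= c <= higher, else 0.0.
    mask = [step(lo, c) * step(c, hi) for c, lo, hi in zip(yuv, lower, higher)]
    # AND the three tests together and invert: 0.0 removes the color.
    keep = 1.0 - mask[0] * mask[1] * mask[2]
    return tuple(keep * ch for ch in color)

def filter_branching(color, lower, higher):
    yuv = rgb_to_yuv_edited(*color[:3])
    if all(lo <= c <= hi for c, lo, hi in zip(yuv, lower, higher)):
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(color)

# Example bounds (hypothetical): mid luma, near-neutral chroma.
lower, higher = (0.2, -0.1, -0.1), (0.8, 0.1, 0.1)
levels = [i / 4 for i in range(5)]
mismatches = [
    (r, g, b)
    for r in levels for g in levels for b in levels
    if filter_masked((r, g, b, 1.0), lower, higher)
    != filter_branching((r, g, b, 1.0), lower, higher)
]
print(len(mismatches))  # 0
```

Because step(lo, c) is 1.0 exactly when c >= lo and step(c, hi) is 1.0 exactly when c <= hi, the product of the three per-component masks is 1.0 precisely when the if condition is true, so the two filters agree on every input, including values sitting on a bound.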

1 Answer


I think this code should do the trick:

vec3 yuv = vec3(y, u, v);
color = step(lowerBounds, yuv ) * step(yuv, upperBounds) * color;

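step(lowerBounds, yuv) returns 0 for each component where yuv < lowerBounds and 1 otherwise, which is the same as testing yuv >= lowerBounds.

step(yuv, upperBounds) returns 0 for each component where upperBounds < yuv and 1 otherwise, which is the same as testing yuv <= upperBounds.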
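The difference between the snippet above and the fix described in the comments shows up in the mask itself. Multiplying the color by the vec3 mask directly scales each color channel by a different YUV test, and colors inside the range are kept rather than removed. (If color is a vec4, GLSL would also reject the vec3 * vec4 product.) The Python sketch below, with hypothetical helper names and example bounds, contrasts that per-component mask with the combined, inverted factor used in the edited question.

```python
# Hypothetical helpers contrasting the per-component mask with the
# combined, inverted factor from the edited question.

def step(edge, x):
    # Mimics the HLSL/GLSL built-in: 0.0 if x < edge, else 1.0.
    return 0.0 if x < edge else 1.0

def per_component_mask(yuv, lower, upper):
    # What step(lowerBounds, yuv) * step(yuv, upperBounds) yields:
    # one independent 0/1 value per component.
    return [step(lo, c) * step(c, hi) for c, lo, hi in zip(yuv, lower, upper)]

def combined_keep(yuv, lower, upper):
    # AND the three tests together, then invert so that colors
    # inside the range are removed (factor 0.0) and others kept (1.0).
    m = per_component_mask(yuv, lower, upper)
    return 1.0 - m[0] * m[1] * m[2]

lower, upper = (0.2, -0.1, -0.1), (0.8, 0.1, 0.1)  # example bounds

outside_v = (0.5, 0.0, 0.3)  # V above the range
inside = (0.5, 0.0, 0.0)     # all three components in range

print(per_component_mask(outside_v, lower, upper))  # [1.0, 1.0, 0.0]
print(combined_keep(outside_v, lower, upper))       # 1.0
print(per_component_mask(inside, lower, upper))     # [1.0, 1.0, 1.0]
print(combined_keep(inside, lower, upper))          # 0.0
```

For a color whose V lies outside the range, the per-component mask zeroes only one channel, while the combined factor leaves the color untouched; for a color inside the range, the per-component mask keeps everything and the combined factor removes it.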

Paltoquet
  • Thanks for your answer! It's not quite correct (the results of the step calls need to be multiplied together so the filter applies to all channels or none), but I fixed it myself and edited my question with the correct code :) – monoceres Mar 08 '19 at 12:54
  • Yep, it was the opposite. Branching normally hurts when you have a lot of branches; yours doesn't, but now you have a vectorized implementation. For further reading, see http://theorangeduck.com/page/avoiding-shader-conditionals – Paltoquet Mar 08 '19 at 15:53
  • Have you checked whether it's really faster? – kaiser Jun 18 '19 at 23:23
  • @kaiser How would you check it? Is there a good tool for testing shader snippets somewhere, or should I just see whether it makes a noticeable fps difference? – Kjell Schwaricke Sep 27 '19 at 01:12
  • My guess is that it will be driver dependent and will also depend on how your card handles branching. See https://stackoverflow.com/questions/4176247/efficiency-of-branching-in-shaders – Paltoquet Sep 27 '19 at 06:07
  • @Kjell Visual Studio comes with some nice GPU debugging and profiling tools. But you can also simply check the fps while rendering masses of triangles. Please keep in mind that fps means frames per second, so it does not scale linearly with the work done per frame. Better to calculate the execution time of a frame in ms. – kaiser Sep 27 '19 at 07:10
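The point about fps not being linear comes down to a little arithmetic: frame time in milliseconds is 1000 / fps, so the same drop in fps means very different costs depending on the starting frame rate. A minimal sketch in Python (the function names are illustrative):

```python
# Frame time vs. frames per second: frame_time_ms = 1000 / fps.

def frame_time_ms(fps):
    return 1000.0 / fps

def added_ms(fps_before, fps_after):
    # Extra time spent per frame when the frame rate drops.
    return frame_time_ms(fps_after) - frame_time_ms(fps_before)

# The same 10 fps drop at two different starting frame rates.
print(round(added_ms(60, 50), 2))     # 3.33
print(round(added_ms(1000, 990), 4))  # 0.0101
```

Dropping from 60 to 50 fps adds about 3.33 ms per frame, while dropping from 1000 to 990 fps adds only about 0.01 ms, which is why comparing frame times gives a fairer picture of a shader change than comparing fps.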