I have a really complex HLSL shader doing tons of texture reads, using shader model 3 in Direct3D9. The complex code is only used at some pixels so I put an if-statement around that block of code. To my surprise this gives no performance gain at all. If I use clip(-1) instead I do see an enormous performance boost, so this shader is indeed the bottleneck of my program. Why doesn't the branching improve my performance without the clip(-1) line?
I found this topic: "How much performance do conditionals and unused samplers/textures add to SM2/3 pixel shaders?" It states that in shader model 3 it is possible to optimise with branching, but that the cost of each batch of pixels is that of its slowest pixel. In my case the slow branch is taken mostly at the edges of the screen and the fast branch mostly at the centre of the screen. I think this means that the pixels within a batch will generally take the same branch, so I would expect a performance gain this way.
In pseudo-code the pixel shader looks like this:
    float4 colour = tex2D(texture, uv);
    if (colour.a < 0.5f)
    {
        // I only get a performance boost if I replace this line with clip(-1);
        oColour = colour;
    }
    else
    {
        complexSlowCodeWithTonsOfTextureReadsGoesHere;
        oColour = result;
    }
    oColour *= 2;
This gives me the exact same performance as when I remove the branching and always use the code in the slow else-branch. If I replace the fifth line with clip(-1) I see an enormous performance boost (and a mostly black screen) so the if-statement is actually functioning.
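For concreteness, here is a minimal sketch of how the pseudo-code above might look as a complete ps_3_0 pixel shader. It is an illustration under stated assumptions, not the actual shader: the names baseSampler, detailSampler and SAMPLE_COUNT are hypothetical, and the loop of offset reads merely stands in for the slow code. It also shows two things that may be worth checking. The [branch] attribute asks the compiler for a real dynamic branch rather than a flattened one. The texture reads inside the branch use tex2Dlod, because tex2D needs screen-space gradients, and the compiler may flatten a branch that contains gradient-based reads (it typically warns about gradient operations inside flow control when it does).

```hlsl
// Hypothetical placeholders: baseSampler, detailSampler and SAMPLE_COUNT
// are illustrative names, not taken from the original shader.
sampler2D baseSampler;
sampler2D detailSampler;

#define SAMPLE_COUNT 8

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    // This read is outside any branch, so an implicit-gradient tex2D is fine.
    float4 colour = tex2D(baseSampler, uv);
    float4 oColour;

    // Request a real dynamic branch instead of evaluating both sides.
    [branch]
    if (colour.a < 0.5f)
    {
        // Cheap path.
        oColour = colour;
    }
    else
    {
        // Stand-in for the slow code with many texture reads.
        float4 result = 0;
        [unroll]
        for (int i = 0; i < SAMPLE_COUNT; ++i)
        {
            float2 offset = float2(i, i) * 0.001f;
            // tex2Dlod takes the mip level explicitly in .w (0 here),
            // so no screen-space gradients are needed inside the branch.
            result += tex2Dlod(detailSampler, float4(uv + offset, 0, 0));
        }
        oColour = result / SAMPLE_COUNT;
    }

    return oColour * 2;
}
```

Sampling a fixed mip level changes the filtering compared with tex2D, so this is only a way to test whether gradient-based reads are what prevents the branch from paying off. If mipmapping is needed, tex2Dgrad with gradients computed by ddx and ddy before the branch is the usual alternative.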
Am I doing something wrong here or is it not possible to optimise a shader like this in shader model 3?