I have a vertex shader like this:
void main() {
    vec4 wPos = modelMatrix * vec4(position, 1.);
    vWorldPosition = wPos.xyz;
    // mask is 1. when the dot product is >= 0., otherwise 0.
    float mask = step(0., dot(cameraDir, normalize(normalMatrix * aNormal)));
    gl_PointSize = mask * uPointSize;
    gl_Position = projectionMatrix * viewMatrix * wPos;
}
I'm not entirely sure how to test the performance of the shader while excluding other factors like overdraw. I imagine that points of size 1, arranged in a grid in screen space without any overlap, would work?
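To make the grid idea concrete, here is a minimal sketch of how such a layout might be produced in the vertex shader itself. It assumes GLSL ES 1.00 syntax and two hypothetical inputs that are not part of the original shader: a per-point attribute `aIndex` (0, 1, 2, ...) and a uniform `uResolution` holding the drawing buffer size in pixels. It places each point on the centre of its own pixel, so no two points overlap.

```glsl
// Sketch only: aIndex and uResolution are assumed names, not from the original shader.
attribute float aIndex;    // 0, 1, 2, ... one value per point, supplied by the application
uniform vec2 uResolution;  // drawing buffer size in pixels

void main() {
    // Row and column of the pixel this point should cover.
    // The +0.5 guards against inexact floating-point division at row boundaries.
    float row = floor((aIndex + 0.5) / uResolution.x);
    float col = aIndex - row * uResolution.x;

    // Centre of that pixel in normalized device coordinates (-1..1).
    vec2 ndc = (vec2(col, row) + 0.5) / uResolution * 2.0 - 1.0;

    gl_PointSize = 1.0;
    gl_Position = vec4(ndc, 0.0, 1.0);
}
```

This only covers the layout. The masking logic being measured would still have to be kept in `main` (and its result actually used) for a timing to say anything about it, and replacing the projection changes the amount of work, so this would at best be an approximation.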
Otherwise, I'm curious about these tweaks (removes step, removes a multiplication, introduces if/else):
void main() {
    if (dot(cameraDir, normalize(normalMatrix * aNormal)) < 0.) { //remove step
        gl_Position = vec4(0., .0, -2., .1);
        gl_PointSize = 0.;
    } else {
        gl_PointSize = uPointSize; //remove a multiplication
        vec4 wPos = modelMatrix * vec4(position, 1.);
        vWorldPosition = wPos.xyz;
        gl_Position = projectionMatrix * viewMatrix * wPos;
    }
}
vs something like this:
void main() {
    if (dot(cameraDir, normalize(normalMatrix * aNormal)) < 0.) {
        gl_Position = vec4(0., .0, -2., .1);
        return;
    }
    gl_PointSize = uPointSize;
    vec4 wPos = modelMatrix * vec4(position, 1.);
    vWorldPosition = wPos.xyz;
    gl_Position = projectionMatrix * viewMatrix * wPos;
}
Will these shaders behave differently and why/how?
I'm interested in whether there is a way to quantify the difference in performance.
- Is there some metric, like the number of MADs (multiply-adds) or something else, that would obviously differ between these versions?
- Would different generations of GPUs treat these differences differently?
- If the step version is guaranteed to be fastest, is there a known list of patterns for avoiding branching, and of operations to prefer? (For example, could floor be used instead of step?)
float condition = clamp(floor(myDot + 1.), 0., 1.); // is it slower?
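For comparison, the branchless candidates can be written side by side. This is a sketch with hypothetical helper names (`maskStep`, `maskFloor`, `maskCompare`), where `d` stands for the dot product. It only shows that the forms compute the same mask and makes no claim about which one a given compiler or GPU handles faster.

```glsl
// Hypothetical helper names, for comparison only. d is the dot product (myDot above).

// Variant A: step, as in the first shader. Returns 1.0 when d >= 0.0, else 0.0.
float maskStep(float d) {
    return step(0.0, d);
}

// Variant B: floor + clamp.
// Caveat: for a tiny negative d, d + 1.0 can round to exactly 1.0 in floating point,
// so this can return 1.0 where step returns 0.0.
float maskFloor(float d) {
    return clamp(floor(d + 1.0), 0.0, 1.0);
}

// Variant C: comparison converted to float; true becomes 1.0, false becomes 0.0.
float maskCompare(float d) {
    return float(d >= 0.0);
}
```

Any of these could replace the `mask` computation in the first shader, for example `gl_PointSize = maskCompare(myDot) * uPointSize;`.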