I am trying to raytrace a grid in a fragment shader. I have written the shader below to do this (the vertex shader just draws a screen quad).
#version 150
uniform mat4 mInvProj, mInvRot;
uniform vec4 vCamPos;
varying vec4 vPosition;
int test(vec3 p)
{
    if (p.x > -4.0 && p.x < 4.0
        && p.y > -4.0 && p.y < 4.0
        && ((p.z < -4.0 && p.z > -8.0) || (p.z > 4.0 && p.z < 8.0)))
        return 1;
    return 0;
}
void main(void) {
    vec4 cOut = vec4(0, 0, 0, 0);
    vec4 vWorldSpace = mInvRot * mInvProj * vPosition;
    vec3 vRayOrg = vCamPos.xyz;
    vec3 vRayDir = normalize(vWorldSpace.xyz);
    // http://en.wikipedia.org/wiki/Xiaolin_Wu%27s_line_algorithm
    vec3 adelta = abs(vRayDir);
    int increaser;
    vec3 gradient, sgradient;
    if (adelta.x > adelta.y && adelta.x > adelta.z)
    {
        increaser = 0;
        gradient = vec3(vRayDir.x > 0.0 ? 1.0 : -1.0, vRayDir.y / vRayDir.x, vRayDir.z / vRayDir.x);
        sgradient = vec3(0.0, gradient.y > 0.0 ? 1.0 : -1.0, gradient.z > 0.0 ? 1.0 : -1.0);
    }
    else if (adelta.y > adelta.x && adelta.y > adelta.z)
    {
        increaser = 1;
        gradient = vec3(vRayDir.x / vRayDir.y, vRayDir.y > 0.0 ? 1.0 : -1.0, vRayDir.z / vRayDir.y);
        sgradient = vec3(gradient.x > 0.0 ? 1.0 : -1.0, 0.0, gradient.z > 0.0 ? 1.0 : -1.0);
    }
    else
    {
        increaser = 2;
        gradient = vec3(vRayDir.x / vRayDir.z, vRayDir.y / vRayDir.z, vRayDir.z > 0.0 ? 1.0 : -1.0);
        sgradient = vec3(gradient.x > 0.0 ? 1.0 : -1.0, gradient.y > 0.0 ? 1.0 : -1.0, 0.0);
    }
    vec3 walk = vRayOrg;
    for (int i = 0; i < 64; ++i)
    {
        vec3 fwalk = floor(walk);
        if (test(fwalk) > 0)
        {
            vec3 c = abs(fwalk) / 4.0;
            cOut = vec4(c, 1.0);
            break;
        }
        vec3 nextwalk = walk + gradient;
        vec3 fnextwalk = floor(nextwalk);
        bool xChanged = fnextwalk.x != fwalk.x;
        bool yChanged = fnextwalk.y != fwalk.y;
        bool zChanged = fnextwalk.z != fwalk.z;
        if (increaser == 0)
        {
            if ((yChanged && test(fwalk + vec3(0.0, sgradient.y, 0.0)) > 0)
                || (zChanged && test(fwalk + vec3(0.0, 0.0, sgradient.z)) > 0)
                || (yChanged && zChanged && test(fwalk + vec3(0.0, sgradient.y, sgradient.z)) > 0))
            {
                vec3 c = abs(fwalk) / 4.0;
                cOut = vec4(c, 1.0);
                break;
            }
        }
        else if (increaser == 1)
        {
            if ((xChanged && test(fwalk + vec3(sgradient.x, 0.0, 0.0)) > 0)
                || (zChanged && test(fwalk + vec3(0.0, 0.0, sgradient.z)) > 0)
                || (xChanged && zChanged && test(fwalk + vec3(sgradient.x, 0.0, sgradient.z)) > 0))
            {
                vec3 c = abs(fwalk) / 4.0;
                cOut = vec4(c, 1.0);
                break;
            }
        }
        else
        {
            if ((xChanged && test(fwalk + vec3(sgradient.x, 0.0, 0.0)) > 0)
                || (yChanged && test(fwalk + vec3(0.0, sgradient.y, 0.0)) > 0)
                || (xChanged && yChanged && test(fwalk + vec3(sgradient.x, sgradient.y, 0.0)) > 0))
            {
                vec3 c = abs(fwalk) / 4.0;
                cOut = vec4(c, 1.0);
                break;
            }
        }
        walk = nextwalk;
    }
    gl_FragColor = cOut;
}
As long as I am looking at nearby grid items (the hardcoded ones), the framerate is acceptable (400+ fps on a GeForce 680M), although lower than I would expect compared to other shaders I have written so far. But when I look at empty space (so the loop runs all 64 iterations), the framerate is terrible (40 fps). I get around 1200 fps when I look at the grid from so close that every pixel ends up in the same nearby grid item.
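Since fps is not a linear measure, it may be easier to compare these numbers as time per frame. A small C sketch of the conversion (the helper name `frame_time_ms` is made up for illustration, it is not part of my renderer):

```c
#include <assert.h>

/* Hypothetical helper: convert a frame rate in frames per second
   to the time spent on one frame, in milliseconds. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0); /* a non-positive frame rate has no frame time */
    return 1000.0 / fps;
}
```

With the numbers above that is 2.5 ms per frame at 400 fps and 25 ms per frame at 40 fps, so the empty view costs roughly 22.5 ms more per frame.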
Although I understand that running this loop for every pixel is some work, it is still simple, basic math, especially now that I have removed the texture lookup and use just a simple test, so I don't understand why it slows everything down this much. My GPU has 16 cores and runs at 700+ MHz. I am rendering at 960x540, which is 518400 pixels. I would think it should be able to handle much more than this.
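To put a number on the worst case, here is a rough back-of-the-envelope count in C. It assumes every pixel runs the full loop without hitting anything, and that each iteration calls test() at most four times (once for the current cell plus up to three neighbour checks). The function names are made up for illustration.

```c
#include <assert.h>

/* Number of fragments shaded per frame at the given resolution. */
long long pixels_per_frame(int width, int height)
{
    assert(width > 0 && height > 0);
    return (long long)width * height;
}

/* Worst-case loop iterations per frame, assuming every pixel
   runs all max_steps iterations without an early break. */
long long worst_case_iterations(long long pixels, int max_steps)
{
    return pixels * max_steps;
}

/* Upper bound on test() calls per frame, assuming each iteration
   does the current-cell test plus up to three neighbour tests. */
long long worst_case_test_calls(long long iterations)
{
    return iterations * 4;
}
```

For 960x540 and 64 steps this gives 33,177,600 loop iterations and at most 132,710,400 test() calls per frame. At 40 fps that is about 1.3 billion loop iterations per second.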
If I remove the antialiasing part of the shader above (the part that tests some extra adjacent cells based on the increaser value), it is a little better (100 fps), but come on, with these calculations it shouldn't make much difference! If I split the code so that increaser is not used and the loop is instead duplicated for each of the three cases, the framerate stays the same. If I change some ints to floats, nothing changes.
I have written much more intensive and/or complicated shaders before, so why is this one so terribly slow? Can anyone tell which of my calculations makes it so slow?
I am not setting unused uniforms or anything like that, and the C code does nothing more than render. It is code I have used successfully hundreds of times before.
Anyone?