I had this idea for doing something "intrinsic-like" in OpenGL, but googling around brought up no results.
So basically I have a compute shader that calculates the Mandelbrot set (each thread does one pixel). Part of my main function in GLSL looks like this:
float XR, XI, XR2, XI2, CR, CI;
uint i;

// map this invocation's pixel to its point C in the complex plane
CR = float(minX + gl_GlobalInvocationID.x * (maxX - minX) / ResX);
CI = float(minY + gl_GlobalInvocationID.y * (maxY - minY) / ResY);

XR = 0.0;
XI = 0.0;

for (i = 0; i < MaxIter; i++)
{
    // Z = Z * Z + C
    XR2 = XR * XR;
    XI2 = XI * XI;
    XI = 2.0 * XR * XI + CI;
    XR = XR2 - XI2 + CR;

    // escaped once |Z| > 2
    if ((XR * XR + XI * XI) > 4.0)
    {
        break;
    }
}
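For context, the rest of the shader is just the declarations around that loop. Stripped down it looks something like this (the workgroup size, binding and output format here are only how I happen to set it up and shouldn't matter for the question; the uniforms are doubles, hence the float() casts above):

#version 430
layout(local_size_x = 16, local_size_y = 16) in;

layout(rgba32f, binding = 0) uniform writeonly image2D destTex;

uniform double minX, maxX, minY, maxY;   // view window in the complex plane
uniform double ResX, ResY;               // output resolution in pixels
uniform uint MaxIter;

void main()
{
    // ... iteration loop from above, then write the result, e.g.:
    // float shade = float(i) / float(MaxIter);
    // imageStore(destTex, ivec2(gl_GlobalInvocationID.xy), vec4(shade, shade, shade, 1.0));
}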
So my thought was to use vec4s instead of floats and do 4 calculations/pixels at once, hopefully getting a 4x speed boost (analogous to "real" CPU intrinsics). But my code runs MUCH slower than the float version. There are still some mistakes in it (if anyone would still like to see the code, please say so), but I don't think they are what slows it down. Before I fiddle with it for ages, can anybody tell me right away whether this endeavour is futile?
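To make the idea more concrete, the vec4 version I have in mind looks roughly like this (not my actual code, just a sketch of the approach: each invocation handles 4 horizontally adjacent pixels, so the dispatch in x shrinks by 4, and the alive/iter bookkeeping is just one way to do the per-lane escape test):

// per-lane parameters: 4 horizontally adjacent pixels per invocation
float stepX = float((maxX - minX) / ResX);
float baseX = float(minX + 4.0 * gl_GlobalInvocationID.x * (maxX - minX) / ResX);
vec4  CR4   = baseX + vec4(0.0, 1.0, 2.0, 3.0) * stepX;
vec4  CI4   = vec4(float(minY + gl_GlobalInvocationID.y * (maxY - minY) / ResY));

vec4 XR4 = vec4(0.0), XI4 = vec4(0.0), XR24, XI24;
vec4 alive = vec4(1.0);   // 1.0 while a lane has not escaped yet, 0.0 afterwards
vec4 iter  = vec4(0.0);   // per-lane iteration count
uint i;

for (i = 0; i < MaxIter; i++)
{
    // Z = Z * Z + C for all four points at once
    XR24 = XR4 * XR4;
    XI24 = XI4 * XI4;
    XI4  = 2.0 * XR4 * XI4 + CI4;
    XR4  = XR24 - XI24 + CR4;

    // latch lanes whose |Z| exceeds 2 and stop counting them
    alive *= vec4(lessThanEqual(XR4 * XR4 + XI4 * XI4, vec4(4.0)));
    iter  += alive;

    // the loop can only break once ALL four lanes have escaped
    if (alive == vec4(0.0))
    {
        break;
    }
}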