
I have some computations (below) in my fragment shader function, which is called a huge number of times. I'd like to know whether this code can be optimized. I took a look at the OpenGL.org GLSL optimisation page and made some modifications, but is it possible to make this code faster?

uniform int mn;
highp float Nx;
highp float Ny;   
highp float Nz;
highp float invXTMax;
highp float invYTMax;
int m;    
int n;    

highp vec4 func(in highp vec3 texCoords3D)
{        

    // tile index
    int Ti = int(texCoords3D.z * Nz);

    // (r, c) position of tile within texture unit
    int r = Ti / n; // integer division
    int c = Ti - r * n;

    // x/y offsets in pixels of tile origin within the texture unit
    highp float xOff = float(c) * Nx;
    highp float yOff = float(r) * Ny;

    // 2D texture coordinates
    highp vec2 texCoords2D;
    texCoords2D.x = (Nx * texCoords3D.x + xOff)*invXTMax;
    texCoords2D.y = (Ny * texCoords3D.y + yOff)*invYTMax;

    return texture2D(uSamplerTex0, texCoords2D); 
}

Edit:

To give some context, func() is used as part of a ray casting setup. It is called up to 300 times from main() for each fragment.

M-V
  • It doesn't seem to use much computation time to me... – TheAmateurProgrammer Jul 30 '12 at 02:43
  • Show more context. The optimal solution may require altering the function and its relationship to the caller. – Brian Cain Jul 30 '12 at 03:01
  • func() is called in a loop up to 300 times from the main function. It's part of a ray casting setup. For each fragment on the screen this could be called so many times, and so it does take up a lot of computation time. – M-V Jul 30 '12 at 04:38
  • I doubt it will give huge boost, but you could try running glsl-optimizer on your shader: https://github.com/aras-p/glsl-optimizer – Mārtiņš Možeiko Jul 30 '12 at 05:18
  • The first problem I see is the integer stuff. Don't do that; round instead. As there is no round function in OpenGL ES 2.0's GLSL, you have to roll your own: sign(x) * floor(abs(x) + .5). –  Jul 30 '12 at 14:30
  • Have you tried running this through the PowerVR tuning tools to see what the estimated cycle count is, as well as where the expensive instructions might be? http://stackoverflow.com/a/6051739/19679 The biggest thing that leaps out at me is that you're doing a dependent texture read on every execution of this function. That's horribly expensive on the iOS hardware. – Brad Larson Jul 30 '12 at 14:54
  • I doubt there's a way to avoid it. Dependent texture reads are undesirable, but I use them all the time because using lookup textures is cheaper and more flexible than using math (e.g. 1-pixel high uncompressed LUT for specular instead of pow). –  Jul 30 '12 at 15:07
  • Brad, I have not used PowerVR - will check it out. I don't think I can avoid the texture2D call. – M-V Jul 31 '12 at 01:55
  • Jessy, why is sign(x) * floor(abs(x) + .5) much faster than int(x) ? – M-V Jul 31 '12 at 07:18
  • @M-V I couldn't tell you; I don't design the hardware. I just profile. I assume it's because floating point operations are so much more relatively common on the GPU, so the hardware is tuned specifically for it. –  Jul 31 '12 at 13:42
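
The rounding helper suggested in the comments, and a float-only way to get the tile row and column, could be sketched as follows. This is a minimal sketch, not the original shader: `roundNearest`, `tileRowCol`, and the `tilesPerRow` parameter (the float counterpart of `n`) are hypothetical names. It assumes a non-negative tile index, in which case `floor(x)` matches the truncation done by `int(x)`. Whether it is actually faster than the integer version depends on the hardware, so profile before adopting it.

```glsl
// Hypothetical helper: round to nearest, since GLSL ES 1.00 has no round().
highp float roundNearest(highp float x)
{
    return sign(x) * floor(abs(x) + 0.5);
}

// Hypothetical float-only replacement for the int r / int c computation.
// Returns vec2(c, r). Assumes z * Nz >= 0.0 and tilesPerRow >= 1.0.
highp vec2 tileRowCol(highp float z, highp float Nz, highp float tilesPerRow)
{
    // floor() matches int() truncation for non-negative values
    highp float ti = floor(z * Nz);

    // the +0.5 guards against an inexact GPU division landing just below
    // a whole number when ti is an exact multiple of tilesPerRow
    highp float r = floor((ti + 0.5) / tilesPerRow);
    highp float c = ti - r * tilesPerRow;

    return vec2(c, r);
}
```

Note that the tile index needs `floor` rather than `roundNearest`: rounding would select the next tile for the upper half of each slice.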

2 Answers


It is very easy to vectorize the code as follows:

highp vec3 N;        // (Nx, Ny, Nz) packed into one vector
highp vec2 invTMax;  // (invXTMax, invYTMax)

highp vec4 func(in highp vec3 texCoords3D)
{        
    // tile index
    int Ti = int(texCoords3D.z * N.z);

    // (r, c) position of tile within texture unit (n is declared as in the question)
    int r = Ti / n;
    int c = Ti - r * n;

    // x/y offsets in pixels of tile origin within the texture unit
    // (N.xy, not N: a vec2 cannot be multiplied by a vec3)
    highp vec2 Off = vec2( float(c), float(r) ) * N.xy;

    // 2D texture coordinates
    highp vec2 texCoords2D = ( N.xy * texCoords3D.xy + Off ) * invTMax;

    return texture2D(uSamplerTex0, texCoords2D); 
}

This expresses the matching x and y calculations as single vector operations, so hardware with vector ALUs can run them in parallel.
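
Going one step further, the two per-axis multiplies can be folded into a single precomputed scale, because (N.xy * t + rc * N.xy) * invTMax equals (t + rc) * (N.xy * invTMax). The sketch below assumes the application can upload that product once; `tileScale` is a hypothetical uniform, and declaring `Nz` and `n` as uniforms is also an assumption (the question declares them as plain globals).

```glsl
uniform sampler2D uSamplerTex0;
uniform highp float Nz;       // number of slices, as in the question
uniform int n;                // tiles per row, as in the question

// Hypothetical uniform: N.xy * invTMax, computed once on the CPU
uniform highp vec2 tileScale;

highp vec4 func(in highp vec3 texCoords3D)
{
    // tile index and its (r, c) position, unchanged from the original
    int Ti = int(texCoords3D.z * Nz);
    int r = Ti / n;
    int c = Ti - r * n;

    // one vector add and one vector multiply replace the scale,
    // offset, and normalize steps
    highp vec2 texCoords2D = (texCoords3D.xy + vec2(float(c), float(r))) * tileScale;

    return texture2D(uSamplerTex0, texCoords2D);
}
```

This saves only a few multiplies per call, so any gain is likely small next to the cost of the texture fetch itself.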

Sergey K.

Computing texture coordinates in the fragment shader, instead of using the ones passed in unmodified, creates a dependent (dynamic) texture read, which is the largest performance hit on earlier hardware.

Check the last section on Dynamic Texture Lookups

https://developer.apple.com/library/ios/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/BestPracticesforShaders/BestPracticesforShaders.html

They suggest moving the texture coordinate calculation up into the vertex shader. It looks like you can do that without much trouble, if I understand the intent of the code correctly. You're adding offset and tiling support for fine adjustments, scaling, and animation of your UVs (and thus textures)? If so, use this.

//
// Vertex Shader
//


attribute vec4 position;
attribute vec2 texture;


uniform mat4 modelViewProjectionMatrix;


// tiling parameters:
// -- x and y components of the Tiling (x,y)
// -- x and y components of the Offset (z,w)
// a value of vec4(1.0, 1.0, 0.0, 0.0) means no adjustment

uniform vec4 texture_ST;  

// UV calculated in the vertex shader, GL will interpolate over the pixels
// and prefetch the texel to avoid dynamic texture read on pre ES 3.0 hw.
// This should be highp in the fragment shader.

varying vec2 uv;


void main()
{
    uv = ((texture.xy * texture_ST.xy) + texture_ST.zw);

    gl_Position = modelViewProjectionMatrix * position;
}
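
A matching fragment shader is not shown above; a minimal sketch might look like the following. It assumes the sampler is named `uSamplerTex0` as in the question and that the device supports `highp` in fragment shaders, which is optional in OpenGL ES 2.0.

```glsl
//
// Fragment Shader (sketch)
//

precision mediump float;

uniform sampler2D uSamplerTex0;

// Interpolated UV from the vertex shader, used unmodified so the
// texture read is not dependent on fragment shader arithmetic.
varying highp vec2 uv;

void main()
{
    gl_FragColor = texture2D(uSamplerTex0, uv);
}
```

This only helps when each fragment samples at its interpolated coordinate. Coordinates that change inside a per-fragment loop, as in the ray casting case described in the question, still have to be computed in the fragment shader.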