I would have liked to ask a succinct question that allows a clear answer, but I fear there are too many minor things I don't fully understand regarding FBO initialization that I need to clear up first. I'm writing a deferred shader targeting both OpenGL 4.3 and OpenGL ES 3.0; the former behaves exactly as I'd expect, but the latter gives me issues whose source I can't identify.
First, I'll describe my understanding of, and confusions about, setting up MRT FBOs for GL 4.2 and ES 3.0, and hope someone is kind enough to correct any misconceptions:

1. The OpenGL ES 3.0 spec says it supports "four or more rendering targets", but makes no mention (that I could find) of the specifications of these render targets. What is safe to assume about their sizes? Can I simply assume an internal format of `RGBA32F` (four 32-bit float channels)? It seems to me that this is crucial knowledge for shaders writing to the RTs. Is the common procedure to attempt to create an FBO with certain specifications, then test for FBO completeness, and, if that fails, reduce the requirements and use alternate shaders that compensate for the reduced bit depth? (A sketch of what I mean follows this list.)

2. Precision qualifiers are said to "aid code portability with OpenGL ES" and to have no effect with regular OpenGL, but I find it difficult to understand what exactly `highp`, `mediump` and `lowp` are used for, and how they play together with the bit depth of the render targets. Firstly, I assume that the bit depth of render targets is determined and configured in the FBO, and that the precision qualifier automatically matches this, which makes me think that `highp`, `mediump` and `lowp` have some kind of relation to 32, 16 and 8 bits of depth. I have looked over the OpenGL ES 3.0 spec, and it isn't all that clear about this. (The second sketch after this list shows how I imagine querying the actual precisions.)

3. A texture attachment for the FBO is configured using `glTexStorage2D` (with `target=GL_TEXTURE_2D`, `levels=1`), which I assume is more correct to use here than `glTexImage2D`, as only the `internalformat` should matter.

4. The configured texture from (3.) is then attached to the FBO's `COLOR_ATTACHMENT` using `glFramebufferTexture2D`.
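Concretely, here is a minimal sketch of how I understand (1.), (3.) and (4.) fitting together: allocate storage, attach it, test completeness, and fall back to a smaller format if needed. The helper name and the fallback format list are my own illustrative choices, not anything from the spec (include shown for ES; the desktop header differs):

    #include <GLES3/gl3.h>
    #include <stddef.h>

    /* Sketch: allocate immutable storage, attach it as color attachment 0,
     * then test completeness and fall back to smaller internal formats.
     * The fallback list is an illustrative assumption on my part. */
    static GLuint tryCreateRenderTarget(GLsizei w, GLsizei h)
    {
        const GLenum formats[] = { GL_RGBA32F, GL_RGBA16F, GL_RGBA8 };

        for (size_t i = 0; i < sizeof formats / sizeof formats[0]; ++i) {
            GLuint fbo, tex;
            glGenFramebuffers(1, &fbo);
            glBindFramebuffer(GL_FRAMEBUFFER, fbo);

            glGenTextures(1, &tex);
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexStorage2D(GL_TEXTURE_2D, 1, formats[i], w, h); /* levels=1 */
            glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                   GL_TEXTURE_2D, tex, 0);

            if (glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE)
                return fbo; /* this format is renderable on this driver */

            /* Incomplete: clean up and try the next, smaller format. */
            glDeleteTextures(1, &tex);
            glDeleteFramebuffers(1, &fbo);
        }
        return 0; /* no candidate format produced a complete FBO */
    }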
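For (2.), the closest thing I found is `glGetShaderPrecisionFormat`, which reports the range and precision a qualifier actually gets in a given shader stage. A small sketch of how I'd query it (the interpretation in the comments is my reading of the reference pages):

    #include <GLES3/gl3.h>
    #include <stdio.h>

    /* Query what highp/mediump/lowp actually mean for fragment-shader floats.
     * range[0]/range[1] are the log2 of the min/max representable magnitudes;
     * precision is the log2 of the relative precision (23 for an IEEE single). */
    static void printFloatPrecisions(void)
    {
        const GLenum types[] = { GL_HIGH_FLOAT, GL_MEDIUM_FLOAT, GL_LOW_FLOAT };
        const char *names[]  = { "highp", "mediump", "lowp" };

        for (int i = 0; i < 3; ++i) {
            GLint range[2], precision;
            glGetShaderPrecisionFormat(GL_FRAGMENT_SHADER, types[i],
                                       range, &precision);
            printf("%s float: range 2^%d .. 2^%d, precision bits: %d\n",
                   names[i], range[0], range[1], precision);
        }
    }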
Where it gets weird (`packHalf2x16`/`unpackHalf2x16`):
Let's say I set up the FBO with two color attachments, the first (`RT1`) with internalformat `GL_RGBA32UI`, the second (`RT2`) with `GL_RGBA32F`. Objects are rendered in two passes: the first renders to the FBO's RTs, the second renders a fullscreen quad to the default framebuffer.
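In case the setup itself matters, it looks roughly like this (a condensed sketch of the above; the completeness check from the first sketch is omitted):

    #include <GLES3/gl3.h>

    /* Sketch of the two-attachment FBO described above
     * (error handling and completeness check omitted). */
    static GLuint createMrtFbo(GLsizei width, GLsizei height)
    {
        GLuint fbo, rt1, rt2;
        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);

        /* RT1: four 32-bit unsigned integer channels */
        glGenTextures(1, &rt1);
        glBindTexture(GL_TEXTURE_2D, rt1);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA32UI, width, height);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, rt1, 0);

        /* RT2: four 32-bit float channels */
        glGenTextures(1, &rt2);
        glBindTexture(GL_TEXTURE_2D, rt2);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA32F, width, height);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
                               GL_TEXTURE_2D, rt2, 0);

        /* Route fragment shader outputs 0 and 1 to the two attachments. */
        const GLenum bufs[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
        glDrawBuffers(2, bufs);

        return fbo;
    }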
To simplify, I'll only focus on passing RGB color data between the two stages. I have attempted to do so in three separate ways:

1. [Works for GL & ES] Using `RT2`, storing color data regularly as float, reading it as a float texture and outputting it to the default framebuffer.

2. [Works for GL & ES] Using `RT1`, storing color data converted to `uint` (in `[0,..,255]` for each channel), reading it as a `uint` texture, converting it back to float `[0,1]` and outputting it to the default framebuffer.

3. [Works only for GL] Using `RT1`, packing the color data into one and a half channels using `packHalf2x16`, reading it as a `uint` texture, and converting it back to float using `unpackHalf2x16`.
Not sure how relevant/important the details of the code are (I will quickly follow up on any requests). I'm using `highp` for both `float` and `int`. The render targets of the first pass are defined as:
layout (location = 0) out uvec4 fs_rt1;
layout (location = 1) out vec4 fs_rt2;
And in the second pass, accessed as textures:
uniform highp usampler2D RT1;
uniform highp sampler2D RT2;
...
// in main():
uvec4 rt1 = texelFetch(RT1, ivec2(gl_FragCoord.xy), 0);
vec4 rt2 = texelFetch(RT2, ivec2(gl_FragCoord.xy), 0);
Method 1:
// in first pass:
fs_rt2.rgb = decal.rgb;
// in second pass:
color = vec4(rt2.rgb, 1.0);
Method 2:
// in first pass:
fs_rt1.rgb = uvec3(decal.xyz * 256.0f);
// in second pass:
color = vec4(vec3(rt1.xyz)/256.0f, 1);
Method 3:
// in first pass:
fs_rt1.x = packHalf2x16(decal.xy);
fs_rt1.y = packHalf2x16(vec2(decal.z, 0.0f));
// in second pass:
vec2 tmp = unpackHalf2x16(rt1.y);
color = vec4(vec3(unpackHalf2x16(rt1.x), tmp.x), 1);
In methods 1, 2, and 3, the desktop GL output looks like this:
On a Nexus 5 running OpenGL ES 3.0, the output of methods 1 and 2 looks like this:
Method 3 on the Nexus 5, however, looks like this:
I can't figure out why the third method fails on OpenGL ES 3.0. Any help or suggestions would be greatly appreciated. I'm not averse to reading documentation, so if you only want to point me in the right direction, that would help too.