In my pet project, video memory started to become an issue, so I looked at various techniques to minimize the memory footprint. I tried using GL_INT_2_10_10_10_REV, but I get lighting artifacts with my packing method. These artifacts do not seem to be the result of inaccuracies, because using a normalized char[3] or short[3] works flawlessly. Since the wider formats only add otherwise useless padding, I would prefer to use the more space-efficient GL_INT_2_10_10_10_REV.
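For reference, by "normalized short[3]" I mean essentially the following (a simplified sketch; the function name and the use of std::clamp are just for illustration):

#include <algorithm> // std::clamp

// Pack a unit-length normal into signed-normalized 16-bit components.
void normal_to_short3(float x, float y, float z, short out[3]) {
    out[0] = static_cast<short>(std::clamp(x, -1.0f, 1.0f) * 32767.0f);
    out[1] = static_cast<short>(std::clamp(y, -1.0f, 1.0f) * 32767.0f);
    out[2] = static_cast<short>(std::clamp(z, -1.0f, 1.0f) * 32767.0f);
}

This variant (and the analogous char[3] one) renders without artifacts.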
This is the packing code for GL_INT_2_10_10_10_REV:
union Vec3IntPacked {
    int i32;
    struct {
        int a:2;
        int z:10;
        int y:10;
        int x:10;
    } i32f3;
};

int vec3_to_i32f3(const Vec3* v) {
    Vec3IntPacked packed;
    packed.i32f3.x = to_int(clamp(v->x, -1.0f, 1.0f) * 511);
    packed.i32f3.y = to_int(clamp(v->y, -1.0f, 1.0f) * 511);
    packed.i32f3.z = to_int(clamp(v->z, -1.0f, 1.0f) * 511);
    return packed.i32;
} // NOTE: to_int is a static_cast
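For completeness, minimal stand-ins for the types and helpers used above (a sketch of the assumed semantics; Vec3, clamp and to_int are my own helpers, with to_int being a plain static_cast as noted):

struct Vec3 { float x, y, z; };

// clamp v into [lo, hi]
static float clamp(float v, float lo, float hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// plain truncating cast, as noted above
static int to_int(float v) {
    return static_cast<int>(v);
}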
If I am reading the spec correctly (section 10.3.8, "Packed Vertex Data Formats", and the conversion rules in sections 2.1 and 2.2), this should work, but it doesn't. I should also note that the above code was tested on multiple operating systems (all 64-bit, though int should still be 32 bit nevertheless) and graphics card vendors, to rule out a driver-related issue. Furthermore, the OpenGL 3.3 core profile is used.
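For reference, my reading of the signed-normalized conversion for a 10-bit component (which is where the scale factor 511 = 2^9 - 1 comes from), expressed as code:

// Assumed decode of a 10-bit signed-normalized component c on the GL side
// (the exact formula differs slightly between spec revisions, but
// c = 511 maps to 1.0 either way).
float snorm10_to_float(int c) {
    float f = static_cast<float>(c) / 511.0f;
    return f < -1.0f ? -1.0f : f;
}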
The vertex structure is composed as follows:
struct BasicVertex {
    float position[3];
    unsigned short uv[2];
    int normal;
    int tangent;
    int bitangent;
}; // resulting in a 4-byte aligned 28 byte structure
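In case it is relevant, the attribute setup for this layout looks roughly like this (attribute indices and normalization flags are illustrative, not verbatim from my code):

// assumes a bound VAO/VBO and a GL loader; offsetof requires <cstddef>
const GLsizei stride = sizeof(BasicVertex); // 28 bytes

glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride,
                      (const void*)offsetof(BasicVertex, position));
glVertexAttribPointer(1, 2, GL_UNSIGNED_SHORT, GL_TRUE, stride,
                      (const void*)offsetof(BasicVertex, uv));
// size must be 4 (or GL_BGRA) when the type is GL_INT_2_10_10_10_REV
glVertexAttribPointer(2, 4, GL_INT_2_10_10_10_REV, GL_TRUE, stride,
                      (const void*)offsetof(BasicVertex, normal));
glVertexAttribPointer(3, 4, GL_INT_2_10_10_10_REV, GL_TRUE, stride,
                      (const void*)offsetof(BasicVertex, tangent));
glVertexAttribPointer(4, 4, GL_INT_2_10_10_10_REV, GL_TRUE, stride,
                      (const void*)offsetof(BasicVertex, bitangent));

for (GLuint i = 0; i <= 4; ++i)
    glEnableVertexAttribArray(i);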
Hopefully I have provided sufficient information and someone can shed some light on how to properly pack normals into GL_INT_2_10_10_10_REV.